Article

EMS-UKAN: An Efficient KAN-Based Segmentation Network for Water Leakage Detection of Subway Tunnel Linings

1 Beijing Municipal Engineering Research Institute, Beijing 100037, China
2 Beijing Third Construction Engineering Quality Inspection Institute Co., Ltd., Beijing 100037, China
3 State Key Laboratory of Advanced Rail Autonomous Operation, Beijing Jiaotong University, Beijing 100044, China
4 School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(24), 12859; https://doi.org/10.3390/app152412859
Submission received: 1 November 2025 / Revised: 30 November 2025 / Accepted: 3 December 2025 / Published: 5 December 2025

Abstract

Water leakage in subway tunnel linings poses significant risks to structural safety and long-term durability, making accurate and efficient leakage detection a critical task. Existing deep learning methods, such as UNet and its variants, often suffer from large parameter sizes and limited ability to capture multi-scale features, which restrict their applicability in real-world tunnel inspection. To address these issues, we propose an Efficient Multi-Scale U-shaped KAN-based Segmentation Network (EMS-UKAN) for detecting water leakage in subway tunnel linings. To reduce computational cost and enable edge-device deployment, the backbone replaces conventional convolutional layers with depthwise separable convolutions, and an Edge-Enhanced Depthwise Separable Convolution Module (EEDM) is incorporated in the decoder to strengthen boundary representation. The PKAN Block is introduced in the bottleneck to enhance nonlinear feature representation and improve the modeling of complex relationships among latent features. In addition, an Adaptive Multi-Scale Feature Extraction Block (AMS Block) is embedded within early skip connections to capture both fine-grained and large-scale leakage features. Extensive experiments on the newly collected Tunnel Water Leakage (TWL) dataset demonstrate that EMS-UKAN outperforms classical models, achieving competitive segmentation performance. In addition, it effectively reduces computational complexity, providing a practical solution for real-world tunnel inspection.

1. Introduction

Tunnels have become an important part of modern transport, water conservancy, and mining infrastructure [1]. However, owing to complex geological conditions, lateral earth pressure, gravity, and tunnel lining connections [2], water leakage may occur within the tunnel system [3,4]. Once such localized leakage occurs, it may result in the loss of surrounding soil, altering the stress distribution in the tunnel structure and potentially triggering ground settlement or collapse [5,6]; it may also impair the normal functioning of the tunnel. Historically, severe water leakage in shield tunnels has caused substantial socioeconomic losses [7,8]. Therefore, accurately locating water leakage areas is critically important for the stable operation of shield tunnels.
Traditional methods for detecting water leakage in tunnels rely on periodic manual inspection. However, the poor lighting, harsh environment, and geometric obstacles typical of tunnels make missed and false detections difficult to avoid. In addition, manual inspection is costly, labor-intensive, and inefficient, especially for large-scale, deep, or long tunnels where continuous or high-frequency monitoring is normally impractical [9,10,11]. To address these limitations, researchers have explored auxiliary technologies such as LiDAR, photogrammetry, and infrared thermal imaging to assist inspections [12,13]. Among these, photogrammetry is non-contact, intuitive, and information-rich, providing a feasible approach for automated tunnel water leakage detection [14].
Many scholars have employed traditional image processing and threshold-based segmentation methods to segment water leakage stains from tunnel images [15,16]. However, the accuracy of these methods is often affected by variations in illumination, background interference, and the diverse patterns of leakage, leading to limited stability and generalization capability. With the advancement of artificial intelligence, machine learning-based automated detection techniques have become a research hotspot [17,18]. As an important branch of machine learning, deep learning has shown remarkable advantages in image recognition, object detection, and semantic segmentation, and has recently been applied to tunnel water leakage detection with significant progress. Man et al. [19] applied ResNet models to tunnel crack and leakage identification, demonstrating that deeper architectures and transfer learning can significantly enhance accuracy, with ResNet50 reaching 96.30%. However, that study was confined to a classification framework, which limits its practical applicability. To localize defects more precisely, Xu et al. [20] constructed a tunnel defect detection model based on Faster R-CNN; by analyzing the characteristics of cracks and water leakage, they optimized the anchor box parameters and improved the model's performance. Li et al. [21] proposed a Metro Tunnel Surface Inspection System (MTSIS) that combines high-resolution imaging, image pre-processing, and a Faster R-CNN-based multi-layer feature fusion network, achieving high-precision defect detection and successful application on metro lines. To reduce model complexity, Wang et al. [22] improved YOLOv8n by introducing bi-directional feature fusion and multi-scale attention mechanisms, significantly enhancing detection accuracy while reducing the number of parameters.
While object detection-based methods have demonstrated success in localizing water leakage, these approaches provide limited information about the actual shape and extent of leakage areas. Recent studies have increasingly adopted semantic segmentation techniques to address this limitation. Semantic segmentation provides pixel-level predictions that accurately delineate the irregular boundaries and spatial characteristics of leakage areas, facilitating more precise assessment of tunnel structural integrity and damage conditions. For example, Xue et al. [23] improved the Mask R-CNN framework by introducing data augmentation, transfer learning, and a cascade strategy to alleviate the noise issue caused by low IoU thresholds, achieving more accurate tunnel leakage segmentation and area estimation. Wang et al. [24] further enhanced Mask R-CNN with non-local attention, channel attention, and a bidirectional feature pyramid, which strengthened multi-scale feature representation and significantly improved the robustness and accuracy of tunnel leakage detection. Wang et al. [25] proposed the CSG-Unet network, which utilizes side guidance and attention mechanisms to achieve high-precision segmentation of water leakage areas in complex tunnel environments.
Although the above segmentation methods perform acceptably on the water leakage segmentation task, accurate segmentation of water stains in subway tunnel linings remains highly challenging. First, as shown in Figure 1, the visual characteristics of water leakage areas vary widely in morphology, resulting in multi-scale and irregular features. In addition, the insufficient and uneven lighting in tunnels often leads to low image contrast, further complicating feature extraction. Second, most existing models are based on convolutional neural networks (CNNs), which perform well at capturing local information but have limited capacity to represent global contextual information and long-range relationships. Finally, many existing approaches rely on intensive computational and memory resources. This reliance substantially limits their feasibility for deployment on edge devices, which are widely used in practical, real-world tunnel inspection. Consequently, achieving high segmentation accuracy while maintaining computational efficiency and low-latency processing remains a significant and open challenge.
To tackle the challenges of accurately segmenting water leakage in subway tunnel linings, this study proposes an efficient multi-scale U-shaped KAN-based segmentation network, named EMS-UKAN. EMS-UKAN builds upon a UNet-like architecture and introduces several key enhancements to improve both feature representation and computational efficiency. Specifically, conventional convolutional layers in the backbone are replaced with depthwise separable convolutions to reduce parameter count and computational cost while maintaining segmentation accuracy. An Edge-Enhanced Depthwise Separable Convolution Module (EEDM) is added in the decoder to strengthen boundary feature representation. In the bottleneck, the PKAN Block performs flexible nonlinear transformations of latent features and enriches high-level feature representations. An Adaptive Multi-Scale Feature Extraction module (AMS Block) is embedded in the early skip connections to handle both fine-grained details and large-scale leakage regions. The model is implemented in PyTorch 2.5.1, whose automatic differentiation, GPU acceleration, and flexible framework support reproducible, high-performance training and evaluation.
The main contributions of this work are summarized as follows:
  • Efficient backbone with depthwise separable convolutions and edge enhancement: Reduces computational cost and parameter count while preserving accuracy, with EEDM enhancing boundary feature representation in the decoder.
  • Incorporation of PKAN block: Captures complex nonlinear relationships and long-range dependencies, improving representation of subtle and irregular leakage patterns.
  • AMS Block in skip connections: Captures both fine-grained local details and large-scale leakage regions, enhancing robustness under varying conditions.
  • Validation on the TWL dataset: Extensive experiments demonstrate superior segmentation with an accuracy of 86.52%, an Intersection over Union (IoU) of 82.19%, and a Dice coefficient of 85.46% while reducing computational complexity—highlighting the model’s practical potential for real-world tunnel inspection scenarios.
The remainder of this paper is organized as follows. Section 2 reviews existing methods for feature extraction architectures and tunnel leakage segmentation models, highlighting their strengths and limitations. In Section 3, we present a comprehensive description of the EMS-UKAN architecture and elaborate on the structural details of each module integrated into the model. Section 4 describes the datasets used in the experiments, the hyperparameter configurations, the training loss functions, and the evaluation metrics, providing a foundation for thorough model assessment. Section 5 reports the results, including quantitative comparisons, ablation studies, and qualitative analysis to illustrate the model’s performance on complex leakage patterns. Finally, Section 6 concludes the paper, summarizes the main contributions, and outlines potential directions for future research.

2. Related Works

2.1. Feature Extraction Architectures

CNNs [26,27] have emerged as the cornerstone of encoders, thanks to their proficiency in capturing spatial relationships within images [28,29,30,31]. Pioneering architectures such as AlexNet [32,33] and VGGNet [34,35] laid the groundwork by leveraging deep convolutional layers to progressively extract features. GoogLeNet [36] further advanced the field by introducing the Inception module, which enabled more efficient computation of multi-scale representations. ResNet [28] then addressed the vanishing gradient problem through residual connections, allowing the training of significantly deeper networks. Despite their widespread success, traditional CNNs are inherently limited in their ability to capture long-range dependencies due to their local receptive fields.
Recently, Vision Transformers (ViTs) have revolutionized the field by learning long-range pixel relationships through self-attention (SA) mechanisms, as first demonstrated by Dosovitskiy et al. [37]. Subsequent enhancements to ViTs have included the integration of CNN features [38,39], the development of novel self-attention modules [39,40], and the introduction of innovative architectural designs [41,42]. For instance, the Swin Transformer employs a shifted-window attention mechanism to improve computational efficiency, while SegFormer utilizes Mix-FFN blocks to achieve hierarchical feature extraction. Although ViTs have effectively addressed the limitations of CNNs in capturing long-range dependencies [34,36,43,44], they still face challenges in modeling local spatial relationships among pixels. To balance computational efficiency and model performance, this study incorporates KAN [45] modules to enhance nonlinear representation while capturing both local spatial and global dependencies.

2.2. Segmentation Models Applied to Tunnel Water Leakage Detection

Deep learning-based segmentation models have been increasingly applied to automatic detection of water leakage in tunnels, providing a promising alternative to manual inspection. Early approaches primarily relied on classical encoder–decoder architectures. For example, Liu [46] applied U-Net to field-collected tunnel lining images, demonstrating the feasibility of semantic segmentation for leakage detection. Wang et al. [47] optimized encoder–decoder architectures to achieve lightweight models capable of segmenting tunnel water leakage. However, these models often struggle in complex tunnel environments, such as varying lighting conditions, shadows, and textured backgrounds. To address these challenges, Wang et al. [25] further enhanced the U-Net architecture by incorporating side-guided attention, parallel attention, and channel attention mechanisms, improving the robustness of leakage segmentation under complex conditions. DAEiS-Net [48] introduced a Deep Aggregation Module, Multiscale Cross-Attention, Edge Information Supplement Module, and Sub-Pixel module to integrate multi-scale features and better capture the edges and details of water stains. Similarly, Wang et al. [49] designed Attention-Guided Feature Fusion and Auxiliary Boundary Awareness modules to provide additional guidance for segmentation masks and enhance the network’s perception of leakage boundaries. Despite these improvements, convolutional neural networks often lack the ability to capture global context. To overcome this limitation, Song et al. proposed SE-TransUNet [50], which combines global context modeling with fine local feature extraction, improving segmentation robustness in complex scenarios. Overall, the evolution of these segmentation models demonstrates continuous progress in capturing complex leakage patterns. Future research should focus on integrating high segmentation accuracy with lightweight, real-time, and deployment-ready designs to enable practical tunnel inspection systems.

3. Methods

3.1. Overall Architecture

The proposed EMS-UKAN builds upon the classical UNet architecture and introduces several key modifications to enhance feature representation, capture multi-scale leakage patterns, and improve computational efficiency. As illustrated in Figure 2, the network is organized into three principal components: the Encoder, the Bottleneck, and the Decoder, which together form a U-shaped architecture suitable for pixel-level segmentation of tunnel water leakage.
In the backbone, standard convolutional layers of the original UNet are replaced with depthwise separable convolutions (DWConv) [51]. This modification substantially reduces parameter count and computational cost while preserving segmentation accuracy, making the network more feasible for deployment on the resource-constrained edge devices commonly used in tunnel inspection. To further improve boundary sensitivity, we introduce the Edge-Enhanced Depthwise Separable Convolution module (EEDM) into the Decoder, which strengthens the representation of fine-grained structures and irregular contours without incurring significant additional complexity.
The Bottleneck serves as a crucial stage for feature abstraction and transformation. To address the limitations of conventional convolution in modeling nonlinear and long-range dependencies, we introduce a tokenized PKAN block (PKAN Block) that leverages the representational capacity of Kolmogorov–Arnold Networks (KAN) [45]. The PKAN block integrates KAN layers with additional partial convolutional operations, which not only mitigate the computational overhead of KAN but also enhance local contextual modeling. This hybrid design enables the network to effectively capture subtle and irregular water leakage patterns that are often overlooked by purely convolution-based structures.
Finally, to alleviate the loss of multi-scale contextual information in higher semantic layers, the skip connections between Encoder and Decoder are augmented with an Adaptive Multi-Scale Feature Extraction module (AMS Block). By applying atrous convolutions with varying dilation rates, the AMS Block enriches the multi-scale representation of feature maps, ensuring that both small-scale stains and large leakage regions are accurately delineated.

3.2. Efficient Backbone with Depthwise Separable Convolutions and EEDM

To improve efficiency while maintaining segmentation accuracy, EMS-UKAN adopts DWConv to replace the standard convolutional layers in the UNet backbone. This factorization decomposes a standard convolution into a depthwise convolution followed by a pointwise convolution, significantly reducing parameters and computational operations while preserving the representational capacity of the backbone.
Formally, a standard 2-D convolution applied to an input $X \in \mathbb{R}^{H \times W \times C_{\text{in}}}$ with kernel size $K \times K$ and output channels $C_{\text{out}}$ can be expressed as:
$$Y(h, w, c) = \sum_{i=1}^{K} \sum_{j=1}^{K} \sum_{k=1}^{C_{\text{in}}} X(h+i,\, w+j,\, k)\, W_{i,j,k,c}$$
Depthwise separable convolution factorizes this process into two steps:
1. Depthwise convolution (per-channel spatial filtering):
$$Z(h, w, k) = \sum_{i=1}^{K} \sum_{j=1}^{K} X(h+i,\, w+j,\, k)\, W^{d}_{i,j,k}, \qquad k = 1, \dots, C_{\text{in}}$$
This step uses $K^2 \cdot C_{\text{in}}$ parameters, as each input channel is convolved separately without mixing channels.
2. Pointwise convolution (channel mixing with 1 × 1 kernels):
$$Y(h, w, c) = \sum_{k=1}^{C_{\text{in}}} Z(h, w, k)\, W^{p}_{k,c}, \qquad c = 1, \dots, C_{\text{out}}$$
This step uses $C_{\text{in}} \cdot C_{\text{out}}$ parameters. Hence, the total number of parameters in a DWConv layer is
$$K^2 \cdot C_{\text{in}} + C_{\text{in}} \cdot C_{\text{out}},$$
which is much smaller than that of a standard convolution, $K^2 \cdot C_{\text{in}} \cdot C_{\text{out}}$, particularly when $C_{\text{out}}$ and $K$ are large.
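To make this factorization concrete, the following minimal PyTorch sketch implements a depthwise separable convolution; the class name and default kernel size are illustrative choices rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: per-channel spatial filtering
    followed by 1x1 pointwise channel mixing (Section 3.2)."""
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        # Depthwise step: groups=c_in applies one k x k filter per input
        # channel, costing K^2 * C_in parameters (bias omitted).
        self.depthwise = nn.Conv2d(c_in, c_in, kernel_size=k,
                                   padding=k // 2, groups=c_in, bias=False)
        # Pointwise step: 1x1 convolution mixes channels, C_in * C_out parameters.
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# For C_in = 64, C_out = 128, K = 3: a standard convolution needs
# 3*3*64*128 = 73,728 weights; the separable version only 3*3*64 + 64*128 = 8,768.
```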
While DWConv enhances overall efficiency, it may weaken boundary feature representation, which is crucial for accurate leakage segmentation. To address this, we design an Edge-Enhanced Depthwise Separable Convolution Module (EEDM) and integrate it into the decoder stages. As illustrated in Figure 3, the EEDM explicitly extracts edge cues and fuses them with the main features.
Each EEDM receives two inputs: the upsampled feature map from the previous decoder stage, denoted $X_{\text{up}}$, and the corresponding encoder feature map from the skip connection, denoted $X_{\text{skip}}$. These two feature maps are first concatenated along the channel dimension to form a combined feature map:
$$X_{\text{concat}} = \mathrm{Concat}(X_{\text{up}}, X_{\text{skip}}),$$
where $\mathrm{Concat}(\cdot)$ denotes channel-wise concatenation. This operation fuses high-level semantic information from the decoder with spatially detailed information from the encoder.
The concatenated feature is then processed by a depthwise separable convolution followed by batch normalization and ReLU activation:
$$X_d = \mathrm{ReLU}(\mathrm{BN}(\mathrm{DWConv}(X_{\text{concat}}))),$$
where $\mathrm{DWConv}(\cdot)$ denotes a depthwise separable convolution. This step extracts channel-wise spatial features at low computational cost, while the normalization and activation improve the stability and expressiveness of the learned features.
A second depthwise separable convolution is subsequently applied to $X_d$ to produce an intermediate feature map $X_e$:
$$X_e = \mathrm{DWConv}(X_d).$$
To explicitly enhance edge information, we apply a high-frequency emphasis operation by subtracting the average-pooled feature map from $X_e$:
$$X_{\text{edge}} = X_e - \mathrm{AP}(X_e),$$
where $\mathrm{AP}(\cdot)$ denotes average pooling over the spatial dimensions. This operation suppresses low-frequency content and highlights the high-frequency edge responses in $X_e$.
The final edge-enhanced feature map $X_{\text{out}}$ is obtained by applying a $1 \times 1$ convolution to $X_{\text{edge}}$ and adding the original $X_e$:
$$X_{\text{out}} = \mathrm{Conv}_{1 \times 1}(X_{\text{edge}}) + X_e.$$
This fusion ensures that the enhanced edge features are combined with the original semantic information, allowing the network to preserve both boundary details and the overall structural context. By explicitly integrating edge cues, EEDM strengthens the model’s ability to capture fine-grained and irregular boundary structures, which are often critical for precise water leakage delineation.
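The equations above map onto a short PyTorch sketch, given below as an illustrative reconstruction rather than the authors' released code; in particular, realizing $\mathrm{AP}(\cdot)$ as a 3 × 3 stride-1 average pool (so the subtraction is shape-compatible) and reusing the DepthwiseSeparableConv class from the previous sketch are assumptions.

```python
import torch
import torch.nn as nn

class EEDM(nn.Module):
    """Sketch of the Edge-Enhanced Depthwise Separable Convolution Module.
    The 3x3 stride-1 average pool for the high-frequency step is an assumption."""
    def __init__(self, c_up: int, c_skip: int, c_out: int):
        super().__init__()
        self.dsconv = nn.Sequential(            # X_d = ReLU(BN(DWConv(X_concat)))
            DepthwiseSeparableConv(c_up + c_skip, c_out),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )
        self.dwconv2 = DepthwiseSeparableConv(c_out, c_out)  # X_e = DWConv(X_d)
        self.avg_pool = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)
        self.fuse = nn.Conv2d(c_out, c_out, kernel_size=1)   # 1x1 conv on X_edge

    def forward(self, x_up: torch.Tensor, x_skip: torch.Tensor) -> torch.Tensor:
        x_concat = torch.cat([x_up, x_skip], dim=1)  # channel-wise concatenation
        x_d = self.dsconv(x_concat)
        x_e = self.dwconv2(x_d)
        x_edge = x_e - self.avg_pool(x_e)            # suppress low-frequency content
        return self.fuse(x_edge) + x_e               # X_out = Conv1x1(X_edge) + X_e
```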

3.3. Tokenized PKAN Block

The bottleneck layer of encoder-decoder architectures presents a critical juncture for feature transformation, where the network must balance computational efficiency with expressive capacity. To enhance this transformation process, we integrate a novel tokenized PKAN module [52] that leverages the mathematical properties of Kolmogorov–Arnold representations while mitigating their computational overhead through strategic incorporation of partial convolutions. The structure of the PKAN Block is illustrated in Figure 4.
In the PKAN Block, we first perform tokenization by reshaping the encoder output feature $X_{\text{in}} \in \mathbb{R}^{H \times W \times C}$ into a sequence of flattened 2D patches $\{X_{\text{in}}^{i} \in \mathbb{R}^{P^2 \cdot C} \mid i = 1, 2, \dots, N\}$, where each patch has size $P \times P$ and $N = \frac{HW}{P^2}$ is the number of patches. Subsequently, the flattened patch tokens are projected into a learnable $D$-dimensional latent space via a trainable linear transformation $E \in \mathbb{R}^{(P^2 \cdot C) \times D}$, as
$$Z = [X_{\text{in}}^{1} E;\, X_{\text{in}}^{2} E;\, \dots;\, X_{\text{in}}^{N} E].$$
The linear projection $E$ is realized using a convolutional layer with a kernel size of 3. The resulting token sequence is then fed into a stack of $K = 3$ KAN layers, each of which captures complex nonlinear dependencies as illustrated in Figure 4.
The overall transformation performed by a $K$-layer KAN network can be formally represented as
$$\mathrm{KAN}(Z) = (\Phi_{K-1} \circ \Phi_{K-2} \circ \dots \circ \Phi_1 \circ \Phi_0)\, Z,$$
where $Z$ denotes the input token sequence to the first KAN layer, and $\Phi_k$ corresponds to the mapping implemented by the $k$-th KAN layer as illustrated in Figure 4. Each KAN layer $\Phi$, with $n_{\text{in}}$-dimensional input and $n_{\text{out}}$-dimensional output, comprises $n_{\text{in}} \times n_{\text{out}}$ learnable activation functions $\phi$:
$$\Phi = \{\phi_{q,p}\}, \qquad p = 1, 2, \dots, n_{\text{in}}, \quad q = 1, 2, \dots, n_{\text{out}}.$$
The computation from layer $k$ to layer $k+1$ can be expressed in matrix form as $Z_{k+1} = \Phi_k Z_k$, where
$$\Phi_k = \begin{pmatrix} \phi_{k,1,1}(\cdot) & \phi_{k,1,2}(\cdot) & \cdots & \phi_{k,1,n_k}(\cdot) \\ \phi_{k,2,1}(\cdot) & \phi_{k,2,2}(\cdot) & \cdots & \phi_{k,2,n_k}(\cdot) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{k,n_{k+1},1}(\cdot) & \phi_{k,n_{k+1},2}(\cdot) & \cdots & \phi_{k,n_{k+1},n_k}(\cdot) \end{pmatrix}.$$
Compared with conventional MLPs, KANs employ learnable activation functions on connections, achieving comparable or superior performance with smaller models while providing enhanced interpretability.
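For intuition, a single KAN layer can be sketched as a grid of learnable univariate functions, one per input-output edge. The toy layer below substitutes a small Fourier basis for the B-spline parameterization of the original KAN paper, so it is a simplified stand-in rather than the exact layer used in EMS-UKAN.

```python
import math
import torch
import torch.nn as nn

class FourierKANLayer(nn.Module):
    """Simplified KAN-style layer: each edge (p, q) carries a learnable
    univariate function phi_{q,p}, parameterized here by Fourier coefficients."""
    def __init__(self, n_in: int, n_out: int, n_freq: int = 4):
        super().__init__()
        self.n_freq = n_freq
        scale = 1.0 / math.sqrt(n_in * n_freq)
        # Sine/cosine coefficients per (output q, input p, frequency m).
        self.coef = nn.Parameter(torch.randn(2, n_out, n_in, n_freq) * scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_in) -> (batch, n_out)
        k = torch.arange(1, self.n_freq + 1, device=x.device, dtype=x.dtype)
        arg = x.unsqueeze(-1) * k                       # (batch, n_in, n_freq)
        # y_q = sum_p phi_{q,p}(x_p), with phi expanded in the Fourier basis.
        return (torch.einsum('bif,oif->bo', torch.sin(arg), self.coef[0])
                + torch.einsum('bif,oif->bo', torch.cos(arg), self.coef[1]))
```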
To alleviate computational overhead, partial convolutions [53] are incorporated after each KAN layer, reducing the number of parameters and operations while preserving the expressiveness of the features. Building upon this integration of KAN layers and partial convolutions, each module further incorporates a residual connection and layer normalization to stabilize training and retain input information. Formally, the transformation of a single module can be expressed as
$$X_{\text{out}} = \mathrm{LN}(X_{\text{in}} + \mathrm{PConv}(\mathrm{KAN}(X_{\text{in}}))),$$
where $X_{\text{in}}$ and $X_{\text{out}}$ denote the input and output feature maps of the module, respectively. Here, $\mathrm{KAN}(\cdot)$ represents the learnable KAN transformation that captures complex nonlinear relationships more efficiently than conventional MLPs, $\mathrm{PConv}(\cdot)$ is the partial convolution operation, and $\mathrm{LN}(\cdot)$ denotes layer normalization. The residual connection ensures effective gradient flow and stabilizes training.
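A minimal sketch of one such module is given below. Two simplifications are assumptions on our part: the partial convolution follows the efficiency-oriented channel-subset formulation (convolving only a quarter of the channels and passing the rest through), and the patch tokenization is reduced to one token per spatial position; the FourierKANLayer from the previous sketch stands in for the actual KAN layers.

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Assumed efficiency-style partial convolution: a 3x3 convolution on the
    first dim/ratio channels only; the remaining channels pass through unchanged."""
    def __init__(self, dim: int, ratio: int = 4):
        super().__init__()
        self.dim_conv = dim // ratio
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, x.size(1) - self.dim_conv], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

class PKANBlock(nn.Module):
    """Sketch of X_out = LN(X_in + PConv(KAN(X_in))) with per-position tokens."""
    def __init__(self, dim: int):
        super().__init__()
        self.kan = FourierKANLayer(dim, dim)  # simplified stand-in for a KAN layer
        self.pconv = PartialConv(dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2).reshape(b * h * w, c)
        y = self.kan(tokens).reshape(b, h * w, c).transpose(1, 2).reshape(b, c, h, w)
        y = self.pconv(y)
        out = self.norm((x + y).flatten(2).transpose(1, 2))  # LN over channels
        return out.transpose(1, 2).reshape(b, c, h, w)
```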

3.4. Adaptive Multi Scale Feature Extraction Block

In the task of tunnel lining water leakage segmentation, accurately capturing both high-level semantic information and fine-grained structural details is critical. To enhance the representation of high-level features while providing rich multi-scale contextual information to the decoder, we introduce an Adaptive Multi-Scale Feature Extraction block (AMS Block), positioned within the skip connections. The structure of the module is illustrated in Figure 5. This block extracts features at different receptive fields via dual dilated convolution branches, followed by adaptive fusion, while preserving the input residual to facilitate gradient flow and stabilize training.
Given the input feature map $X_{\text{in}} \in \mathbb{R}^{C \times H \times W}$, the AMS block first applies two parallel dilated convolution branches to extract multi-scale features:
$$X_{MS} = \mathrm{Concat}(\mathrm{DilatedConv}_5(X_{\text{in}}),\, \mathrm{DilatedConv}_7(X_{\text{in}})).$$
Here, $\mathrm{DilatedConv}_5$ and $\mathrm{DilatedConv}_7$ are designed to capture local texture information and broader contextual information, respectively. The concatenated multi-scale features $X_{MS}$ are then fed into both global average pooling (AP) and global max pooling (MP) operations to aggregate spatial context, followed by a convolution and a Sigmoid activation to generate adaptive weights for each branch:
$$X_{AP} = \sigma(\mathrm{Conv}(\mathrm{AP}(X_{MS}))), \qquad X_{MP} = \sigma(\mathrm{Conv}(\mathrm{MP}(X_{MS}))).$$
In this formulation, the convolution $\mathrm{Conv}(\cdot)$ shares parameters across the two pooling branches, and the Sigmoid activation $\sigma(\cdot)$ normalizes the weights to the range $[0, 1]$. This allows the network to adaptively emphasize the most informative scale features while suppressing less relevant information.
Finally, the multi-scale features are adaptively fused and combined with the input residual to produce the module output:
$$X_{\text{out}} = X_{MS} \odot X_{AP} + X_{MS} \odot X_{MP} + X_{\text{in}},$$
where $\odot$ denotes element-wise multiplication. The residual connection preserves the original input information and mitigates potential gradient vanishing in deep networks, thereby improving training stability. The fused output $X_{\text{out}}$ integrates high-level semantic information with rich multi-scale details, providing the decoder with enhanced feature representations for precise localization and segmentation of leakage regions in tunnel linings, as illustrated in Figure 5.
Overall, the AMS block effectively leverages features from different receptive fields, adaptively selects the most informative scale features, and maintains gradient stability, resulting in improved performance for tunnel lining water leakage segmentation tasks.
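A sketch of the block follows. Two points are assumptions: the subscripts of DilatedConv are read as dilation rates (5 and 7), and each branch emits half the input channels so that the concatenated map $X_{MS}$ matches the input width required by the residual addition.

```python
import torch
import torch.nn as nn

class AMSBlock(nn.Module):
    """Sketch of the Adaptive Multi-Scale Feature Extraction block."""
    def __init__(self, channels: int):
        super().__init__()
        c_half = channels // 2
        # Two dilated 3x3 branches; padding = dilation keeps the spatial size.
        self.branch5 = nn.Conv2d(channels, c_half, 3, padding=5, dilation=5, bias=False)
        self.branch7 = nn.Conv2d(channels, c_half, 3, padding=7, dilation=7, bias=False)
        # One 1x1 convolution shared by the two pooled contexts.
        self.weight_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_ms = torch.cat([self.branch5(x), self.branch7(x)], dim=1)  # X_MS
        gap = torch.mean(x_ms, dim=(2, 3), keepdim=True)             # global AP
        gmp = torch.amax(x_ms, dim=(2, 3), keepdim=True)             # global MP
        w_ap = self.sigmoid(self.weight_conv(gap))                   # X_AP
        w_mp = self.sigmoid(self.weight_conv(gmp))                   # X_MP
        return x_ms * w_ap + x_ms * w_mp + x                         # X_out
```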

4. Experiments Details

This section outlines the experimental setup, including the datasets employed, the training configurations, and the evaluation metrics adopted to systematically assess the performance of the proposed method.

4.1. Datasets and Preprocessing

The Tunnel Water Leakage (TWL) Dataset was established to support research on tunnel lining leakage segmentation. All images were collected from real tunnel environments under diverse and complex water leakage scenarios. The dataset includes a total of 1555 images, each with a resolution of 1315 × 986 or 986 × 1315. Each image was carefully annotated at the pixel level, including detailed contours of leakage areas as well as water stains of other shapes and sizes, so that the resulting ground truth is accurate and verifiable. Example images from the dataset are shown in Figure 6.
The dataset was split into training, validation, and testing subsets at a 6:2:2 ratio to facilitate model training and evaluation. The training set was further augmented with a variety of random data augmentation techniques, including random cropping, random rotation, and Gaussian noise injection, to promote generalization across scenarios. During training, all images were uniformly resized to 512 × 512 to keep the model input consistent.
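A sketch of such a paired image/mask augmentation pipeline is shown below; the crop ratio, rotation range, and noise level are illustrative placeholders, since the paper does not report the exact augmentation parameters.

```python
import random
import torch
import torchvision.transforms.functional as TF

def augment(image: torch.Tensor, mask: torch.Tensor):
    """Random crop, rotation, and Gaussian noise, then resize to 512x512.
    Geometric transforms are applied identically to image and mask."""
    _, h, w = image.shape
    ch, cw = int(h * 0.8), int(w * 0.8)          # crop ratio is an assumption
    top, left = random.randint(0, h - ch), random.randint(0, w - cw)
    image = TF.crop(image, top, left, ch, cw)
    mask = TF.crop(mask, top, left, ch, cw)
    angle = random.uniform(-15.0, 15.0)          # rotation range is an assumption
    image = TF.rotate(image, angle, interpolation=TF.InterpolationMode.BILINEAR)
    mask = TF.rotate(mask, angle)                # nearest-neighbor keeps labels binary
    image = (image + 0.02 * torch.randn_like(image)).clamp(0.0, 1.0)  # Gaussian noise
    image = TF.resize(image, [512, 512])
    mask = TF.resize(mask, [512, 512], interpolation=TF.InterpolationMode.NEAREST)
    return image, mask
```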

4.2. Experimental Details

All models in this study were implemented and trained using the PyTorch framework. The experiments were conducted on a Linux operating system with an NVIDIA GeForce RTX 4090 GPU to accelerate the training process. The Adam optimizer was employed to update the model parameters, with the initial learning rate set to 0.001. To achieve stable convergence and improve training efficiency, a cosine annealing learning rate scheduler [54] was employed to dynamically adjust the learning rate. Specifically, the learning rate at the t-th epoch, η t , can be calculated as:
$$\eta_t = \eta_{\min} + \frac{1}{2}(\eta_{\max} - \eta_{\min})\left(1 + \cos\!\left(\frac{T_{\text{cur}}}{T_{\max}}\,\pi\right)\right),$$
where $\eta_{\max}$ and $\eta_{\min}$ are the maximum and minimum learning rates, $T_{\text{cur}}$ denotes the current epoch, and $T_{\max}$ is the total number of epochs. In this study, $\eta_{\max} = 0.001$, $\eta_{\min} = 1 \times 10^{-5}$, and $T_{\max} = 50$.
During the experiments, the maximum number of training epochs was set to 150, and the batch size was fixed at 8. To prevent overfitting and accelerate convergence, an early stopping strategy was applied, in which the training was terminated if the validation performance did not improve for 10 consecutive epochs. All hyperparameters, including learning rate, batch size, maximum epochs, and early stopping patience, were determined empirically through preliminary experiments to achieve an optimal trade-off between training efficiency and model performance.
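The optimizer, scheduler, and early stopping policy combine into a short training skeleton; EMSUKAN, train_one_epoch, and validate are hypothetical placeholders standing in for the network and the usual loops, while the Adam and CosineAnnealingLR calls use the standard torch.optim API with the values reported above.

```python
import torch

model = EMSUKAN()  # hypothetical model class standing in for the network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-5)  # eta_max is the initial learning rate

best_score, patience, bad_epochs = 0.0, 10, 0
for epoch in range(150):                   # maximum number of epochs
    train_one_epoch(model, optimizer)      # hypothetical helper
    scheduler.step()
    score = validate(model)                # hypothetical helper (e.g., validation IoU)
    if score > best_score:
        best_score, bad_epochs = score, 0
        torch.save(model.state_dict(), "best.pth")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:         # early stopping after 10 stale epochs
            break
```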

4.3. Training Loss Function

During training, the model was supervised with a combination of Binary Cross-Entropy (BCE) loss and Dice loss [55], which balances pixel-wise accuracy against the overall shape similarity of the predicted masks. The combined loss function $\mathcal{L}_{\text{total}}$ is defined as:
$$\mathcal{L}_{\text{total}} = 0.5 \times \mathcal{L}_{BCE} + \mathcal{L}_{Dice},$$
where $\mathcal{L}_{BCE}$ is the binary cross-entropy loss, calculated as:
$$\mathcal{L}_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right],$$
where $N$ is the total number of pixels, $y_i$ denotes the ground-truth label of pixel $i$ (0 or 1), and $p_i$ is the predicted probability for that pixel.
The Dice loss $\mathcal{L}_{Dice}$ is defined as:
$$\mathcal{L}_{Dice} = 1 - \frac{2 \sum_{i=1}^{N} (p_i \cdot y_i) + \varepsilon}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} y_i + \varepsilon},$$
where $\varepsilon$ is a small constant that prevents division by zero. Dice loss measures the overlap between the predicted mask and the ground truth, emphasizing the accuracy of predicted regions, which is particularly useful for handling class imbalance in segmentation tasks.
By combining BCE loss and Dice loss, the model benefits from both pixel-level accuracy and overall mask consistency, leading to more accurate and robust segmentation results, especially for irregular and sparse water leakage regions in tunnel images.
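The combined objective translates directly into a few lines of PyTorch; this sketch assumes the network emits raw logits and computes the Dice term globally over the batch.

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits: torch.Tensor, target: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """L_total = 0.5 * L_BCE + L_Dice (Section 4.3).
    logits: raw outputs (N, 1, H, W); target: binary float masks."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum()
    dice = 1.0 - (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps)
    return 0.5 * bce + dice
```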

4.4. Evaluation Metrics

The segmentation of water leakage in subway tunnel linings is fundamentally a pixel-level segmentation task, which requires accurate classification of each pixel as leakage or non-leakage. To quantitatively evaluate the performance of the proposed model, three commonly used metrics were adopted: Intersection over Union (IoU), Dice coefficient (Dice), and pixel accuracy (Acc) [56,57,58,59]. IoU measures the overlap between the predicted water stain regions and the ground-truth labels, serving as a critical metric for pixel-level tasks. The Dice coefficient quantifies the overall similarity between the predicted leakage mask and the ground truth. Acc refers to the model's pixel classification accuracy across all categories. The formulas for these metrics are as follows:
$$IoU = \frac{TP}{TP + FP + FN}$$
$$Dice = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$$
$$Acc = \frac{TP + TN}{TP + FN + FP + TN}$$
In the formulas above, T P (true positive) denotes pixels correctly predicted as leakage, T N (true negative) denotes pixels correctly predicted as non-leakage, F P (false positive) denotes pixels incorrectly predicted as leakage, and F N (false negative) denotes pixels incorrectly predicted as non-leakage.
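These definitions can be computed directly from binary masks, as in the following sketch; the small constant guarding empty masks is an implementation detail, not from the paper.

```python
import torch

def segmentation_metrics(pred: torch.Tensor, target: torch.Tensor):
    """IoU, Dice, and pixel accuracy from binary prediction/ground-truth masks."""
    pred, target = pred.bool(), target.bool()
    tp = (pred & target).sum().item()       # leakage pixels predicted as leakage
    tn = (~pred & ~target).sum().item()     # background predicted as background
    fp = (pred & ~target).sum().item()      # background predicted as leakage
    fn = (~pred & target).sum().item()      # leakage predicted as background
    iou = tp / (tp + fp + fn + 1e-9)
    dice = 2 * tp / (2 * tp + fp + fn + 1e-9)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return iou, dice, acc
```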

5. Results and Analysis

To evaluate the effectiveness of the proposed method, we conducted experiments comparing it with several state-of-the-art semantic segmentation networks, including U-Net [43], UNet++ [60], CCNet [61], Deeplabv3+ [62], TransUNet [63], and SwinUNet [64]. All models were trained from scratch using open-source implementations or the authors’ code. No pre-trained weights were used for any network, including ours, and all experiments followed the same data processing and augmentation procedures. In addition to these baseline comparisons, we performed ablation studies to investigate the contribution of each component in the proposed architecture, providing a more detailed analysis of its effectiveness.

5.1. Quantitative Comparison

Following the evaluation metrics introduced in Section 4.4, we report Acc, IoU, and Dice to assess segmentation performance. In addition, GFLOPs and the number of parameters are provided to evaluate computational cost and model complexity. As shown in Table 1, UNet++ achieves the highest accuracy (86.03%) among the conventional networks and a competitive Dice score (83.79%). CCNet and Deeplabv3+ demand substantially more computation (216.78 and 164.1 GFLOPs, respectively) but exhibit lower IoU and Dice values. TransUNet, despite its large parameter size (67.87 M), does not surpass EMS-UKAN in any metric. SwinUNet shows limited segmentation performance, particularly in IoU (64.31%) and Dice (72.51%).
Our EMS-UKAN achieves the best overall performance in the water leakage detection task, with 86.52% accuracy, 82.19% IoU, and 85.46% Dice, while maintaining a moderate computational cost (34.9 GFLOPs) and a relatively compact model size (19.37 M parameters). Compared with conventional convolution-based networks, EMS-UKAN more accurately identifies leakage regions and preserves fine boundary details, which are essential for detecting small or irregular water seepage areas. Although UNet++ and Deeplabv3+ reach similar accuracy or IoU, they require significantly higher computation or larger model sizes, reducing their efficiency. Transformer-based models, such as TransUNet and SwinUNet, show lower segmentation performance in this task and are less efficient overall. These results indicate that EMS-UKAN effectively balances segmentation accuracy, boundary representation, and computational efficiency, demonstrating its suitability for practical tunnel inspection applications where precise detection of water leakage is critical.

5.2. Qualitative Comparison

As illustrated in Figure 7, we conducted a qualitative comparison between EMS-UKAN and several representative segmentation models, including U-Net, UNet++, CCNet, Deeplabv3+, TransUNet, and SwinUNet, across diverse tunnel environments. Tunnel leakage images are often affected by occlusions, illumination variations, surface reflections, and complex background textures, making qualitative analysis essential for understanding practical performance.
The comparative results in Figure 7 reveal distinct differences in model robustness across a variety of challenging tunnel scenarios. In the first row, where relatively thick pipelines occlude the stained regions, most models can correctly delineate the pipeline structure; however, shadowed areas cast by the pipeline often cause misclassification. In particular, transformer-based models tend to mistake portions of actual leakage for shadows due to their heavy reliance on global contextual cues. In contrast, our model effectively suppresses such ambiguity and accurately distinguishes between shadows and true leakage regions.
The second row presents clear, well-defined water stains under uneven illumination. Although all competing models can detect the primary leakage patterns, they consistently misidentify shadowed areas produced by protrusions or lighting variations as leakage. This systematic failure highlights their limited capability in reasoning about local texture continuity and intensity gradients. Our method, benefiting from enhanced multi-scale representations, avoids this confusion and yields more precise boundaries even in strongly shadowed regions.
In the third row, which depicts large-area leakage with concave cavities, all other models mistakenly segment the dark cavity areas as water stains, revealing an overdependence on low-level intensity cues. Transformer-based models exhibit slightly improved behavior but still misinterpret part of the cavity region due to insufficient fine-grained structural modeling. Our model, by contrast, maintains clear separation between actual leakage and cavity shadows, demonstrating stronger spatial discrimination and resilience to structural irregularities.
The fourth row further examines large leakage areas partially occluded by thin cables. While other models frequently classify corroded cable sections as leakage or, conversely, suppress true leakage regions near the cable, transformer-based models—although able to identify the cable itself—often cause large portions of adjacent leakage areas to remain unsegmented. Our method avoids these pitfalls, capturing both the cable and the surrounding leakage regions with high fidelity.
Overall, across diverse tunnel conditions involving occlusions, shadows, illumination variations, and structural irregularities, our model consistently demonstrates superior robustness and precision. The results indicate that EMS-UKAN not only mitigates the common failure modes exhibited by convolution-based and transformer-based baselines, but also establishes a more reliable segmentation performance that generalizes effectively to complex real-world tunnel environments.

5.3. Ablation Study

To investigate the contributions of each component in EMS-UKAN for water leakage detection, we conducted a series of ablation experiments based on the U-Net backbone with depthwise separable convolutions as the baseline. Starting from this baseline, we incrementally incorporated the EEDM, the PKAN block, and the AMS block to evaluate their individual and combined effects on segmentation performance. In these experiments, one variant added only the EEDM to the baseline, another combined the EEDM with the PKAN block, and the final configuration included all three modules, representing the full EMS-UKAN architecture. Apart from the structural modifications, all experiments used identical hyperparameters, including learning rate, batch size, data augmentation strategies, and training epochs, ensuring a fair comparison across models. The quantitative results are summarized in Table 2.
The results show that each module contributes to improved performance in water leakage detection. Adding the EEDM slightly improves the IoU and Dice scores by enhancing boundary representation, which is essential for detecting small and irregular leakage regions in tunnel linings. Incorporating the PKAN block further improves performance, particularly IoU (from 79.30% to 81.70%), reflecting its ability to capture complex nonlinear relationships within the feature space. Finally, the full EMS-UKAN model, including the AMS block, achieves the highest accuracy and segmentation metrics, demonstrating the combined effect of boundary enhancement, nonlinear feature modeling, and multi-scale aggregation. To provide a more detailed assessment of the impact of each module, a set of representative images from the dataset was selected for qualitative visualization of the segmentation results.
As shown in Figure 8, the qualitative results from the ablation study indicate that the baseline model with depthwise separable convolutions exhibits large areas of misclassification under different environmental conditions. Introducing the EEDM module significantly reduces these large-scale errors, producing segmentation boundaries that more closely align with the ground truth, particularly along the edges of leakage regions. Adding the PKAN block further improves segmentation, enabling the network to capture more complex relationships and better distinguish subtle or irregular leakage patterns. Moreover, the PKAN module enhances the model’s ability to correctly identify small leakage regions that were previously overlooked. Finally, the inclusion of the AMS block allows for more comprehensive multi-scale feature aggregation, resulting in more complete and precise segmentation of both large and small leakage areas. These observations clearly demonstrate that the progressive addition of modules improves boundary delineation, small region detection, and robustness to varying environmental conditions.

6. Conclusions

In this work, we proposed EMS-UKAN, an efficient multi-scale U-shaped KAN-based segmentation network for automatic detection of water leakage in tunnel linings. By replacing standard convolutions in the UNet backbone with depthwise separable convolutions and incorporating EEDM, the model significantly reduces parameter size and computational cost while effectively capturing complex boundary features. In the bottleneck, we introduced the PKAN block, which combines the global modeling capacity of KAN with the local adaptability of partial convolutions, enabling the extraction of subtle and irregular leakage patterns often overlooked by conventional convolutional structures. Importantly, this study represents one of the first attempts to integrate KAN into tunnel defect detection, demonstrating its strong potential for complex spatial representation. Furthermore, the AMS block embedded in skip connections enhances the ability to capture leakage features across diverse scales. Extensive experiments demonstrate superior segmentation with an Acc of 86.78%, an IoU of 83.81%, and a Dice of 87.31% while reducing computational complexity—highlighting the model’s practical potential for real-world tunnel inspection scenarios. In addition, detailed qualitative analysis illustrates the advantages of EMS-UKAN in complex tunnel environments, and ablation studies further validate the effectiveness of each proposed module.
Future research will focus on three directions to further advance the framework. First, we will explore the synergy between KAN and lightweight network architectures to develop more efficient models for edge deployment. Second, the EMS-UKAN framework will be extended to broader tunnel defect detection tasks (e.g., cracks, spalling) to realize multi-defect integrated inspection. Third, we will investigate self-supervised learning and domain adaptation techniques to reduce reliance on large-scale annotated datasets, lowering the cost of practical application. Additionally, optimization strategies for real-time inference on edge devices will be studied to promote the practical implementation of intelligent tunnel safety monitoring systems. Overall, EMS-UKAN provides an efficient, interpretable, and practical solution for automatic water leakage detection in tunnel linings. With the continuous development of KAN and its cross-domain applications, this work lays a foundation for the integration of advanced mathematical modeling techniques into infrastructure health monitoring.

Author Contributions

Conceptualization, M.H.; methodology, L.T. and X.W.; software, L.T. and X.Y.; validation, X.Y., F.L. and X.W.; investigation, Z.Z.; resources, M.H.; data curation, X.Y.; writing—original draft preparation, L.T., X.Y. and Z.Z.; writing—review and editing, F.L.; visualization, L.T.; supervision, X.Y.; project administration, M.H.; funding acquisition, M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Excellent Talent Training Program of Xicheng District, Beijing, China.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available from the corresponding author upon request.

Conflicts of Interest

Authors Meide He, Fei Liu and Zhimin Zhao were employed by the company Beijing Third Construction Engineering Quality Inspection Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Zhang, F.; Wang, Z.; Zhang, X.; Wang, X.; Hu, X. Research on Inertial Force Suppression Control for Hydraulic Cylinder Synchronization of Shield Tunnel Segment Erector Based on Sliding Mode Control. Actuators 2025, 14, 449.
2. Tan, L.; Zhang, X.; Zeng, X.; Hu, X.; Chen, F.; Liu, Z.; Liu, J.; Tang, T. LCJ-Seg: Tunnel Lining Construction Joint Segmentation via Global Perception. In Proceedings of the 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), Edmonton, AB, Canada, 24–27 September 2024; IEEE: New York, NY, USA, 2024; pp. 993–998.
3. Jiang, J.; Shen, Y.; Wang, J.; Wang, J.; Huang, J.; Fu, S.; Guo, K.; Ferreira, V. Advances and challenges in water leakage detection techniques for shield tunnels: A comprehensive review. Measurement 2025, 257, 118763.
4. Tan, L.; Chen, X.; Hu, X.; Tang, T. Dmdsnet: A computer vision-based dual multi-task model for tunnel bolt detection and corrosion segmentation. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; IEEE: New York, NY, USA, 2023; pp. 4827–4833.
5. Gao, X.; Li, P.; Zhang, M.; Ge, Z.; Chen, C. Experimental investigation of ground collapse induced by Soil-Water leakage in local failed tunnels. Tunn. Undergr. Space Technol. 2025, 157, 105950.
6. Tan, L.; Hu, X.; Tang, T.; Yuan, D. A lightweight metro tunnel water leakage identification algorithm via machine vision. Eng. Fail. Anal. 2023, 150, 107327.
7. Liu, Z.; Gao, X.; Yang, Y.; Xu, L.; Wang, S.; Chen, N.; Wang, Z.; Kou, Y. EDT-Net: A Lightweight Tunnel Water Leakage Detection Network Based on LiDAR Point Clouds Intensity Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 7334–7346.
8. Zhang, C.; Chen, X.; Liu, P.; He, B.; Li, W.; Song, T. Automated detection and segmentation of tunnel defects and objects using YOLOv8-CM. Tunn. Undergr. Space Technol. 2024, 150, 105857.
9. Zhang, C.; Wang, R.; Yu, L.; Xiao, Y.; Guo, Q.; Ji, H. Localization of cyclostationary acoustic sources via cyclostationary beamforming and its high spatial resolution implementation. Mech. Syst. Signal Process. 2023, 204, 110718.
10. Ren, Q.; Wang, Y.; Xu, J.; Hou, F.; Cui, G.; Ding, G. REN-GAN: Generative adversarial network-driven rebar clutter elimination network in GPR image for tunnel defect identification. Expert Syst. Appl. 2024, 255, 124395.
11. Lee, C.H.; Chiu, Y.C.; Wang, T.T.; Huang, T.H. Application and validation of simple image-mosaic technology for interpreting cracks on tunnel lining. Tunn. Undergr. Space Technol. 2013, 34, 61–72.
12. Wang, K.; Yao, X. Rapid detecting equipment for structural defects of metro tunnel. In Life Cycle Analysis and Assessment in Civil Engineering: Towards an Integrated Vision; CRC Press: Boca Raton, FL, USA, 2018; pp. 1561–1568.
13. Zhang, Y.; Adin, V.; Bader, S.; Oelmann, B. Leveraging acoustic emission and machine learning for concrete materials damage classification on embedded devices. IEEE Trans. Instrum. Meas. 2023, 72, 1–8.
14. Huang, H.; Sun, Y.; Xue, Y.; Wang, F. Inspection equipment study for subway tunnel defects by grey-scale image processing. Adv. Eng. Inform. 2017, 32, 188–201.
15. Zhu, X.; Zheng, Y.; Qi, L.; Wang, N.; Ni, S. Research on Recognition Algorithm of Tunnel Leakage Based on Image Processing; SAE Technical Paper; SAE International: Warrendale, PA, USA, 2020.
16. Attard, L.; Debono, C.J.; Valentino, G.; Di Castro, M. Tunnel inspection using photogrammetric techniques and image processing: A review. ISPRS J. Photogramm. Remote Sens. 2018, 144, 180–188.
17. Huang, C.; Sun, X.; Zhang, Y. Tiny-machine-learning-based supply canal surface condition monitoring. Sensors 2024, 24, 4124.
18. Gui, S.; Song, S.; Qin, R.; Tang, Y. Remote sensing object detection in the deep learning era—A review. Remote Sens. 2024, 16, 327.
19. Man, K.; Liu, R.; Liu, X.; Song, Z.; Liu, Z.; Cao, Z.; Wu, L. Water leakage and crack identification in tunnels based on transfer-learning and convolutional neural networks. Water 2022, 14, 1462.
20. Xu, Y.; Gong, J.; Li, Y.; Zhang, W.; Zhang, G. Optimization of Shield Tunnel Lining Defect Detection Model Based on Deep Learning. J. Hunan Univ. Nat. Sci. 2020, 47, 137–146.
21. Li, D.; Xie, Q.; Gong, X.; Yu, Z.; Xu, J.; Sun, Y.; Wang, J. Automatic defect detection of metro tunnel surfaces using a vision-based inspection system. Adv. Eng. Inform. 2021, 47, 101206.
22. Wang, S.; Mo, J.; Xu, L.; Zheng, X. Lightweight Tunnel Leakage Water Detection Algorithm Based on YOLOv8n. In Proceedings of the 2024 IEEE 25th China Conference on System Simulation Technology and Its Application (CCSSTA), Tianjin, China, 21–23 July 2024; IEEE: New York, NY, USA, 2024; pp. 259–263.
23. Xue, Y.; Cai, X.; Shadabfar, M.; Shao, H.; Zhang, S. Deep learning-based automatic recognition of water leakage area in shield tunnel lining. Tunn. Undergr. Space Technol. 2020, 104, 103524.
24. Wang, B.; He, N.; Xu, F.; Du, Y.; Xu, H. Visual detection method of tunnel water leakage diseases based on feature enhancement learning. Tunn. Undergr. Space Technol. 2024, 153, 106009.
25. Wang, P.; Shi, G. Image segmentation of tunnel water leakage defects in complex environments using an improved Unet model. Sci. Rep. 2024, 14, 24286.
26. Li, H.; Luo, X.; Haruna, S.A.; Zareef, M.; Chen, Q.; Ding, Z.; Yan, Y. Au-Ag OHCs-based SERS sensor coupled with deep learning CNN algorithm to quantify thiram and pymetrozine in tea. Food Chem. 2023, 428, 136798.
27. Cao, J.; Liu, Z.; Hu, X.; Miao, Y.; Li, J.; Ma, D. Adaptive Measurement for High-Speed Electromagnetic Tomography via Deep Reinforcement Learning. IEEE Trans. Instrum. Meas. 2025, 74, 1–14.
28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
29. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
30. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
31. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.W.; Wu, J. Unet 3+: A full-scale connected unet for medical image segmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: New York, NY, USA, 2020; pp. 1055–1059.
32. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25. Available online: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf (accessed on 2 December 2025).
33. Liu, J.; Abbas, I.; Noor, R.S. Development of Deep Learning-Based Variable Rate Agrochemical Spraying System for Targeted Weeds Control in Strawberry Crop. Agronomy 2021, 11, 1480.
34. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
35. Hu, X.; Cao, Y.; Sun, Y.; Tang, T. Railway Automatic Switch Stationary Contacts Wear Detection Under Few-Shot Occasions. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14893–14907.
36. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
37. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
38. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxvit: Multi-axis vision transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 459–479.
39. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pvt v2: Improved baselines with pyramid vision transformer. Comput. Vis. Media 2022, 8, 415–424.
40. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022.
41. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
42. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 568–578.
43. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18; Springer: Cham, Switzerland, 2015; pp. 234–241.
44. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
45. Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. Kan: Kolmogorov-arnold networks. arXiv 2024, arXiv:2404.19756.
46. Liu, Y. Intelligent Identification of Tunnel Lining Water Leakage Based on Deep Learning. Hans J. Civ. Eng. 2023, 12, 1123–1128.
47. Wang, W.; Su, C.; Han, G.; Dong, Y. Efficient segmentation of water leakage in shield tunnel lining with convolutional neural network. Struct. Health Monit. 2024, 23, 671–685.
48. Wang, Y.; Huang, K.; Zheng, K.; Liu, S. DAEiS-Net: Deep Aggregation Network with Edge Information Supplement for Tunnel Water Stain Segmentation. Sensors 2024, 24, 5452.
49. Wang, Y.; Huang, K.; Sun, L.; Gao, J.; Guo, Z.; Chen, X. WLAN: Water Leakage-Aware Network for water leakage identification in metro tunnels. Neural Comput. Appl. 2025, 37, 22179–22189.
50. Song, R.; Wu, Y.; Wan, L.; Shao, S.; Wu, H. SE-TransUNet-Based Semantic Segmentation for Water Leakage Detection in Tunnel Secondary Linings Amid Complex Visual Backgrounds. Appl. Sci. 2025, 15, 7872.
51. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
52. Deng, L.; Wang, W.; Chen, S.; Yang, X.; Huang, S.; Wang, J. PDS-UKAN: Subdivision hopping connected to the U-KAN network for medical image segmentation. Comput. Med. Imaging Graph. 2025, 102568.
53. Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.C.; Tao, A.; Catanzaro, B. Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 85–100.
54. Loshchilov, I.; Hutter, F. Sgdr: Stochastic gradient descent with warm restarts. arXiv 2016, arXiv:1608.03983.
55. Milletari, F.; Navab, N.; Ahmadi, S.A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; IEEE: New York, NY, USA, 2016; pp. 565–571.
56. Yang, J.; Wang, Z.; Guo, Y.; Gong, T.; Shan, Z. A novel noise-aided fault feature extraction using stochastic resonance in a nonlinear system and its application. IEEE Sens. J. 2024, 24, 11856–11866.
57. He, C.; Huo, X.; Zhu, C.; Chen, S. Minimum redundancy maximum relevancy-based multiview generation for time series sensor data classification and its application. IEEE Sens. J. 2024, 24, 12830–12839.
58. Gao, J.; Zhou, S.; Yu, H.; Li, C.; Hu, X. SCESS-Net: Semantic consistency enhancement and segment selection network for audio-visual event localization. Comput. Vis. Image Underst. 2025, 262, 104551.
59. Hu, X.; Zhang, X.; Chen, F.; Liu, Z.; Liu, J.; Tan, L.; Tang, T. Simultaneous Fault Diagnosis for Sensor and Railway Point Machine for Autonomous Rail System. In Proceedings of the 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), Edmonton, AB, Canada, 24–27 September 2024; pp. 1011–1016.
60. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Proceedings of the International Workshop on Deep Learning in Medical Image Analysis, Granada, Spain, 20 September 2018; Springer: Cham, Switzerland, 2018; pp. 3–11.
61. Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. Ccnet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 603–612.
62. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
  63. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar] [CrossRef]
  64. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 205–218. [Google Scholar]
Figure 1. Examples of tunnel lining images under varying lighting conditions, with annotated segmentation challenges: multi-scale water leakage areas (blue lines), occlusions (red boxes), and bright-light regions (yellow box).
Figure 2. Our proposed efficient multi-scale segmentation network for water leakage detection of subway tunnel linings (EMS-UKAN), which combines a hierarchical encoder with our proposed decoder architecture.
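For readers unfamiliar with the operation, below is a minimal PyTorch sketch of a depthwise separable convolution, the substitution EMS-UKAN makes throughout its backbone. It illustrates the general technique only, not the exact layer configuration of the network:

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) spatial
    filter followed by a 1x1 (pointwise) channel mixer. For a k x k kernel
    this needs roughly k*k*C + C*C' weights instead of k*k*C*C'."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 128, 128)        # e.g., an encoder feature map
y = DepthwiseSeparableConv(64, 128)(x)  # -> shape (1, 128, 128, 128)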
Figure 3. The structure of EEDM.
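The precise layout of EEDM is given in Figure 3. Purely as an illustration of how an explicit edge prior can be attached to a depthwise separable path, the sketch below implements a fixed Sobel edge branch; this is an assumption for exposition, not the authors' module:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SobelEdgeBranch(nn.Module):
    """Illustrative edge branch: fixed Sobel filters applied per channel,
    yielding gradient-magnitude maps that could be fused with a depthwise
    separable path to strengthen boundary responses."""
    def __init__(self, channels):
        super().__init__()
        gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("kx", gx.expand(channels, 1, 3, 3).clone())
        self.register_buffer("ky", gx.t().expand(channels, 1, 3, 3).clone())
        self.channels = channels

    def forward(self, x):
        ex = F.conv2d(x, self.kx, padding=1, groups=self.channels)
        ey = F.conv2d(x, self.ky, padding=1, groups=self.channels)
        return torch.sqrt(ex ** 2 + ey ** 2 + 1e-6)  # gradient magnitude

edges = SobelEdgeBranch(64)(torch.randn(1, 64, 128, 128))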
Figure 4. The architecture of PKAN Block.
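The PKAN Block builds on KAN layers, in which each input-output edge carries a learnable one-dimensional function rather than a single scalar weight. Below is a minimal sketch of such a layer that substitutes Gaussian RBF bases for the B-spline bases of the original KAN, a common simplification; it is illustrative only, not the authors' PKAN implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RBFKANLayer(nn.Module):
    """KAN-style layer: each edge applies a learnable 1-D function,
    parameterized here as a linear combination of Gaussian RBF bases
    (a stand-in for the B-spline bases of the original KAN)."""
    def __init__(self, in_dim, out_dim, num_bases=8, x_min=-2.0, x_max=2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(x_min, x_max, num_bases))
        self.gamma = (num_bases - 1) / (x_max - x_min)  # width ~ grid spacing
        # coefficients of the per-edge functions: (out_dim, in_dim, num_bases)
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, num_bases) * 0.1)
        self.base = nn.Linear(in_dim, out_dim)  # residual SiLU path, as in KAN variants

    def forward(self, x):  # x: (batch, in_dim)
        # evaluate all bases at all inputs: (batch, in_dim, num_bases)
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) * self.gamma) ** 2)
        spline = torch.einsum("bik,oik->bo", phi, self.coef)
        return self.base(F.silu(x)) + spline

y = RBFKANLayer(64, 32)(torch.randn(8, 64))  # -> shape (8, 32)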
Figure 5. The structure of AMS Block.
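Multi-scale context of the kind the AMS Block targets is commonly obtained with parallel dilated convolutions fused by a 1 × 1 projection. The sketch below illustrates that generic pattern under assumed dilation rates (1, 2, 4); the actual AMS Block, including its adaptive weighting, follows Figure 5:

import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Generic multi-scale feature extractor: parallel 3x3 convolutions with
    increasing dilation cover increasingly large leakage regions; a 1x1
    convolution fuses the concatenated responses back to out_ch channels."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in dilations
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))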
Figure 6. Different types of tunnel water stain samples in the TWL dataset.
Figure 7. Qualitative analysis of the results. The first column shows the original images, the second column shows their corresponding ground-truth masks, and columns (3)–(9) show the segmentation results of our proposed model and the other classical models. The red box highlights the regions where our model demonstrates a clear advantage over the other methods.
Figure 8. Qualitative results of the ablation study on EMS-UKAN for water leakage detection.
Table 1. Comparison of our EMS-UKAN with the classic methods.
Model        GFLOPs    Param (M)   Acc (%)   IoU (%)   Dice (%)
U-Net        124.37    13.4        85.96     78.60     83.27
UNet++       138.86    9.16        86.03     79.18     83.79
CCNet        216.78    49.48       82.79     68.00     68.28
Deeplabv3+   164.13    9.63        83.68     71.39     74.18
TransUNet    130.16    7.87        85.57     74.13     79.92
SwinUNet     30.88     27.18       84.16     64.31     72.51
EMS-UKAN     34.91     9.37        86.52     82.19     85.46
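For reference, the IoU and Dice columns follow the standard overlap definitions, IoU = |P ∩ G| / |P ∪ G| and Dice = 2|P ∩ G| / (|P| + |G|), computed between the predicted mask P and the ground-truth mask G. A small sketch of how they can be computed from binary masks:

import numpy as np

def iou_and_dice(pred, gt, eps=1e-7):
    """IoU and Dice for binary masks (numpy arrays of 0/1)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / (union + eps)
    dice = 2 * inter / (pred.sum() + gt.sum() + eps)
    return iou, dice

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(iou_and_dice(pred, gt))  # inter=2, union=4 -> IoU 0.5, Dice 0.667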
Table 2. Ablation study on the proposed EMS-UKAN.
Model                        Acc (%)   IoU (%)   Dice (%)
UNet w/ DWConv (Baseline)    86.26     78.92     83.62
Baseline + EEDM              86.29     79.30     84.24
Baseline + EEDM + PKAN       86.31     81.70     84.95
EMS-UKAN                     86.52     82.19     85.46