Systematic Review

Advances in Deep Learning for Semantic Segmentation of Low-Contrast Images: A Systematic Review of Methods, Challenges, and Future Directions

by Claudio Urrea * and Maximiliano Vélez
Electrical Engineering Department, Faculty of Engineering, University of Santiago of Chile, Las Sophoras 165, Estación Central, Santiago 9170124, Chile
* Author to whom correspondence should be addressed.
Sensors 2025, 25(7), 2043; https://doi.org/10.3390/s25072043
Submission received: 11 February 2025 / Revised: 18 March 2025 / Accepted: 23 March 2025 / Published: 25 March 2025
(This article belongs to the Section Sensing and Imaging)

Abstract
The semantic segmentation (SS) of low-contrast images (LCIs) remains a significant challenge in computer vision, particularly for sensor-driven applications like medical imaging, autonomous navigation, and industrial defect detection, where accurate object delineation is critical. This systematic review provides a comprehensive evaluation of state-of-the-art deep learning (DL) techniques for improving segmentation accuracy in LCI scenarios, addressing the primary challenges, such as diffuse boundaries and regions with similar pixel intensities, that limit conventional methods. Key advancements include attention mechanisms, multi-scale feature extraction, and hybrid architectures combining Convolutional Neural Networks (CNNs) with Vision Transformers (ViTs), which expand the Effective Receptive Field (ERF), improve feature representation, and optimize information flow. We compare the performance of 25 models, evaluating accuracy (e.g., mean Intersection over Union (mIoU), Dice Similarity Coefficient (DSC)), computational efficiency, and robustness across benchmark datasets relevant to automation and robotics. This review identifies limitations, including the scarcity of diverse, annotated LCI datasets and the high computational demands of transformer-based models. Future opportunities emphasize lightweight architectures, advanced data augmentation, integration with multimodal sensor data (e.g., LiDAR, thermal imaging), and ethically transparent AI to build trust in automation systems. This work contributes practical guidance for enhancing LCI segmentation, improving mean accuracy metrics such as mIoU by up to 15% in sensor-based applications, as evidenced by benchmark comparisons, and serves as a concise, comprehensive reference for researchers and practitioners advancing DL-based LCI segmentation in real-world sensor applications.

1. Introduction

Semantic segmentation (SS) is a computer vision technique that classifies each pixel in an image into a defined class, enabling the recognition and labeling of different regions in a scene [1]. This technique is pivotal for intelligent systems in automation and robotics, assisting in environmental understanding and supporting professionals in decision-making across applications such as autonomous navigation, medical diagnostics, and industrial quality control [2,3,4,5].
Image contrast, defined as the difference in pixel intensity, significantly influences SS performance [6]. Traditional deep learning (DL) approaches, often based on encoder–decoder architectures [7], excel when images exhibit a balanced intensity distribution, as regions are more distinguishable [8,9]. However, low-contrast images (LCIs) pose a challenge due to their diffuse boundaries and uniform tonal regions, which obscure object outlines and hinder accurate segmentation [10,11].
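Contrast admits several standard formalizations; as an illustrative reference (not a definition adopted by the reviewed studies), the Michelson contrast of an image region is C = (I_max − I_min) / (I_max + I_min), where I_max and I_min are the maximum and minimum pixel intensities in that region. LCIs correspond to values of C close to zero, which is precisely why neighboring regions become difficult to separate.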
Conventional SS techniques, such as thresholding and edge detection, struggle in LCI scenarios because they depend on sharp intensity gradients, which are typically absent [10]. For example, in medical ultrasound imaging, tissue boundaries lack clear separation, while in autonomous navigation, low-light conditions blur object edges, often yielding segmentation accuracies below practical thresholds (e.g., Dice Similarity Coefficient (DSC) < 60%) [11]. Enhancing LCI SS is vital for critical applications, including disease diagnosis [12], remote sensing [13], defect detection [14], autonomous vehicles [15], and mineral exploration [16], where sensors like cameras, scanners, and industrial detectors frequently generate LCI, demanding advanced solutions [12,13,14,15,16]. Recent DL innovations improve performance by expanding the Effective Receptive Field (ERF), refining information flow, and capturing contextual features [17]. Techniques such as visual attention mechanisms and Atrous Spatial Pyramid Pooling (ASPP) enlarge the ERF, while dense connections preserve high-resolution details [18,19], and Vision Transformers (ViTs) enhance contextual understanding when paired with Convolutional Neural Networks (CNNs) [20].
This systematic review develops a comprehensive assessment of DL methods tailored for LCI SS, aiming to improve segmentation accuracy in sensor-driven automation and robotics applications. This work contributes a practical solution for addressing LCI challenges, enhancing segmentation performance by up to 15% in mean accuracy metrics (e.g., mIoU) compared to traditional methods, as demonstrated through benchmark evaluations. It evaluates the strengths, limitations, mechanisms, architectures, and practical implementations of these methods, serving as an essential resource for researchers and practitioners in automation and robotics seeking to apply DL to sensor-driven LCI segmentation. Unlike previous reviews that focus on general SS or specific fields (e.g., medical imaging [21]), this work uniquely bridges LCI challenges across diverse sensor-based domains, marking it as the first systematic review of its kind. We compare these DL advancements with traditional methods, assess their implications for automation systems, and propose future research directions, including integration with emerging paradigms like foundation models [3].
The review is organized into eight sections. Section 2 outlines the systematic review methodology, including the process and evaluation tools. Section 3 introduces the selected studies, followed by Section 4, which details DL techniques and their applications for LCIs. Section 5 compares the studies, Section 6 discusses key findings, Section 7 presents conclusions, and Section 8 explores future perspectives.

2. Systematic Review Methodology

This review adopts a methodology based on the framework proposed by [21], which offers a streamlined, reliable approach for conducting computer science reviews. The process follows five stages outlined in [22], summarized below:
Step 1: Framing Questions: Research questions are defined to align the review with its objectives and scope.
Step 2: Identifying Relevant Studies: Studies matching the research questions are sourced using specific inclusion and exclusion criteria.
Step 3: Assessing Study Quality: The rigor, credibility, and relevance of selected studies are evaluated with a standardized checklist.
Step 4: Summarizing Evidence: Key findings are extracted and synthesized to highlight trends, limitations, and research gaps.
Step 5: Interpreting Findings: The synthesized evidence is analyzed to answer the research questions, draw conclusions, and suggest future work.

2.1. Framing Questions for the Review

Defining research questions (RQs) ensures the review meets its goals. Using the PICOC criteria (population, intervention, comparison, outcome, context; see Appendix A), we formulated the following questions:
  • RQ1: What is LCI SS?
  • RQ2: How does LCI SS benefit automation applications?
  • RQ3: Which DL methods are employed in these studies?
  • RQ4: How do these studies compare with state-of-the-art approaches?
  • RQ5: What are the strengths and weaknesses of the selected studies?
  • RQ6: What results do these studies achieve?
  • RQ7: What future research opportunities exist for LCI SS?
  • RQ8: What dataset limitations affect LCI SS model training, and how well do they represent real-world scenarios?

2.2. Identifying Relevant Studies

Articles were retrieved from academic search engines: Bielefeld Academic Search Engine (BASE), Google Scholar, and Refseek. Search terms were derived from the PICOC criteria’s context, intervention, and population attributes (Appendix A).
Inclusion and exclusion criteria (Appendix B) guided the selection process. We focused on supervised DL methods for SS applied to LCI datasets, published between 2022 and 2024 in Q1 journals (per Journal Citation Reports (JCRs) and SCImago Journal Rank (SJR) metrics). Given the rapid evolution of DL in computer vision, this 2022–2024 window ensures that this review captures the most current advancements.

2.3. Assessing the Quality of Studies

Study quality was evaluated using a custom instrument developed by our team (Appendix C), which assesses rigor, relevance, result presentation, and credibility [23]. A Likert-type scale classified studies as high or low quality, ensuring a systematic and reproducible assessment [24].

2.4. Summary and Analysis

After selection, study data were synthesized based on mechanisms, base architectures, application domains, and segmentation performance. This synthesis enabled us to group methods, identify trends, and visualize patterns in LCI SS research, providing a foundation for subsequent analysis.

3. Selected References

This section outlines the selection process and key characteristics of the 25 studies reviewed for the DL-based SS of LCIs. The process, depicted in Figure 1, began with a bibliographic search using keywords derived from the population, intervention, and context attributes of the PICOC criteria (Appendix A). Searches were conducted across three academic engines—Bielefeld Academic Search Engine (BASE), Google Scholar, and RefSeek—with Google Scholar yielding the most results (203 publications).
Next, duplicates were removed, and the inclusion and exclusion criteria (Appendix B) were applied to filter the initial 264 publications. Studies focusing on synthetic images, multitask architectures, 3D convolution modules, or pure machine learning models were excluded, eliminating 161 publications. The remaining 103 studies underwent quality assessment using the checklist in Appendix C. Approximately half were discarded due to misalignment with the review’s core objective—enhancing LCI SS—leaving 25 high-quality studies that specifically address this challenge. Discarded articles are listed in Appendix D.
Figure 2 illustrates the distribution of application domains before (a) and after (b) quality assessment. Medical applications dominate, comprising 45% of initial studies and 59% of final selections, followed by surface defect detection (24% and 17%, respectively). In Figure 2a, the “Others” category includes niche applications like fingerprint segmentation and mail label detection [25,26]. Notably, ref. [27] was categorized under smoke detection, as its primary focus aligns with smoke-related datasets, despite broader applicability.
Table 1 summarizes the 25 selected studies, detailing their architectural types, mechanisms for expanding the ERF, model size (parameters in millions), maximum performance metrics (e.g., mean Intersection over Union (mIoU), Dice Similarity Coefficient (DSC)), datasets, key highlights, and limitations. Studies are grouped by application domain (surface defect, scene understanding, mineral exploration, remote sensing, smoke detection, and medical) to highlight domain-specific trends. "NR" denotes "Not Reported" where data were unavailable. These studies primarily fall into two categories: CNN-based models and hybrid approaches combining CNNs with ViTs or Multi-Layer Perceptrons (MLPs).

4. Deep Learning Method for Low-Contrast Image Segmentation and Applications

The 25 selected studies primarily employ two design approaches: CNN-based models and hybrid architectures integrating CNNs with ViTs and MLPs. This section explores their applications, datasets, baseline architectures, and mechanisms for enhancing the ERF, focusing on how these methods address LCI segmentation challenges in sensor-driven contexts.

4.1. Applications and Datasets

LCI segmentation is critical across diverse sensor-based applications. In medical diagnostics, modalities like magnetic resonance imaging (MRI), ultrasound, and Computed Tomography (CT) produce grayscale images with similar intensities due to tissue uniformity [51]. Similarly, RGB images from colonoscopies and dermoscopy show tonal overlap between lesions and surroundings [52]. In industrial quality control, surface defects (e.g., scratches, cracks) blend with their backgrounds, forming LCIs [53]. Remote sensing images for environmental monitoring, crop analysis, and disaster assessment exhibit LCI traits due to shadows or tonal similarity [54]. Autonomous navigation struggles with nocturnal scenes under limited lighting, merging object tonalities [55]. Smoke detection systems also encounter LCIs, as smoke resembles clouds or fog [56].
Figure 3 shows examples of images and segmented masks from public datasets containing LCIs.
Public datasets with LCI characteristics underpin these applications. Examples include the following:
  • AITEX [56]: 245 fabric defect images (e.g., knots, tears) from seven factories.
  • MT [57]: 1344 magnetic tile images with six defect types (e.g., cracks), 219 × 264 pixels, and pixel-level annotations.
  • TN3K [58]: 3493 ultrasound thyroid nodule images, 421 × 345 pixels, with masks.
  • CAMUS [59]: 2D echocardiographic sequences from 500 patients for cardiac analysis.
  • SCD [60]: MRI images from 45 patients, segmenting left ventricles in normal and diseased states.
  • ISIC-2016 [61]: 1279 dermoscopic images for skin cancer classification (malignant/benign).
  • CVC-ClinicDB [52]: 612 colonoscopy images, 384 × 288 pixels, with polyp masks.
  • ISPRS-Potsdam [62]: High-resolution (6000 × 6000 pixels) urban satellite images.
  • NightCity [63]: 4297 nighttime driving images with pixel-level labels.
  • DRIVE [64]: 40 retinal vessel images, 565 × 584 pixels.
Appendix E provides detailed dataset descriptions.

4.2. Design of Reviewed Methods and Baseline Architectures

The reviewed methods fall into two categories: CNN-based and hybrid (CNN + ViT/MLP). CNN-based methods leverage atrous convolution (AC) for ERF expansion, incorporating spatial attention via convolutional operations and channel attention via Squeeze-and-Excitation (SE) modules. Hybrid methods combine CNNs with ViTs for long-range dependencies and MLPs for complex feature representation, enhancing global context capture.
These methods build on state-of-the-art architectures—UNet, DeepLab, and Segformer [65,66,67]—illustrated in Figure 4:
  • UNet: Uses skip connections between the encoder and decoder to preserve spatial details (Figure 4a), widely adopted for medical imaging and extended to remote sensing and defect detection. Eighty-seven percent of reviewed methods modify UNet, enhancing feature fusion with dense connections, attention mechanisms, or multi-scale modules.
  • DeepLab: Employs an ASPP module in the encoder, merging multi-scale features with the initial feature map (Figure 4b).
  • Segformer: Integrates efficient transformer modules with lightweight MLP decoders (Figure 4c).
Figure 4. Simplified schematics of baseline architectures: (a) UNet; (b) DeepLabV3; and (c) Segformer.
Hybrid methods often incorporate ViTs in deeper layers for contextual selection [30,41,42,43,47]. For instance, ref. [50] uses nnUNet [68], a self-configuring UNet variant optimizing preprocessing, training, and post-processing (e.g., resolution, learning rate). In [32], DeepLab and Segformer form a dual-branch encoder, paired with a Retinex-based decomposition decoder [69]. Similarly, ref. [35] combines Segformer and HardNet branches via an MLP, balancing global (transformer) and local (convolutional) features.
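To make the encoder–decoder pattern concrete, the following minimal PyTorch sketch (our own illustration; layer widths and names are arbitrary and not drawn from any reviewed method) shows how a UNet-style skip connection concatenates encoder features with upsampled decoder features before the per-pixel classification head:

    import torch
    import torch.nn as nn

    class TinyUNet(nn.Module):
        """Minimal two-level encoder-decoder with one skip connection (illustrative)."""
        def __init__(self, in_ch=1, num_classes=2):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
            self.down = nn.MaxPool2d(2)                          # halve the spatial resolution
            self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
            self.up   = nn.ConvTranspose2d(32, 16, 2, stride=2)  # restore the resolution
            # decoder receives upsampled features concatenated with the skip connection
            self.dec  = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
            self.head = nn.Conv2d(16, num_classes, 1)             # per-pixel class scores

        def forward(self, x):
            s = self.enc1(x)                 # high-resolution features kept for the skip
            b = self.enc2(self.down(s))      # low-resolution bottleneck features
            d = self.dec(torch.cat([self.up(b), s], dim=1))  # fuse skip + upsampled features
            return self.head(d)              # logits of shape (N, num_classes, H, W)

    logits = TinyUNet()(torch.randn(1, 1, 64, 64))  # -> torch.Size([1, 2, 64, 64])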

4.3. Mechanisms to Enhance the Effective Receptive Field

The ERF defines the image region influencing a pixel’s activation in deep layers, shaped by filter size, stride, and pooling [70,71]. A larger ERF improves LCI SS accuracy by capturing contextual information, reducing noise, and detecting long-range pixel relationships critical for faint edges or similar-textured regions [72,73,74]. Reviewed methods enhance ERF using two strategies: specialized convolutions and attention mechanisms.
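As a reference point (a standard relation rather than a formula stated in the reviewed studies), the theoretical receptive field of stacked convolutions grows as r_l = r_{l−1} + (k_l − 1) · d_l · s_1 s_2 ⋯ s_{l−1}, with r_0 = 1, where k_l, d_l, and s_l are the kernel size, dilation rate, and stride of layer l. For example, three stacked 3 × 3 convolutions with stride 1 and dilation rates 1, 2, and 4 reach a 15 × 15 receptive field. The effective receptive field measured on trained networks is typically smaller than this theoretical bound, which is why the mechanisms below aim to enlarge it.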

4.3.1. Convolutions for Expanding Effective Receptive Field

Figure 5 compares convolution types used to expand ERF while minimizing computational cost [75].
  • Dilated Convolution (DC): Adds spacing between kernel elements (Figure 5b), expanding ERF without extra parameters or resolution loss [76]. In [28], serial DCs with varying dilation rates at the bottleneck capture abstract features. In [31], ASPP concatenates DCs, preserving multi-resolution details.
  • Depthwise Convolution (DwC): Applies convolution per channel (Figure 5c), often paired with Pointwise Convolution (PwC) in Depth Separable Convolution (DS) to reduce computation and enhance information exchange. In [38], DS with strip convolutions captures directional, multi-scale features.
  • Deformable Convolution: In [48], learnable offsets adapt kernels to object shapes, improving flexibility over fixed-rate DCs [77].
Figure 5. Convolution types: (a) Traditional; (b) Dilated Convolution (DC); and (c) Depthwise Convolution (DwC). F = input features; f = output features; and f′ = depthwise output.
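A minimal PyTorch sketch of the two most common of these operations (illustrative only; channel counts are arbitrary) shows how a dilated 3 × 3 convolution enlarges the receptive field without extra parameters, and how a depthwise–pointwise pair forms a depthwise separable convolution:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 16, 64, 64)  # (batch, channels, H, W)

    # Dilated convolution: rate-2 spacing enlarges the 3x3 kernel's receptive field
    # to 5x5 without adding parameters; padding=2 keeps the spatial size unchanged.
    dilated = nn.Conv2d(16, 16, kernel_size=3, dilation=2, padding=2)

    # Depthwise separable convolution: a per-channel (depthwise) 3x3 filter followed
    # by a 1x1 pointwise convolution that mixes channel information.
    depthwise = nn.Conv2d(16, 16, kernel_size=3, padding=1, groups=16)  # one filter per channel
    pointwise = nn.Conv2d(16, 32, kernel_size=1)                        # channel mixing

    print(dilated(x).shape)               # torch.Size([1, 16, 64, 64])
    print(pointwise(depthwise(x)).shape)  # torch.Size([1, 32, 64, 64])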

4.3.2. Attention Mechanisms

Attention mechanisms enhance LCI SS by prioritizing key features and global context, reducing noise from irrelevant regions [78,79]. They include CNN-based modules and ViT/MLP integrations.
CNN-based attention (Figure 6) includes the following:
  • Squeeze-and-Excitation (SE): Implements channel attention (Figure 6a) [80]. In [33], four SE + DC branches at the bottleneck filter features, suppressing noise.
  • Bottleneck Attention Module (BAM): Processes spatial and channel attention in parallel (Figure 6b), reducing background noise in [29,81].
  • Channel Prior Convolutional Attention (CPCA): Used in [36], it refines features sequentially via DwC-enhanced spatial attention (Figure 6c) [82].
Figure 6. CNN-based attention modules in the reviewed methods, composed of Channel Attention (CA) and Spatial Attention (SA) modules: (a) Squeeze-and-Excitation (SE); (b) Bottleneck Attention Module (BAM); (c) Channel Prior Convolutional Attention (CPCA). F = input features; f′ = intermediate features; and f = output features. The symbol ⊗ represents element-wise multiplication.
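The following sketch (a generic Squeeze-and-Excitation block in PyTorch, not the exact module of any reviewed method) illustrates the squeeze (global average pooling) and excitation (per-channel gating) steps underlying the channel-attention variants above:

    import torch
    import torch.nn as nn

    class SEBlock(nn.Module):
        """Squeeze-and-Excitation channel attention (illustrative sketch)."""
        def __init__(self, channels, reduction=4):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global spatial average
            self.fc = nn.Sequential(                       # excitation: per-channel weights
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels), nn.Sigmoid(),
            )

        def forward(self, x):
            n, c, _, _ = x.shape
            w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
            return x * w                                   # reweight channels element-wise

    out = SEBlock(32)(torch.randn(2, 32, 16, 16))          # same shape as the input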
ViT-based Multihead Self-Attention (MHSA) captures long-range dependencies [20]. In [41,42], MHSA applies channel attention in deep layers; Ref. [43] pairs SE with MHSA for a hybrid local–global focus. Swin Transformer variants in [40,47] use residual and star-shaped patches for efficiency [83], while [63] employs MiT for spatial parameter reduction [84]. MLP-based modules in [27] use boundary loss for edge detection, and [44] combines a Spatial Mixer MLP with PwC for channel relationships [85].

4.4. Feature Fusion

Feature fusion in semantic segmentation architectures combines information from multiple levels to boost prediction accuracy and robustness. In low-contrast image segmentation, skip connections are the most common technique, linking encoder and decoder feature maps at matching resolutions (Figure 4a). These connections preserve spatial details, mitigate vanishing gradients, and enhance fusion by integrating low- and high-level features, avoiding poor decoder interpolations [86]. Methods [28,38,39,41] use skip connections with element-wise multiplication or addition for effective feature merging.
However, skip connections face challenges, including local feature redundancy, weak long-range dependency capture, and limited cross-scale integration [87,88,89]. These limitations impair small object detection, edge delineation, and scale adaptability in LCI [34]. To address this, reviewed methods propose enhancements: attention-augmented skip connections [40,41,42] prioritize salient features; specialized blocks [18,34] enable hierarchical interactions for cross-scale fusion; frequency-domain filtering [27] reduces noise; and pyramid and multi-scale designs [29,46,47] improve local–global integration, enhancing accuracy for imbalanced or variable-scale data.
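As a simple illustration of the attention-augmented skip connections mentioned above (a generic sketch under our own naming, not the specific design of [40,41,42]), the decoder features can gate the encoder features before fusion so that irrelevant skip content is suppressed:

    import torch
    import torch.nn as nn

    class GatedSkip(nn.Module):
        """Attention-gated skip connection: decoder features weight encoder features
        before fusion (assumes both inputs share resolution and channel count)."""
        def __init__(self, channels):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Conv2d(2 * channels, 1, kernel_size=1),  # score from both streams
                nn.Sigmoid(),                               # spatial attention map in [0, 1]
            )

        def forward(self, enc_feat, dec_feat):
            a = self.gate(torch.cat([enc_feat, dec_feat], dim=1))
            return dec_feat + a * enc_feat                  # suppress irrelevant skip regions

    fused = GatedSkip(16)(torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32))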

4.5. Implementation of Reviewed Methods

LCI SS methods employ techniques to optimize supervised learning, enhancing training and inference for real-world robustness. These include deep supervision in intermediate decoder layers, preprocessing during inference, and data augmentation to increase dataset diversity.
Figure 7 illustrates the LCSeg-Net model [27], which optimizes kernel weight selection through integrated strategies. These include data augmentation with synthetic masks and images, alongside deep supervision using boundary loss to enhance segmentation accuracy. This combination is reflected across reviewed models, with some adopting all techniques and others focusing on specific elements.
Data augmentation, common across reviewed methods [90], applies transformations like rotation, scaling, and tonal adjustments. Preprocessing varies by method: [48] uses wavelet transformation with multi-frequency sampling; [47] enhances contrast and brightness during inference. CASDD [28] employs a Generative Adversarial Network (GAN) to generate synthetic images, boosting dataset variety and robustness.
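A minimal, self-contained sketch of paired image–mask augmentation (our own example; the reviewed methods rely on their own pipelines and libraries) applies the same geometric transforms to the image and its mask while adjusting only the image tonally:

    import torch

    def augment(image, mask):
        """Apply the same random flip/rotation to an image-mask pair, plus a mild
        brightness jitter to the image only (a minimal, illustrative sketch)."""
        if torch.rand(1) < 0.5:                       # horizontal flip
            image, mask = image.flip(-1), mask.flip(-1)
        k = int(torch.randint(0, 4, (1,)))            # random 90-degree rotation
        image = torch.rot90(image, k, dims=(2, 3))
        mask = torch.rot90(mask, k, dims=(2, 3))
        image = (image * (0.8 + 0.4 * torch.rand(1))).clamp(0, 1)  # tonal adjustment
        return image, mask

    img, msk = augment(torch.rand(1, 3, 128, 128), torch.randint(0, 2, (1, 1, 128, 128)))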
Deep supervision is widely adopted: [1,2] apply loss to the original mask, while [3,4] use edge masks with boundary loss. Training splits typically allocate 60–80% of data to training, with the rest for validation and testing. Five-fold cross-validation [91] is used in some studies, rotating validation sets across five cycles. Pretrained backbones (e.g., ImageNet, ADE20K) are fine-tuned on study-specific datasets. NVIDIA GeForce RTX 3090 (24 GB VRAM) is the most used GPU, followed by Tesla V100 (16 GB). See Appendix F, Table A6, for details.

4.6. Training of Architectures

Most SS models use the Cross-Entropy (CE) loss function [92] for pixel-level evaluation, as it penalizes small errors heavily, accelerating optimization [31]. Methods [30,39,42] rely solely on CE, while others create hybrid loss functions: Refs. [31,36,38] combine CE with DSC (weighted) for small region emphasis; Refs. [18,33,34] add the Structural Similarity Index Measure (SSIM) or Boundary Loss (BL) for edge focus; Refs. [29,35] use mIoU alone, with [33] blending it with CE for global overlap.
Deep supervision adds auxiliary losses in intermediate layers to guide hierarchical learning [29,31,33]. In [27], it follows signal filtering, comparing features to BL. Hyperparameters include learning rates of 10⁻⁵ to 10⁻³ and batch sizes of 2–20, with the Adaptive Moment Estimation (ADAM) optimizer [93] used in 70% of studies. See Appendix G, Table A7, for details.
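As an illustration of a hybrid loss of this kind (a generic weighted Cross-Entropy plus soft Dice formulation, not the exact weighting used by any reviewed method):

    import torch
    import torch.nn.functional as F

    def hybrid_ce_dice_loss(logits, target, ce_weight=0.5, eps=1e-6):
        """Weighted Cross-Entropy + soft Dice loss for semantic segmentation.
        logits: (N, C, H, W) raw scores; target: (N, H, W) integer class labels."""
        ce = F.cross_entropy(logits, target)
        probs = logits.softmax(dim=1)
        one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
        inter = (probs * one_hot).sum(dim=(0, 2, 3))        # per-class soft intersection
        dice = (2 * inter + eps) / (probs.sum(dim=(0, 2, 3)) + one_hot.sum(dim=(0, 2, 3)) + eps)
        return ce_weight * ce + (1 - ce_weight) * (1 - dice.mean())

    loss = hybrid_ce_dice_loss(torch.randn(2, 3, 32, 32), torch.randint(0, 3, (2, 32, 32)))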

5. Study Comparison

5.1. Application Domains and Dataset Availability

LCI SS methods support diverse applications, primarily aiding human decision-making in diagnostics and risk detection. Medical diagnostics dominate, leveraging CT, MRI, and ultrasound for tasks like artery segmentation to detect cardiovascular issues. Other applications include surface defect analysis and smoke/fire detection.
Datasets focus on single-region segmentation, with MT (magnetic tiles surface defects) and ISIC (skin lesions) being the most popular public datasets. Smoke detection datasets are scarce, with only the smoke semantic segmentation (SSS) dataset noted. Four studies created custom datasets across medical, scene understanding, mineral exploration, surface defect, and smoke categories, each with one dataset, highlighting data scarcity (RQ8).

5.2. Methodology and Design of the Reviewed Studies

CNN and hybrid methods adapt established architectures—UNet, DeepLab, and Segformer—to optimize LCI SS. UNet dominates, used in 87% of studies, reflecting its prevalence in medical applications, the most common domain reviewed. These DL methods enhance inference accuracy by refining base architectures with strategies that integrate local and global feature capture and fusion, often incorporating attention mechanisms to filter spatial and channel features.
Hybrid architectures combine CNNs, ViTs, and MLPs to leverage their complementary strengths. CNNs excel at local feature extraction with a high inductive bias [94], while ViTs and MLPs capture global context, offering greater implementation flexibility.
No architectural distinctions exist between single- and multi-region segmentation in either CNN or hybrid methods. The focus remains on modular techniques, such as attention and multi-scale designs, to address LCI SS challenges effectively.

5.3. Training and Implementation of the Reviewed Methods

Per [95], non-pretrained CNNs need ~10,000 samples for optimal crack segmentation performance without augmentation. LCI datasets face challenges: limited availability, sparse annotations, class imbalance, variable scales, and low resolution [96,97]. Data augmentation (e.g., 750 to 2400 samples [98]) and pretraining improve robustness and DSC (>80%). Hybrid methods benefit most from augmentation, as ViTs require large datasets due to low inductive bias.
Hybrid loss functions dominate, with CE leading for convergence and class imbalance handling. Five-fold cross-validation aids small datasets (<20 samples). GPU needs vary by model size: UNet (28 M parameters) requires 4 GB VRAM, UNETR (133 M) needs 24 GB [99]. Most reviewed methods (<45 M parameters) use 6 GB VRAM; RNightSeg [32] and FDR-TransUNet [41] (>100 M) require 24 GB (RQ5, RQ6).

5.4. Performance of the Reviewed Methods

Inference accuracy, measured primarily by mIoU and DSC, is the key evaluation metric for the reviewed methods. All 25 methods surpass their baseline architectures, achieving mIoU and DSC values above 80% in most cases, demonstrating robust LCI segmentation performance.
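For reference, both metrics are computed from per-class pixel counts: IoU_c = TP_c / (TP_c + FP_c + FN_c), averaged over the C classes to give mIoU = (1/C) Σ_c IoU_c, and DSC = 2·TP / (2·TP + FP + FN) = 2|A ∩ B| / (|A| + |B|), which for a single class relates to IoU as DSC = 2·IoU / (1 + IoU). These are the standard definitions; individual studies may average them per image or over the whole test set.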
Figure 8 illustrates performance against UNet: CoVi-Net [39] exceeds UNet by ~15% DSC, excelling at fine retinal vessel segmentation, while TBNet [45] improves by <1%, targeting corneal endothelium cells. Both highlight strengths in fine-structure detection.
Figure 9 compares performance across public datasets: Figure 9a for surface defect datasets (MT, NEU, RSDD) shows LCSeg-Net [27] achieving DSC > 90% with <25 M parameters and Figure 9b for skin lesion datasets (ISIC, PH2) reveals SWTRU [47] surpassing 95% DSC, with Ms RED [46] being competitive at <10 M parameters despite a smaller model size.
Figure 10 presents qualitative comparisons: [42] (fine structures) and [35] (robust structures) outperform their baselines (UNet and Segformer, respectively), showing sharper LCI segmentation.
Two outliers underperform: TBNet [45] achieves 63.9% mIoU due to limited, imbalanced data; RNightSeg [32] scores 57.9% mIoU, impacted by complex night scenes with light flares and fine details. Model size typically increases over baselines, with added modules raising parameter counts, computational complexity, and training time. Inference time, reported in nine studies, varies with hardware: LCSeg-Net [27] achieves 40–70 fps (real-time), while most others average < 1 fps, reflecting GPU diversity (e.g., RTX 3090, Tesla V100).

6. Discussion

This review’s methodology, adapted from [21], ensures a systematic, efficient analysis by integrating tools for study selection, quality assessment, and data synthesis. This structured approach enhances rigor, transparency, and reproducibility, distinguishing it from narrative reviews like [100] that lack standardization. The quality assessment (Appendix C) proved critical, with criterion 1.3—requiring explicit focus on LCI segmentation—acting as the most effective filter for ensuring relevance and credibility.
DL methods outperform traditional techniques in LCI segmentation. For instance, K-means and Gaussian Mixture Models (GMMs) achieve < 70% accuracy in breast lesion segmentation [101], while thresholding maximum principal strain maps yields < 50% DSC in crack segmentation [102]. Reviewed DL methods, mostly hybrid, leverage CNNs, ViTs, and MLPs: CNNs capture local features with high inductive bias, ViTs excel at contextual understanding, and MLPs model nonlinear feature relationships, yielding DSC and mIoU > 80% (Section 5.4).
Architectural designs target LCI challenges universally, using techniques like attention mechanisms and multi-scale fusion (Section 4). Methods like PCTNet [30] and GT-DLA-dsHFF [42] excel at fine structures (e.g., vessels, cracks), yet no distinction exists between single- and multi-region segmentation designs. This flexibility, as shown in [103], allows single-region datasets to support multi-region extraction via probabilistic modules and trainable classifiers, suggesting adaptability across tasks.
Most methods use non-pretrained models, despite LCI dataset scarcity (Section 5.3). Pretrained models could enhance performance: LCSeg-Net [27] and PCTNet [30] leverage ResNet-34 pretrained on ImageNet-10K and ADE20K, though these datasets feature balanced contrast. MedSAM [104], pretrained on 1.5 million medical LCI images, offers a promising alternative for automated diagnostics, addressing data limitations (RQ8).
Ethical, social, and legal implications of AI-assisted decision-making are significant. Human–machine collaboration is growing [105], raising concerns about accountability for errors and penalty allocation [106]. Future LCI SS research must prioritize transparency to build trust in automation applications (Section 8).

7. Conclusions

This systematic review comprehensively assessed recent DL advancements for the SS of LCIs, offering a detailed evaluation of their strengths and limitations across diverse applications. Hybrid architectures integrating CNNs and ViTs, enhanced by attention mechanisms, have proven exceptionally effective at tackling the inherent challenges of LCI segmentation, such as diffuse edges and tonal similarities that obscure object boundaries. These methods leverage CNNs’ ability to extract local spatial details and ViTs’ capacity to capture global contextual relationships, resulting in superior performance over traditional approaches. Multi-scale processing and ERF expansion further boost accuracy in complex scenes (Section 4), enabling models to adapt to varying object scales and intricate backgrounds often encountered in LCI scenarios. Among the standout methods, SWTRU achieves top-tier performance in medical segmentation, excelling at delineating fine anatomical structures critical for diagnostics; Ms RED stands out with the smallest model size, offering an efficient solution without compromising accuracy; and LCSeg-Net enables real-time inference, demonstrating robustness across multiple domains with processing speeds suitable for practical deployment (Section 5.4). These examples highlight the diversity of innovations, from high-precision medical applications to resource-efficient and time-sensitive industrial or autonomous systems.
Despite significant progress in accuracy and robustness, several persistent challenges underscore the need for further development. The scarcity of diverse, large-scale LCI datasets remains a critical bottleneck, limiting model training and generalization across varied real-world conditions (RQ8). This data scarcity is particularly pronounced in domains like smoke detection and mineral exploration, where public datasets are rare, forcing reliance on small or custom datasets that may not fully represent operational complexities. Additionally, the high computational demands of transformer-based models pose a barrier, requiring substantial processing power and memory that may not be feasible in resource-constrained environments, such as edge devices or mobile platforms. The limited availability of real-time solutions exacerbates this issue, as most methods struggle to achieve the speed necessary for applications like autonomous navigation or industrial monitoring, where immediate decision-making is essential. These challenges collectively hinder the scalability and accessibility of LCI SS, particularly in settings where computational resources or annotated data are sparse.
DL significantly enhances LCI SS across a wide range of sensor-driven domains, including medical imaging, autonomous driving, industrial inspection, and remote sensing, transforming how intelligent systems interpret challenging visual data. In medical imaging, it enables the precise detection of subtle tissue boundaries, supporting earlier and more accurate diagnoses. In autonomous driving, it improves scene understanding under poor lighting, enhancing safety and reliability. Industrial inspection benefits from better defect detection on uniform surfaces, ensuring quality control, while remote sensing gains from improved environmental analysis despite shadows or tonal overlap. Yet, to fully unlock this potential, optimizations are critical in three key areas: computational efficiency, data availability, and model interpretability (Section 6). Efficiency improvements could democratize access to these technologies, making them viable for smaller organizations or low-power devices. Enhanced data availability, through expanded datasets or synthetic generation, would strengthen model robustness, reducing overfitting and improving adaptability. Greater interpretability would build trust, especially in safety-critical applications, by clarifying how models reach their segmentation decisions. Together, these advancements promise to bridge current gaps, paving the way for broader adoption and impact in automation and robotics.

8. Future Perspectives

This review identifies key research directions to advance LCI SS:
  • Computational Efficiency: Developing lightweight, energy-efficient transformer architectures is essential for real-time deployment. Techniques like quantization, knowledge distillation, and pruning can cut computational costs without sacrificing performance (RQ7); a distillation sketch is given after this list.
  • Dataset Expansion and Augmentation: Limited high-quality, annotated LCI datasets hinder progress (Section 5.3). Future efforts should create diverse, large-scale datasets spanning multiple domains. Advanced augmentation, such as synthetic image generation, can address data scarcity, including enhancing image sharpness to aid segmentation (RQ8).
  • Self-Supervised and Few-Shot Learning: Reducing reliance on labeled data, self-supervised and few-shot learning can improve generalization with minimal supervision, such as correlating pixels across classes for enhanced segmentation (RQ7).
  • Real-Time and Mobile Deployment: Enhancing real-time performance on resource-limited devices is vital. Efficient baseline frameworks and mobile-optimized architectures can balance performance and deployability (RQ7).
  • Integration of Multimodal Sensor Data: Fusing LCI with modalities like LiDAR, thermal, or hyperspectral data can improve accuracy in challenging conditions. Developing models to leverage multimodal inputs is a priority (RQ7).
  • Ethics, Interpretability, and Explainability: As DL influences safety-critical decisions, transparency and trust are paramount. Explainable AI techniques must illuminate model decisions, especially in medical and autonomous navigation contexts (Section 6).
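Returning to the first bullet, a minimal sketch of pixel-wise knowledge distillation (our own illustrative formulation and parameter names, not a method from the reviewed studies) blends the usual cross-entropy with a KL term that pushes a small student model toward the softened predictions of a large teacher:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, target, T=2.0, alpha=0.5):
        """Pixel-wise knowledge distillation for segmentation (illustrative sketch).
        logits: (N, C, H, W); target: (N, H, W) integer labels; T = temperature."""
        ce = F.cross_entropy(student_logits, target)
        kl = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * ce + (1 - alpha) * kl

    loss = distillation_loss(torch.randn(1, 3, 16, 16), torch.randn(1, 3, 16, 16),
                             torch.randint(0, 3, (1, 16, 16)))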
Addressing these gaps will drive LCI SS forward, enhancing performance and enabling broader sensor-driven applications.

Author Contributions

Conceptualization, C.U. and M.V.; methodology, C.U. and M.V.; formal analysis, C.U. and M.V.; investigation, C.U. and M.V.; resources, C.U. and M.V.; data curation, C.U. and M.V.; writing—original draft preparation, C.U. and M.V.; writing—review and editing, C.U. and M.V.; visualization, C.U. and M.V.; supervision, C.U.; project administration, C.U. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

This work has been supported by the Faculty of Engineering of the University of Santiago of Chile, Chile.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations, listed in alphabetical order, are used in this manuscript:
ADAM: Adaptive Moment Estimation
AI: Artificial Intelligence
A-MLP: Axial MLP Attention
AC: Atrous Convolution
AIAB: Aggregation Inhibition Activation Block
ASPP: Atrous Spatial Pyramid Pooling
BAM: Bottleneck Attention Module
BASE: Bielefeld Academic Search Engine
BCEn: Brightness and Contrast Enhancement
BL: Boundary Loss
BSANet: Boundary-aware and Scale-Aggregation Network
CA: Channel Attention
CAMUS: Cardiac Acquisitions for Multi-structure Ultrasound Segmentation
CASDD: Complementary Adversarial Network-Driven Surface Defect Detection
CBAM: Convolutional Block Attention Module
CE: Cross Entropy
CED: Canny Edge Detection
CFD: Crack Forest Dataset
C-MLP: Channel MLP Attention
CNNs: Convolutional Neural Networks
CPCA: Channel Prior Convolutional Attention
CPCANet: Channel Prior Convolutional Attention Network
CSwin-PNet: CNN-Swin Transformer Pyramid Network
CT: Computed Tomography
CV: Computer Vision
CVC-ClinicDB: Computer Vision Center Clinic Database
DA: Dual Attention
DCs: Dilated Convolutions
DeC: Deformable Convolution
DL: Deep Learning
DRIVE: Digital Retinal Images for Vessel Extraction
DS: Depth Separable Convolution
DSC: Dice Similarity Coefficient
DwC: Depthwise Convolution
EMRA-Net: Edge and Multi-scale Reverse Attention Network
ERF: Effective Receptive Field
FAM-CRFSN: Fuse Attention Mechanism's Coal Rock Full-Scale Network
FDR-TransUNet: Feature Double Reuse Transformer UNet
FL: Focal Loss
GANs: Generative Adversarial Networks
GMM: Gaussian Mixture Models
GPU: Graphics Processing Unit
GT-DLA-dsHFF: Global Transformer and Dual Local Attention Network via Deep-Shallow Hierarchical Feature Fusion
GVANet: Grouped Multiview Aggregation Network
HEAT-Net: Hybrid Enhanced Attention Transformer
H2Former: Hierarchical Hybrid Vision Transformer
ISIC: International Skin Imaging Collaboration
ISPRS: International Society for Photogrammetry and Remote Sensing
JCR: Journal Citation Reports
LCI: Low-Contrast Images
LCSeg-Net: Low-Contrast Segmentation Network
LF: Loss Function
LI: Linear Interpolation
MC: Multi-Cross Attention
MF: Median Filter
MHSA: Multi-Head Self-Attention
mIoU: mean Intersection over Union
MLP: Multi-Layer Perceptron
mPA: Mean Pixel Accuracy
MRI: Magnetic Resonance Imaging
Ms RED: Multi-scale Residual Encoding and Decoding Network
MT: Magnetic Tile
PCTNet: Pixel Crack Transformer Network
PICOC: Population/Problem, Intervention, Comparison, Outcome, Context
PPL: Progressive Perception Learning
PwC: Pointwise Convolution
RA: Reverse Attention
RQ: Research Question
RNightSeg: Retinex Night Segmentation
RMSProp: Root Mean Square Propagation
SA: Spatial Attention
SE: Squeeze-and-Excitation
SGD: Stochastic Gradient Descent
SPA: Spatial Pyramid Attention
SPP: Spatial Pyramid Pooling
SS: Semantic Segmentation
SSIM: Structural Similarity Index Measure
STCNet II: Slab Track Crack Network II
SJR: SCImago Journal Rank
SWTRU: Star-shaped Window Transformer Reinforced U-Net
TBNet: Transformer-embedded Boundary Perception Network
TD-Net: Trans-Deformer Network
WD: Wavelet Decomposition
ViTs: Vision Transformers

Appendix A. PICOC Criteria

Table A1. PICOC criteria implemented in the present review.
Attributes | Keywords | Related
Population | LCI Datasets | Blurry Images, Diffuse Images, Poor Light Scenes
Intervention | SS Architectures with attention mechanisms | Channel Attention, Spatial Attention
Intervention | SS Architectures with skip and dense connections | UNet, DenseNet, HRNet
Intervention | SS Architectures with atrous convolutions | Dilated Convolution, ASPP, DeepLab, PSPNet
Intervention | SS Hybrid Architectures | UNeXt, Swin-Unet
Comparison | SS CNN-pure architectures | FCN, AlexNet, VGG
Comparison | SS Transformer-pure architectures | ViT, EfficientViT
Outcome | Inference Accuracy | mIoU, DSC
Outcome | Inference Speed | FLOPs, Latency
Context | Disease Diagnosis | CT Image Analysis
Context | Autonomous Vehicles | Scene SS
Context | Remote Sensing | Cloud and Snow SS
Context | Surface Defect Detection | Surface Anomaly Detection
Context | Mineral Exploration | Mineral Prospecting
Context | Smoke Detection | Initial Fire Detection

Appendix B. Inclusion and Exclusion Criteria

Table A2. Inclusion and exclusion criteria applied to publications found in the selected repositories and search engines.
Criteria Type | Inclusion | Exclusion
Period | Publications from 2022 onwards | Publications before 2022
Language | Publications in English | Publications in languages other than English
Type of Source | Peer-reviewed journal articles | Conference proceedings, technical reports, books, dissertations, and non-peer-reviewed works
Impact Source | Journals ranked in Q1 (based on JCR or SJR metrics) | Journals ranked in Q2–Q4 (or not indexed in JCR or SJR)
Accessibility | Studies available in BASE, Google Scholar, and Refseek | Studies not available in BASE, Google Scholar, and Refseek
Research Focus | DL techniques | Classical machine learning techniques
Research Focus | Supervised learning approaches | Unsupervised or self-supervised learning approaches
Research Focus | Studies focusing on image-based semantic segmentation | Studies focusing on video segmentation, synthetic data, or other computer vision tasks

Appendix C. Quality Assessment Checklist

Table A3. Instrument used to evaluate the rigor, relevance, presentation of results, and credibility of the selected publications after applying the inclusion and exclusion criteria.
1. Scope and Objectives
1.1. Is the review scope clearly defined?
  Yes: The review scope is clearly defined, focusing on DL methods for low-contrast images.
  No: The scope is unclear, other, or not defined.
1.2. Are the research questions aligned with the objectives?
  Yes: The research questions align with the defined objectives and address key issues.
  No: The questions are not aligned with the objectives or are missing.
1.3. Does the objective of the DL methods explicitly focus on segmenting datasets composed of low-contrast images of real-world applications?
  Yes: The objective explicitly addresses the segmentation of low-contrast images using DL methods for real-world applications.
  No: The objective does not explicitly address the segmentation of datasets composed of low-contrast images using DL methods or focuses on other topics related to datasets composed of low-contrast images for real-world applications.
2. Methodology
2.1. Are the methods described in sufficient detail?
  Yes: The methods are well described, highlighting key components and relevance to low-contrast images.
  No: The methods are not described or descriptions are incomplete.
2.2. Are the methods compared with deep learning state-of-the-art models?
  Yes: The methods are adequately compared with state-of-the-art models, with results clearly contextualized.
  No: No comparison is made.
2.3. Are the selected datasets suitable for the supervised training of the models?
  Yes: The selected datasets are suitable for the supervised training of the models.
  No: The selected datasets are not suitable for the supervised training of the models.
3. Performance Evaluation
3.1. Does the review discuss performance?
  Yes: Performance in low-contrast scenarios is clearly discussed with relevant metrics and analysis.
  No: This aspect is not addressed.
3.2. Are standard semantic segmentation indexes used to evaluate the performance of the method?
  Yes: Standard indexes are used for evaluation.
  No: Standard indexes are not used for evaluation.
4. Challenges
4.1. Are the proposed future directions related to the method?
  Yes: Clear and relevant future directions are proposed, considering current challenges.
  No: No future directions are proposed or they are vague.
4.2. Are the limitations of the method discussed?
  Yes: Limitations are discussed.
  No: No limitations are discussed or they are vague.
Verdict
  High Quality: All judgments are Yes.
  Low Quality: At least one of the responses is No.

Appendix D. Publications Excluded After Quality Assessment

Table A4. Classification of the excluded publications based on the judgment criteria number they did not meet from the quality assessment checklist.
Publications | Judgment Criteria Number
[107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147] | 1.3
[148] | 2.1
[149] | 3.2
[15,150,151,152,153,154,155,156,157,158,159,160,161,162,163] | 4.1
[164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184] | 4.2

Appendix E. Public Dataset for Low-Contrast Semantic Segmentation

Table A5. Characteristics of some public datasets used by the methods in the reviewed publications.
Dataset | Ref. | Utilized by | Segmentation Performance (%) | Image Resolution | Image Count | Segmented Object
AITEX | [56] | [29] | 82.56 (mIoU) | 4096 × 256 | 245 | Textile defects
MT | [57] | [29] | 83.75 (mIoU) | 219 × 264 | 1344 | Magnetic tile surface defects
TN3K | [58] | [36] | 89.5 (DSC) | 421 × 345 | 3493 | Thyroid nodules
CAMUS | [59] | [36] | 92.2 (DSC) | 512 × 512 | 2000 | Cardiac structures
SCD | [60] | [37] | 92.39 (DSC) | 512 × 512 | 805 | Left ventricle
ISIC 2016 | [61] | [38] | 93.7 (DSC) | 1504 × 1129 | 1279 | Skin lesions
CVC-ClinicDB | [52] | [27] | 95.82 (DSC) | 384 × 288 | 612 | Colon polyps
ISPRS-Potsdam | [62] | [34] | 93.29 (DSC) | 6000 × 6000 | 38 | City objects
NightCity | [63] | [32] | 57.91 (mIoU) | 1024 × 512 | 4297 | City objects
DRIVE | [64] | [42] | 86.5 (DSC) | 565 × 584 | 40 | Retinal vessels

Appendix F. Implementation Details of Reviewed Methods

Table A6. Key parameters for the implementation of the reviewed methods. In 'Dataset Division', three values indicate the order training/validation/test, while two values represent training/validation. Minimum training set size refers to the number of samples used for training from the dataset with the fewest data. CVa indicates the cross-validation method, with the accompanying number representing the number of folds implemented. DAg: data augmentation; MF: Median Filter; LI: Linear Interpolation; CED: Canny Edge Detection; WD: Wavelet Decomposition; BCEn: Brightness and Contrast Enhancement; NR: Not Reported; NA: Not Applicable.
Ref. | Name | Data Preprocessing | Dataset Division (%) | Minimum Training Set Size | Nvidia GPU Model | VRAM (GB)
[28] | CASDD | DAg + GAN | 80/15/5 | 750 | GeForce RTX 2060 | 6
[29] | EMRANET | NA | 80/20 | 145 | GeForce GTX 1660 | 6
[30] | PCTNet | DAg | 80/20 | 685 | GeForce RTX 3090 | 24
[31] | STCNet II | DAg | 80/10/10 | 800 | GeForce RTX 3080 Ti | 12
[18] | NR | MF + LI | 70/15/15 | 2205 | GeForce RTX 2080 Ti | 11
[32] | RNightSeg | DAg | 70/30 | 320 | GeForce RTX 3090 | 24
[33] | FAM-CRFSN | DAg | NR | NR | GTX 1650 Ti | 16
[34] | GVANet | DAg | 60/40 | 19 | GeForce RTX 2080 Ti | 11
[27] | LCSeg-Net | DAg + CED | 80/10/10 | 489 | GeForce RTX 3090 | 24
[35] | SmokeSeger | DAg | 80/20 | 332 | Tesla V100 | 16
[36] | HEAT-Net | DAg | 5 CVa | 33 | Tesla V100 | 16
[37] | BSANet | DAg | 70/10/20 | 32 | GeForce RTX 3090 | 24
[38] | CPCANet | DAg | 70/10/20 | 630 | GeForce RTX 3090 | 24
[39] | CoVi-Net | DAg | 10 CVa | 18 | GeForce RTX 3090 | 24
[40] | CSwin-PNet | NA | 60/20/20 | 754 | GeForce RTX 3080 | 10
[41] | FDR-TransUNet | DAg | 60/20/20 | 1750 | GeForce RTX 3090 | 24
[42] | GT-DLA-dsHFF | DAg | 4 CVa | 15 | Tesla V100 | 16
[43] | H2Former | DAg | 70/10/20 | 20 | GeForce RTX 3090 | 24
[44] | LightCM-PNet | IN | 5 CVa | 88 | Quadro RTX 6000 | 24
[45] | TBNet | DAg + CED | 5 CVa | 24 | GeForce TITAN XP | 12
[46] | Ms RED | DAg | 80/20 | 80 | GeForce RTX 2080 Ti | 11
[47] | SWTRU | BCEn | 70/20/10 | 1815 | GeForce RTX 3090 | 24
[48] | TD-Net | WD | 75/25 | 80 | GeForce RTX 3090 | 24
[49] | PPL | DAg | 80/20 | 3712 | Tesla V100 | 32
[50] | U-NTCA | NA | 5 CVa | 108 | GeForce RTX 3090 | 24

Appendix G. Training Details of Reviewed Methods

Table A7. Key parameters for training the reviewed methods. CE: Cross Entropy; mIoU: mean Intersection over Union; DSC: Dice Similarity Coefficient; BL: Boundary Loss; SSIM: Structural Similarity Index; SGD: Stochastic Gradient Descent; RMSProp: Root Mean Square Propagation; A: Applicable; NA: Not Applicable.
Ref. | Name | Loss Function | Maximum Learning Rate (10⁻³) | Batch Size | Epoch | Deep Supervision | Optimization Algorithm | Pretrained Model
[28] | CASDD | CE | 0.1 | NR | NR | NA | Adam | NA
[29] | EMRANET | mIoU | 0.5 | 4 | 100 | A | Adam | A
[30] | PCTNet | CE | 0.045 | 2 | NR | NA | Adam | A
[31] | STCNet II | CE + DSC | 0.1 | 10 | NR | A | Adam | NA
[18] | NR | CE + SSIM | 0.1 | 16 | 120 | NA | Adam | NA
[32] | RNightSeg | CE | 0.06 | 16 | 80,000 | NA | Adam | A
[33] | FAM-CRFSN | CE + mIoU + SSIM | 0.1 | NR | 50 | A | Adam | NA
[34] | GVANet | CE + DSC | 0.6 | NR | 105 | NA | Adam | A
[27] | LCSeg-Net | CE + DSC + BL | 0.1 | 8 | 50 | A | Adam | A
[35] | SmokeSeger | mIoU | 1 | 12 | 40,000 | A | SGD | A
[36] | HEAT-Net | CE + DSC | 1 | 4 | 500 | NA | NR | NA
[37] | BSANet | DSC | 1 | 8 | 200 | NA | Adam | NA
[38] | CPCANet | CE + DSC | 5 | 33 | 250 | NA | Adam | NA
[39] | CoVi-Net | CE | 1 | 4 | 150 | NA | Adam | NA
[40] | CSwin-PNet | CE + DSC | 0.1 | 4 | 200 | A | Adam | A
[41] | FDR-TransUNet | CE | 0.3 | 4 | 100 | A | Adam | NA
[42] | GT-DLA-dsHFF | CE | 1 | 2 | 100 | NA | Adam | NA
[43] | H2Former | CE + DSC | 0.1 | 18 | 90 | NA | Adam | A
[44] | LightCM-PNet | CE + DSC + BL | 1 | 8 | 100 | NA | Adam | NA
[45] | TBNet | CE | 0.2 | 1 | 100 | NA | RMSProp | NA
[46] | Ms RED | CE + DSC | 1 | NR | 250 | NA | Adam | NA
[47] | SWTRU | CE | 0.4 | 16 | 200 | NA | RMSProp | NA
[48] | TD-Net | CE + DSC | 0.1 | 8 | 30 | A | Adam | NA
[49] | PPL | BL | 0.2 | 8 | 50 | NA | Adam | NA
[50] | U-NTCA | CE + DSC | 5 | 33 | 250 | NA | Adam | NA

References

  1. Lei, T.; Nandi, A.K. Image Segmentation; Wiley: Hoboken, NJ, USA, 2022; ISBN 9781119859000. [Google Scholar]
  2. Muhammad, K.; Hussain, T.; Ullah, H.; Ser, J.D.; Rezaei, M.; Kumar, N.; Hijji, M.; Bellavista, P.; de Albuquerque, V.H.C. Vision-Based Semantic Segmentation in Scene Understanding for Autonomous Driving: Recent Achievements, Challenges, and Outlooks. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22694–22715. [Google Scholar] [CrossRef]
  3. Garg, S.; Sünderhauf, N.; Dayoub, F.; Morrison, D.; Cosgun, A.; Carneiro, G.; Wu, Q.; Chin, T.-J.; Reid, I.; Gould, S.; et al. Semantics for Robotic Mapping, Perception and Interaction: A Survey. Found. Trends Robot. 2020, 8, 1–224. [Google Scholar] [CrossRef]
  4. Yang, R.; Yu, Y. Artificial Convolutional Neural Network in Object Detection and Semantic Segmentation for Medical Imaging Analysis. Front. Oncol. 2021, 11, 638182. [Google Scholar] [CrossRef]
  5. Dong, C.-Z.; Catbas, F.N. A Review of Computer Vision–Based Structural Health Monitoring at Local and Global Levels. Struct. Health Monit. 2021, 20, 692–743. [Google Scholar] [CrossRef]
  6. Avatavului, C.; Prodan, M. Evaluating Image Contrast: A Comprehensive Review and Comparison of Metrics. J. Inf. Syst. Oper. Manag. 2023, 17, 143–160. [Google Scholar]
  7. Islam, M.M.M.; Kim, J.-M. Vision-Based Autonomous Crack Detection of Concrete Structures Using a Fully Convolutional Encoder–Decoder Network. Sensors 2019, 19, 4251. [Google Scholar] [CrossRef]
  8. Wieland, M.; Martinis, S.; Kiefl, R.; Gstaiger, V. Semantic Segmentation of Water Bodies in Very High-Resolution Satellite and Aerial Images. Remote Sens. Environ. 2023, 287, 113452. [Google Scholar] [CrossRef]
  9. Peng, Y.; Wang, A.; Liu, J.; Faheem, M. A Comparative Study of Semantic Segmentation Models for Identification of Grape with Different Varieties. Agriculture 2021, 11, 997. [Google Scholar] [CrossRef]
  10. Javed, R.; Shafry Mohd Rahim, M.; Saba, T.; Mohamed Fati, S.; Rehman, A.; Tariq, U. Statistical Histogram Decision-Based Contrast Categorization of Skin Lesion Datasets Dermoscopic Images. Comput. Mater. Contin. 2021, 67, 2337–2352. [Google Scholar] [CrossRef]
  11. Xu, Y.; Dang, H.; Tang, L. KACM: A KIS-Awared Active Contour Model for Low-Contrast Image Segmentation. Expert Syst. Appl. 2024, 255, 124767. [Google Scholar] [CrossRef]
  12. Zhu, X.; Cheng, Z.; Wang, S.; Chen, X.; Lu, G. Coronary Angiography Image Segmentation Based on PSPNet. Comput. Methods Programs Biomed. 2021, 200, 105897. [Google Scholar] [CrossRef]
  13. Liu, Y.; Li, H.; Hu, C.; Luo, S.; Luo, Y.; Chen, C.W. Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 595–609. [Google Scholar] [CrossRef] [PubMed]
  14. Usamentiaga, R.; Lema, D.G.; Pedrayes, O.D.; Garcia, D.F. Automated Surface Defect Detection in Metals: A Comparative Review of Object Detection and Semantic Segmentation Using Deep Learning. IEEE Trans. Ind. Appl. 2022, 58, 4203–4213. [Google Scholar] [CrossRef]
  15. Wang, H.; Chen, Y.; Cai, Y.; Chen, L.; Li, Y.; Sotelo, M.A.; Li, Z. SFNet-N: An Improved SFNet Algorithm for Semantic Segmentation of Low-Light Autonomous Driving Road Scenes. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21405–21417. [Google Scholar] [CrossRef]
  16. Leichter, A.; Almeev, R.R.; Wittich, D.; Beckmann, P.; Rottensteiner, F.; Holtz, F.; Sester, M. Automated Segmentation of Olivine Phenocrysts in a Volcanic Rock Thin Section Using a Fully Convolutional Neural Network. Front. Earth Sci. 2022, 10, 740638. [Google Scholar] [CrossRef]
  17. Liu, Y.; Yu, J.; Han, Y. Understanding the Effective Receptive Field in Semantic Image Segmentation. Multimed. Tools Appl. 2018, 77, 22159–22171. [Google Scholar] [CrossRef]
  18. Wang, Z.; Zhang, S.; Gross, L.; Zhang, C.; Wang, B. Fused Adaptive Receptive Field Mechanism and Dynamic Multiscale Dilated Convolution for Side-Scan Sonar Image Segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 3201248. [Google Scholar] [CrossRef]
  19. Zhou, S.; Nie, D.; Adeli, E.; Yin, J.; Lian, J.; Shen, D. High-Resolution Encoder–Decoder Networks for Low-Contrast Medical Image Segmentation. IEEE Trans. Image Process. 2020, 29, 461–475. [Google Scholar] [CrossRef]
  20. Gao, Y.; Zhou, M.; Metaxas, D.N. UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation. Proc. Med. Image Comput. Comput. Assist. Interv. 2021, LNCS, 61–71. [Google Scholar]
  21. Carrera-Rivera, A.; Ochoa, W.; Larrinaga, F.; Lasa, G. How-to Conduct a Systematic Literature Review: A Quick Guide for Computer Science Research. MethodsX 2022, 9, 101895. [Google Scholar] [CrossRef]
  22. Khan, K.S.; Kunz, R.; Kleijnen, J.; Antes, G. Five Steps to Conducting a Systematic Review. J. R. Soc. Med. 2003, 96, 118–121. [Google Scholar] [CrossRef] [PubMed]
  23. Yang, L.; Zhang, H.; Shen, H.; Huang, X.; Zhou, X.; Rong, G.; Shao, D. Quality Assessment in Systematic Literature Reviews: A Software Engineering Perspective. Inf. Softw. Technol. 2021, 130, 106397. [Google Scholar] [CrossRef]
  24. Yaska, M.; Nuhu, B.M. Assessment of Measures of Central Tendency and Dispersion Using Likert-Type Scale. Afr. J. Adv. Sci. Technol. Res. 2024, 16, 33–45. [Google Scholar] [CrossRef]
  25. Liu, Y.P.; Zhong, Q.; Liang, R.; Li, Z.; Wang, H.; Chen, P. Layer Segmentation of OCT Fingerprints with an Adaptive Gaussian Prior Guided Transformer. IEEE Trans. Instrum. Meas. 2022, 71, 3212113. [Google Scholar] [CrossRef]
  26. Zhang, L.; Peng, J.; Liu, W.; Yuan, H.; Tan, S.; Wang, L.; Yi, F. A Semantic Fusion Based Approach for Express Bill Detection in Complex Scenes. Image Vis. Comput. 2023, 135, 104708. [Google Scholar] [CrossRef]
  27. Yuan, H.; Peng, J. LCSeg-Net: A Low-Contrast Images Semantic Segmentation Model with Structural and Frequency Spectrum Information. Pattern Recognit. 2024, 151, 110428. [Google Scholar] [CrossRef]
  28. Tian, S.; Huang, P.; Ma, H.; Wang, J.; Zhou, X.; Zhang, S.; Zhou, J.; Huang, R.; Li, Y. CASDD: Automatic Surface Defect Detection Using a Complementary Adversarial Network. IEEE Sens. J. 2022, 22, 19583–19595. [Google Scholar] [CrossRef]
  29. Lin, Q.; Zhou, J.; Ma, Q.; Ma, Y.; Kang, L.; Wang, J. EMRA-Net: A Pixel-Wise Network Fusing Local and Global Features for Tiny and Low-Contrast Surface Defect Detection. IEEE Trans. Instrum. Meas. 2022, 71, 3151926. [Google Scholar] [CrossRef]
  30. Wu, Y.; Li, S.; Zhang, J.; Li, Y.; Li, Y.; Zhang, Y. Dual Attention Transformer Network for Pixel-Level Concrete Crack Segmentation Considering Camera Placement. Autom. Constr. 2024, 157, 105166. [Google Scholar] [CrossRef]
  31. Ye, W.; Ren, J.; Zhang, A.A.; Lu, C. Automatic Pixel-Level Crack Detection with Multi-Scale Feature Fusion for Slab Tracks. Comput.-Aided Civ. Infrastruct. Eng. 2023, 38, 2648–2665. [Google Scholar] [CrossRef]
  32. Sun, Z.; Zhu, H.; Xiao, X.; Gu, Y.; Xu, Y. Nighttime Image Semantic Segmentation with Retinex Theory. Image Vis. Comput. 2024, 148, 105149. [Google Scholar] [CrossRef]
  33. Chuanmeng, S.; Xinyu, L.; Jiaxin, C.; Zhibo, W.; Yong, L. Coal-Rock Image Recognition Method for Complex and Harsh Environment in Coal Mine Using Deep Learning Models. IEEE Access 2023, 11, 80794–80805. [Google Scholar] [CrossRef]
  34. Yang, Y.; Li, J.; Chen, Z.; Ren, L. GVANet: A Grouped Multi-View Aggregation Network for Remote Sensing Image Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 16727–16743. [Google Scholar] [CrossRef]
  35. Jing, T.; Meng, Q.-H.; Hou, H.-R. SmokeSeger: A Transformer-CNN Coupled Model for Urban Scene Smoke Segmentation. IEEE Trans. Ind. Inform. 2024, 20, 1385–1396. [Google Scholar] [CrossRef]
  36. Jiang, T.; Xing, W.; Yu, M.; Ta, D. A Hybrid Enhanced Attention Transformer Network for Medical Ultrasound Image Segmentation. Biomed. Signal Process. Control 2023, 86, 105329. [Google Scholar] [CrossRef]
  37. Zhang, D.; Lu, C.; Tan, T.; Dashtbozorg, B.; Long, X.; Xu, X.; Zhang, J.; Shan, C. BSANet: Boundary-Aware and Scale-Aggregation Networks for CMR Image Segmentation. Neurocomputing 2024, 599, 128125. [Google Scholar] [CrossRef]
  38. Huang, H.; Chen, Z.; Zou, Y.; Lu, M.; Chen, C.; Song, Y.; Zhang, H.; Yan, F. Channel Prior Convolutional Attention for Medical Image Segmentation. Comput. Biol. Med. 2024, 178, 108784. [Google Scholar] [CrossRef] [PubMed]
  39. Jiang, M.; Zhu, Y.; Zhang, X. CoVi-Net: A Hybrid Convolutional and Vision Transformer Neural Network for Retinal Vessel Segmentation. Comput. Biol. Med. 2024, 170, 108047. [Google Scholar] [CrossRef]
  40. Yang, H.; Yang, D. CSwin-PNet: A CNN-Swin Transformer Combined Pyramid Network for Breast Lesion Segmentation in Ultrasound Images. Expert Syst. Appl. 2023, 213, 119024. [Google Scholar] [CrossRef]
  41. Chaoyang, Z.; Shibao, S.; Wenmao, H.; Pengcheng, Z. FDR-TransUNet: A Novel Encoder-Decoder Architecture with Vision Transformer for Improved Medical Image Segmentation. Comput. Biol. Med. 2024, 169, 107858. [Google Scholar] [CrossRef]
  42. Li, Y.; Zhang, Y.; Liu, J.Y.; Wang, K.; Zhang, K.; Zhang, G.S.; Liao, X.F.; Yang, G. Global Transformer and Dual Local Attention Network via Deep-Shallow Hierarchical Feature Fusion for Retinal Vessel Segmentation. IEEE Trans. Cybern. 2023, 53, 5826–5839. [Google Scholar] [CrossRef] [PubMed]
  43. He, A.; Wang, K.; Li, T.; Du, C.; Xia, S.; Fu, H. H2Former: An Efficient Hierarchical Hybrid Transformer for Medical Image Segmentation. IEEE Trans. Med. Imaging 2023, 42, 2763–2775. [Google Scholar] [CrossRef]
  44. Wang, W.; Pan, B.; Ai, Y.; Li, G.; Fu, Y.; Liu, Y. LightCM-PNet: A Lightweight Pyramid Network for Real-Time Prostate Segmentation in Transrectal Ultrasound. Pattern Recognit. 2024, 156, 110776. [Google Scholar] [CrossRef]
  45. Zhang, Y.; Xi, R.; Wang, W.; Li, H.; Hu, L.; Lin, H.; Towey, D.; Bai, R.; Fu, H.; Higashita, R.; et al. Low-Contrast Medical Image Segmentation via Transformer and Boundary Perception. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 2297–2309. [Google Scholar] [CrossRef]
  46. Dai, D.; Dong, C.; Xu, S.; Yan, Q.; Li, Z.; Zhang, C.; Luo, N. Ms RED: A Novel Multi-Scale Residual Encoding and Decoding Network for Skin Lesion Segmentation. Med. Image Anal. 2022, 75, 102293. [Google Scholar] [CrossRef]
  47. Zhang, J.; Liu, Y.; Wu, Q.; Wang, Y.; Liu, Y.; Xu, X.; Song, B. SWTRU: Star-Shaped Window Transformer Reinforced U-Net for Medical Image Segmentation. Comput. Biol. Med. 2022, 150, 105954. [Google Scholar] [CrossRef]
  48. Dai, S.; Zhu, Y.; Jiang, X.; Yu, F.; Lin, J.; Yang, D. TD-Net: Trans-Deformer Network for Automatic Pancreas Segmentation. Neurocomputing 2023, 517, 279–293. [Google Scholar] [CrossRef]
  49. Zhang, H.; Gao, Z.; Zhang, D.; Hau, W.K.; Zhang, H. Progressive Perception Learning for Main Coronary Segmentation in X-Ray Angiography. IEEE Trans. Med. Imaging 2023, 42, 864–879. [Google Scholar] [CrossRef]
  50. Zhang, D.; Zhang, J.; Li, S.; Dong, Z.; Zheng, Q.; Zhang, J. U-NTCA: NnUNet and Nested Transformer with Channel Attention for Corneal Cell Segmentation. Front. Neurosci. 2024, 18, 1363288. [Google Scholar] [CrossRef]
  51. Shah, A.; Rojas, C.A. Imaging Modalities (MRI, CT, PET/CT), Indications, Differential Diagnosis and Imaging Characteristics of Cystic Mediastinal Masses: A Review. Mediastinum 2023, 7, 3. [Google Scholar] [CrossRef]
  52. Grand Challenges Sub-Challenge: Automatic Polyp Detection in Colonoscopy Videos—CVC-ClinicDB. Available online: https://polyp.grand-challenge.org/CVCClinicDB/ (accessed on 14 December 2024).
  53. Chen, Y.; Ding, Y.; Zhao, F.; Zhang, E.; Wu, Z.; Shao, L. Surface Defect Detection Methods for Industrial Products: A Review. Appl. Sci. 2021, 11, 7657. [Google Scholar] [CrossRef]
  54. Emek Soylu, B.; Guzel, M.S.; Bostanci, G.E.; Ekinci, F.; Asuroglu, T.; Acici, K. Deep-Learning-Based Approaches for Semantic Segmentation of Natural Scene Images: A Review. Electronics 2023, 12, 2730. [Google Scholar] [CrossRef]
  55. Bouguettaya, A.; Zarzour, H.; Taberkit, A.M.; Kechida, A. A Review on Early Wildfire Detection from Unmanned Aerial Vehicles Using Deep Learning-Based Computer Vision Algorithms. Signal Process. 2022, 190, 108309. [Google Scholar] [CrossRef]
  56. Silvestre-Blanes, J.; Albero-Albero, T.; Miralles, I.; Pérez-Llorens, R.; Moreno, J. A Public Fabric Database for Defect Detection Methods and Results. Autex Res. J. 2019, 19, 363–374. [Google Scholar] [CrossRef]
  57. Huang, Y.; Qiu, C.; Guo, Y.; Wang, X.; Yuan, K. Surface Defect Saliency of Magnetic Tile. In Proceedings of the 2018 IEEE 14th International Conference on Automation Science and Engineering (CASE), Munich, Germany, 20–24 August 2018; pp. 612–617. [Google Scholar]
  58. Gong, H.; Chen, J.; Chen, G.; Li, H.; Li, G.; Chen, F. Thyroid Region Prior Guided Attention for Ultrasound Segmentation of Thyroid Nodules. Comput. Biol. Med. 2023, 155, 106389. [Google Scholar] [CrossRef]
  59. Leclerc, S.; Smistad, E.; Pedrosa, J.; Ostvik, A.; Cervenansky, F.; Espinosa, F.; Espeland, T.; Berg, E.A.R.; Jodoin, P.-M.; Grenier, T.; et al. Deep Learning for Segmentation Using an Open Large-Scale Dataset in 2D Echocardiography. IEEE Trans. Med. Imaging 2019, 38, 2198–2210. [Google Scholar] [CrossRef]
  60. Cardiac Atlas Project Sunnybrook Cardiac Data. Available online: https://www.cardiacatlas.org/sunnybrook-cardiac-data/ (accessed on 14 December 2024).
  61. ISIC Challenge ISIC Challenge Datasets. Available online: https://challenge.isic-archive.com/data/ (accessed on 14 December 2024).
  62. International Society for Photogrammetry and Remote Sensing (ISPRS). 2D Semantic Labeling Contest—Potsdam. Available online: https://www.isprs.org/education/benchmarks/UrbanSemLab/2d-sem-label-potsdam.aspx (accessed on 14 December 2024).
  63. Tan, X.; Xu, K.; Cao, Y.; Zhang, Y.; Ma, L.; Lau, R.W.H. Night-Time Scene Parsing With a Large Real Dataset. IEEE Trans. Image Process. 2021, 30, 9085–9098. [Google Scholar] [CrossRef] [PubMed]
  64. DRIVE: Digital Retinal Images for Vessel Extraction. Available online: https://drive.grand-challenge.org/ (accessed on 14 December 2024).
  65. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
  66. Ehab, W.; Huang, L.; Li, Y. UNet and Variants for Medical Image Segmentation. Int. J. Netw. Dyn. Intell. 2024, 3, 100009. [Google Scholar] [CrossRef]
  67. Liu, F.; Fang, M. Semantic Segmentation of Underwater Images Based on Improved Deeplab. J. Mar. Sci. Eng. 2020, 8, 188. [Google Scholar] [CrossRef]
  68. Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. NnU-Net: A Self-Configuring Method for Deep Learning-Based Biomedical Image Segmentation. Nat. Methods 2021, 18, 203–211. [Google Scholar] [CrossRef] [PubMed]
  69. Yang, C.; Wu, L.; Chen, Y.; Wang, G.; Weng, G. An Active Contour Model Based on Retinex and Pre-Fitting Reflectance for Fast Image Segmentation. Symmetry 2022, 14, 2343. [Google Scholar] [CrossRef]
  70. Kim, B.J.; Choi, H.; Jang, H.; Lee, D.G.; Jeong, W.; Kim, S.W. Dead Pixel Test Using Effective Receptive Field. Pattern Recognit. Lett. 2023, 167, 149–156. [Google Scholar] [CrossRef]
  71. Chen, X.; Li, Z.; Jiang, J.; Han, Z.; Deng, S.; Li, Z.; Fang, T.; Huo, H.; Li, Q.; Liu, M. Adaptive Effective Receptive Field Convolution for Semantic Segmentation of VHR Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 59, 3532–3546. [Google Scholar] [CrossRef]
  72. Loos, V.; Pardasani, R.; Awasthi, N. Demystifying the Effect of Receptive Field Size in U-Net Models for Medical Image Segmentation. J. Med. Imaging 2024, 11, 054004. [Google Scholar] [CrossRef]
  73. Kumar Singh, V.; Abdel-Nasser, M.; Pandey, N.; Puig, D. LungINFseg: Segmenting COVID-19 Infected Regions in Lung CT Images Based on a Receptive-Field-Aware Deep Learning Framework. Diagnostics 2021, 11, 158. [Google Scholar] [CrossRef]
  74. Zhao, D.; Wang, C.; Gao, Y.; Shi, Z.; Xie, F. Semantic Segmentation of Remote Sensing Image Based on Regional Self-Attention Mechanism. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8010305. [Google Scholar] [CrossRef]
  75. Ferdaus, M.M.; Abdelguerfi, M.; Niles, K.N.; Pathak, K.; Tom, J. Widened Attention-Enhanced Atrous Convolutional Network for Efficient Embedded Vision Applications under Resource Constraints. Adv. Intell. Syst. 2024; early view. [Google Scholar] [CrossRef]
  76. Pan, B.; Xu, X.; Shi, Z.; Zhang, N.; Luo, H.; Lan, X. DSSNet: A Simple Dilated Semantic Segmentation Network for Hyperspectral Imagery Classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1968–1972. [Google Scholar] [CrossRef]
  77. Chen, F.; Wu, F.; Xu, J.; Gao, G.; Ge, Q.; Jing, X.-Y. Adaptive Deformable Convolutional Network. Neurocomputing 2021, 453, 853–864. [Google Scholar] [CrossRef]
  78. Hassanin, M.; Anwar, S.; Radwan, I.; Khan, F.S.; Mian, A. Visual Attention Methods in Deep Learning: An in-Depth Survey. Inf. Fusion 2024, 108, 102417. [Google Scholar] [CrossRef]
  79. Li, Y.; Liang, M.; Wei, M.; Wang, G.; Li, Y. Mechanisms and Applications of Attention in Medical Image Segmentation: A Review. Acad. J. Sci. Technol. 2023, 5, 237–243. [Google Scholar] [CrossRef]
  80. Zhong, Z.; Lin, Z.Q.; Bidart, R.; Hu, X.; Daya, I.B.; Li, Z.; Zheng, W.-S.; Li, J.; Wong, A. Squeeze-and-Attention Networks for Semantic Segmentation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13062–13071. [Google Scholar]
  81. Park, J.; Woo, S.; Lee, J.-Y.; Kweon, I.S. A Simple and Light-Weight Attention Module for Convolutional Neural Networks. Int. J. Comput. Vis. 2020, 128, 783–798. [Google Scholar] [CrossRef]
  82. Ye, Y.; Chen, Y.; Wang, R.; Zhu, D.; Huang, Y.; Huang, Y.; Liu, J.; Chen, Y.; Shi, J.; Ding, B.; et al. Image Segmentation Using Improved U-Net Model and Convolutional Block Attention Module Based on Cardiac Magnetic Resonance Imaging. J. Radiat. Res. Appl. Sci. 2024, 17, 100816. [Google Scholar] [CrossRef]
  83. Papa, L.; Russo, P.; Amerini, I.; Zhou, L. A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 7682–7700. [Google Scholar] [CrossRef] [PubMed]
  84. Wang, W.; Xie, E.; Li, X.; Fan, D.-P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 548–558. [Google Scholar]
  85. Tolstikhin, I.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. Mlp-Mixer: An All-Mlp Architecture for Vision. Adv. Neural Inf. Process. Syst. 2021, 34, 24261–24272. [Google Scholar]
  86. Krithika alias AnbuDevi, M.; Suganthi, K. Review of Semantic Segmentation of Medical Images Using Modified Architectures of UNET. Diagnostics 2022, 12, 3064. [Google Scholar] [CrossRef] [PubMed]
  87. Fan, X.; Yan, C.; Fan, J.; Wang, N. Improved U-Net Remote Sensing Classification Algorithm Fusing Attention and Multiscale Features. Remote Sens. 2022, 14, 3591. [Google Scholar] [CrossRef]
  88. Wang, H.; Cao, P.; Yang, J.; Zaiane, O. Narrowing the Semantic Gaps in U-Net with Learnable Skip Connections: The Case of Medical Image Segmentation. Neural Networks 2024, 178, 106546. [Google Scholar] [CrossRef]
  89. Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. IEEE Trans. Med. Imaging 2020, 39, 1856–1867. [Google Scholar] [CrossRef]
  90. Maharana, K.; Mondal, S.; Nemade, B. A Review: Data Pre-Processing and Data Augmentation Techniques. Glob. Transit. Proc. 2022, 3, 91–99. [Google Scholar] [CrossRef]
  91. Pious, I.K.; Srinivasan, R. Segnet Unveiled: Robust Image Segmentation via Rigorous K-Fold Cross-Validation Analysis. Technol. Health Care 2024, 33, 863–876. [Google Scholar] [CrossRef] [PubMed]
  92. Li, X.; Yu, L.; Chang, D.; Ma, Z.; Cao, J. Dual Cross-Entropy Loss for Small-Sample Fine-Grained Vehicle Classification. IEEE Trans. Veh. Technol. 2019, 68, 4204–4212. [Google Scholar] [CrossRef]
  93. Yaqub, M.; Feng, J.; Zia, M.; Arshid, K.; Jia, K.; Rehman, Z.; Mehmood, A. State-of-the-Art CNN Optimizer for Brain Tumor Segmentation in Magnetic Resonance Images. Brain Sci. 2020, 10, 427. [Google Scholar] [CrossRef] [PubMed]
  94. Guo, M.-H.; Liu, Z.-N.; Mu, T.-J.; Liang, D.; Martin, R.R.; Hu, S.-M. Can Attention Enable MLPs to Catch up with CNNs? Comput. Vis. Media 2021, 7, 283–288. [Google Scholar] [CrossRef]
  95. Panella, F.; Lipani, A.; Boehm, J. Semantic Segmentation of Cracks: Data Challenges and Architecture. Autom. Constr. 2022, 135, 104110. [Google Scholar] [CrossRef]
  96. Xun, S.; Li, D.; Zhu, H.; Chen, M.; Wang, J.; Li, J.; Chen, M.; Wu, B.; Zhang, H.; Chai, X.; et al. Generative Adversarial Networks in Medical Image Segmentation: A Review. Comput. Biol. Med. 2022, 140, 105063. [Google Scholar] [CrossRef]
  97. Iman, M.; Arabnia, H.R.; Rasheed, K. A Review of Deep Transfer Learning and Recent Advancements. Technologies 2023, 11, 40. [Google Scholar] [CrossRef]
  98. Alomar, K.; Aysel, H.I.; Cai, X. Data Augmentation in Classification and Segmentation: A Survey and New Strategies. J. Imaging 2023, 9, 46. [Google Scholar] [CrossRef]
  99. Archit, A.; Pape, P. ViM-UNet: Vision Mamba for Biomedical Segmentation. In Proceedings of the Medical Imaging with Deep Learning, Paris, France, 2–4 July 2024. [Google Scholar]
  100. Wang, R.; Lei, T.; Cui, R.; Zhang, B.; Meng, H.; Nandi, A.K. Medical Image Segmentation Using Deep Learning: A Survey. IET Image Process. 2022, 16, 1243–1267. [Google Scholar] [CrossRef]
  101. Baid, U.; Talbar, S.; Talbar, S. Comparative Study of K-Means, Gaussian Mixture Model, Fuzzy C-Means Algorithms for Brain Tumor Segmentation. In Proceedings of the International Conference on Communication and Signal Processing 2016 (ICCASP 2016), Melmaruvathur, India, 6–8 April 2016; Atlantis Press: Paris, France, 2017. [Google Scholar]
  102. Rezaie, A.; Achanta, R.; Godio, M.; Beyer, K. Comparison of Crack Segmentation Using Digital Image Correlation Measurements and Deep Learning. Constr. Build. Mater. 2020, 261, 120474. [Google Scholar] [CrossRef]
  103. Xu, Z.; Wang, Y.; Chen, M.; Zhang, Q. Multi-Region Radiomics for Artificially Intelligent Diagnosis of Breast Cancer Using Multimodal Ultrasound. Comput. Biol. Med. 2022, 149, 105920. [Google Scholar] [CrossRef]
  104. Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment Anything in Medical Images. Nat. Commun. 2024, 15, 654. [Google Scholar] [CrossRef] [PubMed]
  105. Cartolovni, A.; Tomičić, A.; Lazić Mosler, E. Ethical, Legal, and Social Considerations of AI-Based Medical Decision-Support Tools: A Scoping Review. Int. J. Med. Inform. 2022, 161, 104738. [Google Scholar] [CrossRef] [PubMed]
  106. Cunneen, M.; Mullins, M.; Murphy, F.; Shannon, D.; Furxhi, I.; Ryan, C. Autonomous Vehicles and Avoiding the Trolley (Dilemma): Vehicle Perception, Classification, and the Challenges of Framing Decision Ethics. Cybern. Syst. 2020, 51, 59–80. [Google Scholar] [CrossRef]
  107. Al-Huda, Z.; Peng, B.; Algburi, R.N.A.; Al-antari, M.A.; AL-Jarazi, R.; Zhai, D. A Hybrid Deep Learning Pavement Crack Semantic Segmentation. Eng. Appl. Artif. Intell. 2023, 122, 106142. [Google Scholar] [CrossRef]
  108. Li, Y.; Zhang, W.; Liu, Y.; Shao, X. A Lightweight Network for Real-Time Smoke Semantic Segmentation Based on Dual Paths. Neurocomputing 2022, 501, 258–269. [Google Scholar] [CrossRef]
  109. Hu, X.; Jiang, F.; Qin, X.; Huang, S.; Yang, X.; Meng, F. An Optimized Smoke Segmentation Method for Forest and Grassland Fire Based on the UNet Framework. Fire 2024, 7, 68. [Google Scholar] [CrossRef]
  110. Zhou, Q.; Situ, Z.; Teng, S.; Liu, H.; Chen, W.; Chen, G. Automatic Sewer Defect Detection and Severity Quantification Based on Pixel-Level Semantic Segmentation. Tunn. Undergr. Space Technol. 2022, 123, 104403. [Google Scholar] [CrossRef]
  111. Li, Y.; Ouyang, S.; Zhang, Y. Combining Deep Learning and Ontology Reasoning for Remote Sensing Image Semantic Segmentation. Knowl. Based Syst. 2022, 243, 108469. [Google Scholar] [CrossRef]
  112. Wang, R.; Zheng, G. CyCMIS: Cycle-Consistent Cross-Domain Medical Image Segmentation via Diverse Image Augmentation. Med. Image Anal. 2022, 76, 102328. [Google Scholar] [CrossRef]
  113. Muksimova, S.; Mardieva, S.; Cho, Y.I. Deep Encoder–Decoder Network-Based Wildfire Segmentation Using Drone Images in Real-Time. Remote Sens. 2022, 14, 6302. [Google Scholar] [CrossRef]
  114. Ling, Z.; Zhang, A.; Ma, D.; Shi, Y.; Wen, H. Deep Siamese Semantic Segmentation Network for PCB Welding Defect Detection. IEEE Trans. Instrum. Meas. 2022, 71, 3154814. [Google Scholar] [CrossRef]
  115. Fu, Y.; Gao, M.; Xie, G.; Hu, M.; Wei, C.; Ding, R. Density-Aware U-Net for Unstructured Environment Dust Segmentation. IEEE Sens. J. 2024, 24, 8210–8226. [Google Scholar] [CrossRef]
  116. Priyanka; Sravya, N.; Lal, S.; Nalini, J.; Reddy, C.S.; Dell’Acqua, F. DIResUNet: Architecture for Multiclass Semantic Segmentation of High Resolution Remote Sensing Imagery Data. Appl. Intell. 2022, 52, 15462–15482. [Google Scholar] [CrossRef]
  117. Pan, Y.; Zhang, L. Dual Attention Deep Learning Network for Automatic Steel Surface Defect Segmentation. Comput. Aided Civ. Infrastruct. Eng. 2022, 37, 1468–1487. [Google Scholar] [CrossRef]
  118. Hu, Y.; Zhan, J.; Zhou, G.; Chen, A.; Cai, W.; Guo, K.; Hu, Y.; Li, L. Fast Forest Fire Smoke Detection Using MVMNet. Knowl. Based Syst. 2022, 241, 108219. [Google Scholar] [CrossRef]
  119. Liang, H.; Zheng, C.; Liu, X.; Tian, Y.; Zhang, J.; Cui, W. Super-Resolution Reconstruction of Remote Sensing Data Based on Multiple Satellite Sources for Forest Fire Smoke Segmentation. Remote Sens. 2023, 15, 4180. [Google Scholar] [CrossRef]
  120. Wang, P.; Shi, G. Image Segmentation of Tunnel Water Leakage Defects in Complex Environments Using an Improved Unet Model. Sci. Rep. 2024, 14, 24286. [Google Scholar] [CrossRef]
  121. Xiong, Y.; Xiao, X.; Yao, M.; Cui, H.; Fu, Y. Light4Mars: A Lightweight Transformer Model for Semantic Segmentation on Unstructured Environment like Mars. ISPRS J. Photogramm. Remote Sens. 2024, 214, 167–178. [Google Scholar] [CrossRef]
  122. Li, Z.; Li, Y.; Li, Q.; Wang, P.; Guo, D.; Lu, L.; Jin, D.; Zhang, Y.; Hong, Q. LViT: Language Meets Vision Transformer in Medical Image Segmentation. IEEE Trans. Med. Imaging 2024, 43, 96–107. [Google Scholar] [CrossRef]
  123. Hu, K.; Zhang, E.; Xia, M.; Weng, L.; Lin, H. MCANet: A Multi-Branch Network for Cloud/Snow Segmentation in High-Resolution Remote Sensing Images. Remote Sens. 2023, 15, 1055. [Google Scholar] [CrossRef]
  124. Gao, G.; Xu, G.; Yu, Y.; Xie, J.; Yang, J.; Yue, D. MSCFNet: A Lightweight Network With Multi-Scale Context Fusion for Real-Time Semantic Segmentation. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25489–25499. [Google Scholar] [CrossRef]
  125. Ding, L.; Xia, M.; Lin, H.; Hu, K. Multi-Level Attention Interactive Network for Cloud and Snow Detection Segmentation. Remote Sens. 2024, 16, 112. [Google Scholar] [CrossRef]
  126. Yang, L.; Bai, S.; Liu, Y.; Yu, H. Multi-Scale Triple-Attention Network for Pixelwise Crack Segmentation. Autom. Constr. 2023, 150, 104853. [Google Scholar] [CrossRef]
  127. Zheng, C.; Liu, L.; Meng, Y.; Wang, M.; Jiang, X. Passable Area Segmentation for Open-Pit Mine Road from Vehicle Perspective. Eng. Appl. Artif. Intell. 2024, 129, 107610. [Google Scholar] [CrossRef]
  128. Liu, H.; Yao, M.; Xiao, X.; Xiong, Y. RockFormer: A U-Shaped Transformer Network for Martian Rock Segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 3235525. [Google Scholar] [CrossRef]
  129. Su, Y.; Cheng, J.; Bai, H.; Liu, H.; He, C. Semantic Segmentation of Very-High-Resolution Remote Sensing Images via Deep Multi-Feature Learning. Remote Sens. 2022, 14, 533. [Google Scholar] [CrossRef]
  130. Zhang, L.; Wu, J.; Yuan, F.; Fang, Y. Smoke-Aware Global-Interactive Non-Local Network for Smoke Semantic Segmentation. IEEE Trans. Image Process. 2024, 33, 1175–1187. [Google Scholar] [CrossRef]
  131. Tao, H. Smoke Recognition in Satellite Imagery via an Attention Pyramid Network With Bidirectional Multi-level Multigranularity Feature Aggregation and Gated Fusion. IEEE Internet Things J. 2024, 11, 14047–14057. [Google Scholar] [CrossRef]
  132. Xie, X.; Cai, J.; Wang, H.; Wang, Q.; Xu, J.; Zhou, Y.; Zhou, B. Sparse-Sensing and Superpixel-Based Segmentation Model for Concrete Cracks. Comput. Aided Civ. Infrastruct. Eng. 2022, 37, 1769–1784. [Google Scholar] [CrossRef]
  133. Zhang, J.; Qin, Q.; Ye, Q.; Ruan, T. ST-Unet: Swin Transformer Boosted U-Net with Cross-Layer Feature Enhancement for Medical Image Segmentation. Comput. Biol. Med. 2023, 153, 106516. [Google Scholar] [CrossRef]
  134. Ma, J.; Yuan, G.; Guo, C.; Gang, X.; Zheng, M. SW-UNet: A U-Net Fusing Sliding Window Transformer Block with CNN for Segmentation of Lung Nodules. Front. Med. 2023, 10, 1273441. [Google Scholar] [CrossRef]
  135. Lin, X.; Yu, L.; Cheng, K.T.; Yan, Z. The Lighter the Better: Rethinking Transformers in Medical Image Segmentation Through Adaptive Pruning. IEEE Trans. Med. Imaging 2023, 42, 2325–2337. [Google Scholar] [CrossRef]
  136. Zhang, Y.; Li, Z.; Nan, N.; Wang, X. TranSegNet: Hybrid CNN-Vision Transformers Encoder for Retina Segmentation of Optical Coherence Tomography. Life 2023, 13, 976. [Google Scholar] [CrossRef] [PubMed]
  137. Chen, J.; Mei, J.; Li, X.; Lu, Y.; Yu, Q.; Wei, Q.; Luo, X.; Xie, Y.; Adeli, E.; Wang, Y.; et al. TransUNet: Rethinking the U-Net Architecture Design for Medical Image Segmentation through the Lens of Transformers. Med. Image Anal. 2024, 97, 103280. [Google Scholar] [CrossRef] [PubMed]
  138. Guo, X.; Lin, X.; Yang, X.; Yu, L.; Cheng, K.T.; Yan, Z. UCTNet: Uncertainty-Guided CNN-Transformer Hybrid Networks for Medical Image Segmentation. Pattern Recognit. 2024, 152, 110491. [Google Scholar] [CrossRef]
  139. Bencević, M.; Habijan, M.; Galić, I.; Babin, D.; Pižurica, A. Understanding Skin Color Bias in Deep Learning-Based Skin Lesion Segmentation. Comput. Methods Programs Biomed. 2024, 245, 108044. [Google Scholar] [CrossRef]
  140. Wang, J.J.; Liu, Y.F.; Nie, X.; Mo, Y.L. Deep Convolutional Neural Networks for Semantic Segmentation of Cracks. Struct. Control Health Monit. 2022, 29, e2850. [Google Scholar] [CrossRef]
  141. Pozzer, S.; De Souza, M.P.V.; Hena, B.; Hesam, S.; Rezayiye, R.K.; Rezazadeh Azar, E.; Lopez, F.; Maldague, X. Effect of Different Imaging Modalities on the Performance of a CNN: An Experimental Study on Damage Segmentation in Infrared, Visible, and Fused Images of Concrete Structures. NDT E Int. 2022, 132, 102709. [Google Scholar] [CrossRef]
  142. Jin, K.; Huang, X.; Zhou, J.; Li, Y.; Yan, Y.; Sun, Y.; Zhang, Q.; Wang, Y.; Ye, J. FIVES: A Fundus Image Dataset for Artificial Intelligence Based Vessel Segmentation. Sci Data 2022, 9, 475. [Google Scholar] [CrossRef]
  143. Zheng, J.; Tang, C.; Sun, Y. Thresholding-Accelerated Convolutional Neural Network for Aeroengine Turbine Blade Segmentation. Expert Syst. Appl. 2024, 238, 122387. [Google Scholar] [CrossRef]
  144. Song, L.; Sun, H.; Liu, J.; Yu, Z.; Cui, C. Automatic Segmentation and Quantification of Global Cracks in Concrete Structures Based on Deep Learning. Measurement 2022, 199, 111550. [Google Scholar] [CrossRef]
  145. Chen, J.; Liu, Z.; Jin, D.; Wang, Y.; Yang, F.; Bai, X. Light Transport Induced Domain Adaptation for Semantic Segmentation in Thermal Infrared Urban Scenes. IEEE Trans. Intell. Transp. Syst. 2022, 23, 23194–23211. [Google Scholar] [CrossRef]
  146. Ashraf, H.; Waris, A.; Ghafoor, M.F.; Gilani, S.O.; Niazi, I.K. Melanoma Segmentation Using Deep Learning with Test-Time Augmentations and Conditional Random Fields. Sci. Rep. 2022, 12, 3948. [Google Scholar] [CrossRef]
  147. Tani, T.A.; Tešić, J. Advancing Retinal Vessel Segmentation With Diversified Deep Convolutional Neural Networks. IEEE Access 2024, 12, 141280–141290. [Google Scholar] [CrossRef]
  148. Zhang, J.; Guo, W. A New Regularization for Deep Learning-Based Segmentation of Images with Fine Structures and Low Contrast. Sensors 2023, 23, 1887. [Google Scholar] [CrossRef]
  149. Chen, Y.; Gan, H.; Chen, H.; Zeng, Y.; Xu, L.; Heidari, A.A.; Zhu, X.; Liu, Y. Accurate Iris Segmentation and Recognition Using an End-to-End Unified Framework Based on MADNet and DSANet. Neurocomputing 2023, 517, 264–278. [Google Scholar] [CrossRef]
  150. Yang, L.; Fan, J.; Huo, B.; Li, E.; Liu, Y. A Nondestructive Automatic Defect Detection Method with Pixelwise Segmentation. Knowl. Based Syst. 2022, 242, 108338. [Google Scholar] [CrossRef]
  151. Hekal, A.A.; Elnakib, A.; Moustafa, H.E.D.; Amer, H.M. Breast Cancer Segmentation from Ultrasound Images Using Deep Dual-Decoder Technology with Attention Network. IEEE Access 2024, 12, 10087–10101. [Google Scholar] [CrossRef]
  152. Li, S.; Feng, Y.; Xu, H.; Miao, Y.; Lin, Z.; Liu, H.; Xu, Y.; Li, F. CAENet: Contrast Adaptively Enhanced Network for Medical Image Segmentation Based on a Differentiable Pooling Function. Comput. Biol. Med. 2023, 167, 107578. [Google Scholar] [CrossRef]
  153. Luo, Q.; Su, J.; Yang, C.; Gui, W.; Silven, O.; Liu, L. CAT-EDNet: Cross-Attention Transformer-Based Encoder-Decoder Network for Salient Defect Detection of Strip Steel Surface. IEEE Trans. Instrum. Meas. 2022, 71, 3165270. [Google Scholar] [CrossRef]
  154. Lin, J.; Lin, J.; Lu, C.; Chen, H.; Lin, H.; Zhao, B.; Shi, Z.; Qiu, B.; Pan, X.; Xu, Z.; et al. CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation. IEEE Trans. Med. Imaging 2023, 42, 2451–2461. [Google Scholar] [CrossRef] [PubMed]
  155. Yuan, F.; Dong, Z.; Zhang, L.; Xia, X.; Shi, J. Cubic-Cross Convolutional Attention and Count Prior Embedding for Smoke Segmentation. Pattern Recognit. 2022, 131, 108902. [Google Scholar] [CrossRef]
  156. Yang, L.; Gu, Y.; Bian, G.; Liu, Y. DRR-Net: A Dense-Connected Residual Recurrent Convolutional Network for Surgical Instrument Segmentation From Endoscopic Images. IEEE Trans. Med. Robot. Bionics 2022, 4, 696–707. [Google Scholar] [CrossRef]
  157. Xiao, Z.; Zhang, Y.; Deng, Z.; Liu, F. Light3DHS: A Lightweight 3D Hippocampus Segmentation Method Using Multiscale Convolution Attention and Vision Transformer. Neuroimage 2024, 292, 120608. [Google Scholar] [CrossRef] [PubMed]
  158. Radha, K.; Karuna, Y. Modified Depthwise Parallel Attention UNet for Retinal Vessel Segmentation. IEEE Access 2023, 11, 102572–102588. [Google Scholar] [CrossRef]
  159. Feng, K.; Ren, L.; Wang, G.; Wang, H.; Li, Y. SLT-Net: A Codec Network for Skin Lesion Segmentation. Comput. Biol. Med. 2022, 148, 105942. [Google Scholar] [CrossRef]
  160. Li, E.; Zhang, W. Smoke Image Segmentation Algorithm Suitable for Low-Light Scenes. Fire 2023, 6, 217. [Google Scholar] [CrossRef]
  161. Üzen, H.; Türkoğlu, M.; Yanikoglu, B.; Hanbay, D. Swin-MFINet: Swin Transformer Based Multi-Feature Integration Network for Detection of Pixel-Level Surface Defects. Expert Syst. Appl. 2022, 209, 118269. [Google Scholar] [CrossRef]
  162. Fu, Y.; Liu, J.; Shi, J. TSCA-Net: Transformer Based Spatial-Channel Attention Segmentation Network for Medical Images. Comput. Biol. Med. 2024, 170, 107938. [Google Scholar] [CrossRef]
  163. Banerjee, S.; Lyu, J.; Huang, Z.; Leung, F.H.F.; Lee, T.; Yang, D.; Su, S.; Zheng, Y.; Ling, S.H. Ultrasound Spine Image Segmentation Using Multi-Scale Feature Fusion Skip-Inception U-Net (SIU-Net). Biocybern. Biomed. Eng. 2022, 42, 341–361. [Google Scholar] [CrossRef]
  164. Yamuna Devi, M.M.; Jeyabharathi, J.; Kirubakaran, S.; Narayanan, S.; Srikanth, T.; Chakrabarti, P. Efficient Segmentation and Classification of the Lung Carcinoma via Deep Learning. Multimed. Tools Appl. 2024, 83, 41981–41995. [Google Scholar] [CrossRef]
  165. Wang, J.; Xu, G.; Yan, F.; Wang, J.; Wang, Z. Defect Transformer: An Efficient Hybrid Transformer Architecture for Surface Defect Detection. Measurement 2023, 211, 112614. [Google Scholar] [CrossRef]
  166. Song, J.; Chen, X.; Zhu, Q.; Shi, F.; Xiang, D.; Chen, Z.; Fan, Y.; Pan, L.; Zhu, W. Global and Local Feature Reconstruction for Medical Image Segmentation. IEEE Trans. Med. Imaging 2022, 41, 2273–2284. [Google Scholar] [CrossRef]
  167. Periyasamy, M.; Davari, A.; Seehaus, T.; Braun, M.; Maier, A.; Christlein, V. How to Get the Most out of U-Net for Glacier Calving Front Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 1712–1723. [Google Scholar] [CrossRef]
  168. Liu, X.; Hu, Y.; Chen, J. Hybrid CNN-Transformer Model for Medical Image Segmentation with Pyramid Convolution and Multi-Layer Perceptron. Biomed. Signal Process. Control 2023, 86, 105331. [Google Scholar] [CrossRef]
  169. Yazdi, R.; Khotanlou, H. MaxSigNet: Light Learnable Layer for Semantic Cell Segmentation. Biomed. Signal Process. Control 2024, 95, 106464. [Google Scholar] [CrossRef]
  170. Wu, R.; Liang, P.; Huang, X.; Shi, L.; Gu, Y.; Zhu, H.; Chang, Q. MHorUNet: High-Order Spatial Interaction UNet for Skin Lesion Segmentation. Biomed. Signal Process. Control 2024, 88, 105517. [Google Scholar] [CrossRef]
  171. Chen, Z.; Bian, Y.; Shen, E.; Fan, L.; Zhu, W.; Shi, F.; Shao, C.; Chen, X.; Xiang, D. Moment-Consistent Contrastive CycleGAN for Cross-Domain Pancreatic Image Segmentation. IEEE Trans. Med. Imaging 2024, 44, 422–435. [Google Scholar] [CrossRef]
  172. Li, G.; Han, C.; Liu, Z. No-Service Rail Surface Defect Segmentation via Normalized Attention and Dual-Scale Interaction. IEEE Trans. Instrum. Meas. 2023, 72, 3293561. [Google Scholar] [CrossRef]
  173. Lang, C.; Wang, J.; Cheng, G.; Tu, B.; Han, J. Progressive Parsing and Commonality Distillation for Few-Shot Remote Sensing Segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 3286183. [Google Scholar] [CrossRef]
  174. Chen, W.; Mu, Q.; Qi, J. TrUNet: Dual-Branch Network by Fusing CNN and Transformer for Skin Lesion Segmentation. IEEE Access 2024, 12, 144174–144185. [Google Scholar] [CrossRef]
  175. Liu, W.; Yang, H.; Tian, T.; Cao, Z.; Pan, X.; Xu, W.; Jin, Y.; Gao, F. Full-Resolution Network and Dual-Threshold Iteration for Retinal Vessel and Coronary Angiograph Segmentation. IEEE J. Biomed. Health Inform. 2022, 26, 4623–4634. [Google Scholar] [CrossRef]
  176. Liu, W.; Li, W.; Zhu, J.; Cui, M.; Xie, X.; Zhang, L. Improving Nighttime Driving-Scene Segmentation via Dual Image-Adaptive Learnable Filters. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 5855–5867. [Google Scholar] [CrossRef]
  177. Bi, L.; Zhang, W.; Zhang, X.; Li, C. A Nighttime Driving-Scene Segmentation Method Based on Light-Enhanced Network. World Electr. Veh. J. 2024, 15, 490. [Google Scholar] [CrossRef]
  178. Wang, J.; Zhao, H.; Liang, W.; Wang, S.; Zhang, Y. Cross-Convolutional Transformer for Automated Multi-Organs Segmentation in a Variety of Medical Images. Phys. Med. Biol. 2023, 68, 035008. [Google Scholar] [CrossRef] [PubMed]
  179. Xu, Q.; Ma, Z.; He, N.; Duan, W. DCSAU-Net: A Deeper and More Compact Split-Attention U-Net for Medical Image Segmentation. Comput. Biol. Med. 2023, 154, 106626. [Google Scholar] [CrossRef] [PubMed]
  180. Wang, K.; Zhang, X.; Zhang, X.; Lu, Y.; Huang, S.; Yang, D. EANet: Iterative Edge Attention Network for Medical Image Segmentation. Pattern Recognit. 2022, 127, 108636. [Google Scholar] [CrossRef]
  181. Liu, X.; Yang, L.; Chen, J.; Yu, S.; Li, K. Region-to-Boundary Deep Learning Model with Multi-Scale Feature Fusion for Medical Image Segmentation. Biomed. Signal Process. Control 2022, 71, 103165. [Google Scholar] [CrossRef]
  182. Liu, Y.; Shen, J.; Yang, L.; Bian, G.; Yu, H. ResDO-UNet: A Deep Residual Network for Accurate Retinal Vessel Segmentation from Fundus Images. Biomed. Signal Process. Control 2022, 79, 104087. [Google Scholar] [CrossRef]
  183. Ding, H.; Cen, Q.; Si, X.; Pan, Z.; Chen, X. Automatic Glottis Segmentation for Laryngeal Endoscopic Images Based on U-Net. Biomed. Signal Process. Control 2022, 71, 103116. [Google Scholar] [CrossRef]
  184. Luo, F.; Cui, Y.; Liao, Y. MVRA-UNet: Multi-View Residual Attention U-Net for Precise Defect Segmentation on Magnetic Tile Surface. IEEE Access 2023, 11, 135212–135221. [Google Scholar] [CrossRef]
Figure 1. Schematic of the publication selection process for LCI SS models. Numbers in parentheses indicate publications selected or excluded at each stage.
Figure 2. Pie chart showing application distributions before (a) and after (b) applying the quality assessment checklist.
Figure 3. Examples of LCIs (top) and segmented masks (bottom) from public datasets: (a) MT—crack on a magnetic tile; (b) ISPRS-Potsdam—satellite imagery; (c) CVC-ClinicDB—colonoscopy polyp; (d) NightCity—nighttime driving scene; (e) SmokeSeger—chimney smoke; (f) DRIVE—retinal vessels.
Figure 7. LCSeg-Net implementation schematic [27]. During training, preprocessed images enter the encoder; intermediate decoder feature maps undergo deep supervision with boundary loss. The final output is refined via a Loss Function (LF) against the ground truth mask. In testing, only the encoder–decoder structure is utilized.
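To make the deep-supervision scheme sketched in Figure 7 concrete, the following minimal PyTorch example combines a main segmentation loss on the final output with boundary-weighted auxiliary losses on intermediate decoder maps. The weighting factors, the morphological boundary extraction, and the helper names are illustrative assumptions for a generic encoder–decoder, not the exact loss formulation published for LCSeg-Net [27].

```python
# Minimal sketch of deep supervision with an auxiliary boundary term.
# All weights, names, and the boundary extraction are illustrative assumptions.
import torch
import torch.nn.functional as F

def boundary_map(mask: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Approximate boundaries of a binary mask (N,1,H,W) via a
    morphological gradient: dilation minus erosion."""
    dilated = F.max_pool2d(mask, k, stride=1, padding=k // 2)
    eroded = -F.max_pool2d(-mask, k, stride=1, padding=k // 2)
    return (dilated - eroded).clamp(0, 1)

def deep_supervision_loss(final_logits, aux_logits_list, gt_mask,
                          lambda_boundary=0.5, lambda_aux=0.4):
    """Main loss on the final output plus boundary-weighted auxiliary
    losses on intermediate decoder outputs (deep supervision)."""
    main = F.binary_cross_entropy_with_logits(final_logits, gt_mask)
    gt_boundary = boundary_map(gt_mask)
    aux_total = 0.0
    for aux in aux_logits_list:
        # Upsample intermediate predictions to ground-truth resolution.
        aux_up = F.interpolate(aux, size=gt_mask.shape[-2:],
                               mode="bilinear", align_corners=False)
        bce = F.binary_cross_entropy_with_logits(aux_up, gt_mask)
        # Extra penalty concentrated on boundary pixels.
        bnd = F.binary_cross_entropy_with_logits(aux_up, gt_mask,
                                                 weight=1.0 + gt_boundary)
        aux_total = aux_total + bce + lambda_boundary * bnd
    return main + lambda_aux * aux_total

# Example with random tensors standing in for network outputs.
gt = (torch.rand(2, 1, 64, 64) > 0.5).float()
final = torch.randn(2, 1, 64, 64)
aux = [torch.randn(2, 1, 32, 32), torch.randn(2, 1, 16, 16)]
loss = deep_supervision_loss(final, aux, gt)
```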
Figure 8. Bar chart comparing the Dice Similarity Coefficient (DSC) of reviewed methods with that of UNet, based on the best-performing dataset for each method.
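For reference, the accuracy metrics reported in Figures 8 and 9 and in Table 1 follow their standard definitions. The short NumPy sketch below computes the Dice Similarity Coefficient (DSC) and Intersection over Union (IoU, averaged per class to obtain mIoU) for binary masks; the toy masks and function names are illustrative and not taken from any reviewed implementation.

```python
# Standard DSC and IoU definitions for binary segmentation masks (NumPy sketch).
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

def iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """IoU = |A ∩ B| / |A ∪ B|; mIoU averages IoU over all classes."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (intersection + eps) / (union + eps)

# Toy example: a predicted mask partially overlapping the ground truth.
gt = np.zeros((8, 8), dtype=np.uint8); gt[2:6, 2:6] = 1
pred = np.zeros((8, 8), dtype=np.uint8); pred[3:7, 3:7] = 1
print(f"DSC={dice_coefficient(pred, gt):.3f}, IoU={iou(pred, gt):.3f}")
```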
Figure 9. Dice Similarity Coefficient (DSC) of reviewed methods on (a) surface defect (MT, NEU, RSDD) and (b) skin lesion (ISIC, PH2) datasets, plotted against model size. Datasets within each category share similar LCI characteristics.
Figure 10. Qualitative results. (1) Results presented in [42]; (2) results presented in [35]. (a) Original image; (b) ground truth; (c) LCI method; (d) baseline (top: UNet; bottom: SegFormer).
Table 1. Summary of the key characteristics of the reviewed methods.
| Ref. | Name | Type | Mechanism to Enhance ERF | Parameters (M) | Maximum Performance (%) | Dataset | Highlights | Limitations |
|------|------|------|--------------------------|----------------|-------------------------|---------|------------|-------------|
| Surface Defect | | | | | | | | |
| [28] | CASDD | CNN | AC | 38.34 | DSC 92.1 | NEU/RCD/OPDD | Improves adaptability to data variability. | Poor defect detection with an aspect ratio of 0.4 or lower. |
| [29] | EMRANET | CNN | BAM+RA | NR | mIoU 87.36 | DAGM 2007/MT/AITEX/RSDD | Optimizes the extraction and fusion of global features. | Poor defect detection with a low aspect ratio. |
| [30] | PCTNet | CNN+ViT | AC+MHSA | 30.05 | mIoU 90.53, DSC 94.8 | Crack R | Expands the receptive field for lower-level features while limiting it for higher-level features. | High computational cost. |
| [31] | STCNet II | CNN | ASPP | NR | mIoU 87.07 | Own dataset | Improves accuracy by enlarging the ERF while maintaining image resolution. | Only captures objects between 0.1 and 0.3 mm in width. |
| Scene Understanding | | | | | | | | |
| [18] | NR | CNN | AC+CA | NR | mPa 93.24, mIoU 90.82 | Own dataset | Suppresses background information interference. | Fine details are not captured. |
| [32] | RNightSeg | CNN+ViT+MLP | MHSA | 104.16 | mIoU 57.91 | BDD100K-Night/NightCity+ | Handles over- or underexposure caused by uneven lighting. | High computational cost; not real-time capable. |
| Mineral Exploration | | | | | | | | |
| [33] | FAM-CRFSN | CNN | AC | 24.57 | mIoU 85.77, mPa 92.12 | Own dataset | Optimizes the extraction of base architecture features. | High computational cost. |
| Remote Sensing | | | | | | | | |
| [34] | GVANet | CNN+ViT+MLP | BAM+MLP | 28.59 | mIoU 87.6, DSC 92.82 | ISPRS-Vaihingen/ISPRS-Potsdam | Enables multiview expansion of single-view information and cross-level information interaction. | Poor edge detection. |
| Smoke Detection | | | | | | | | |
| [27] | LCSeg-Net | CNN+MLP | MLP | 22.6 | DSC 95.82, mIoU 92.02 | SSD | Reduces noise in edge inference by applying filters and using uncertainty models during feature fusion. | High dependency on label resolution. |
| [35] | SmokeSeger | CNN+ViT+MLP | DwC+MHSA+MLP | 34 | mIoU 91.6 | USS/SSS | Optimizes global and local feature capture and fusion. | Not real-time capable for practical applications. |
| Medical | | | | | | | | |
| [36] | HEAT-Net | CNN+ViT | DA+MHSA | NR | DSC 94 | BUSI/DDTI/TN3k/CAMUS | Reduces localization errors caused by structural and pixel intensity similarity. | Not real-time. |
| [37] | BSANet | CNN+ViT | DwC+MHSA | 14.15 | DSC 96.48 | RVSC/SCD/SunnyBrook | Improves adaptation to scale and shape variations. | Application-specific. |
| [38] | CPCANet | CNN | CPCA | 43.43 | DSC 93.7 | ACDC/ISIC 2016/PH2/Synapse/EM | Modifies the attention module of the base model to make it more efficient and lightweight. | Does not support datasets of different sizes than the one used in the study. |
| [39] | CoVi-Net | CNN+ViT | DS+MHSA | 22.99 | Acc 97.1 | DRIVE/CHASEDB1/STARE | Incorporates local and global features in a transformer. | Overfitting during training; high computational cost. |
| [40] | CSwin-PNet | CNN+ViT | SE+MHSA | NR | DSC 87.25 | UDIAT | Optimizes global and local feature fusion between the encoder and decoder. | Poor classification of diffuse areas. |
| [41] | FDR-TransUNet | CNN+ViT | MHSA | 101 | mIoU 90, DSC 97.5 | COVID-19 Radiography | Proposes an encoder adjusted to depth, retaining more local features. | Does not support data variability; high computational cost. |
| [42] | GT-DLA-dsHFF | CNN+ViT | AC+SE+MHSA | 26.08 | DSC 86.5 | DRIVE/STARE/CHASE_DB1/HRF | Applies local and global attention modules in parallel, consolidating edge and fine detail detection. | High computational cost. |
| [43] | H2Former | CNN+ViT | MHSA+SE | 33.71 | DSC 91.8, mIoU 86.29 | IDRiD/KVASIR SEG/SKIN LESION | Implements attention modules at various depths, improving feature representation capability. | Fails to capture diffuse areas and small objects. |
| [44] | LightCM-PNet | CNN+MLP | MLP+DS | 28.78 | DSC 92.42 | Own dataset | Real-time inference; enhances channel information exchange and context information perception. | Poor segmentation of diffuse areas. |
| [45] | TBNet | CNN+ViT | MHSA | 14.45 | DSC 80.26, mIoU 67.14 | TM-EM3000/ALIZARINE/SP-3000 | Optimizes global and local feature capture and fusion. | Over-segmentation in regions with few pixels. |
| [46] | Ms RED | CNN | DC+CA+SA | 3.8 | DSC 94.65, mIoU 90.14 | ISIC 2016-2017-2018/PH2 | Reduces the number of parameters in the base model, requiring fewer labeled data for training. | Focuses mainly on local features. |
| [47] | SWTRU | CNN+ViT | MHSA | 31 | DSC 97.2, mIoU 94.9 | CHAOS/ISIC 2018/LGG | Efficiently captures global features. | Large parameter count compared to baselines; high convergence time during training. |
| [48] | TD-Net | CNN+ViT | MHSA+DeC | NR | DSC 91.22 | NIH/MSD | Improves inference of diffuse edges and irregular shapes. | Requires a large dataset for training. |
| [49] | PPL | CNN | DC | NR | DSC 95.76, mIoU 92.23 | DCA/XCA | Progressively builds context, inference, and boundary perception. | Limited generalization to specific application types. |
| [50] | U-NTCA | CNN+ViT | MHSA | NR | DSC 86.42, Acc 97.78 | Own dataset | Generates low-level positional and morphological features that are transmitted to the upper layers to facilitate multi-scale feature fusion. | Does not consider edge constraints to address incorrectly connected cells or cells with broken edges. |
NR: Not Reported; metrics reflect peak performance on primary datasets.
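Several of the "Mechanism to Enhance ERF" entries in Table 1 rely on atrous convolution (AC). As a rough illustration of why it is attractive for LCI segmentation, the PyTorch sketch below contrasts a standard 3×3 convolution with a dilated one: with dilation 4, the kernel covers a 9×9 window, enlarging the receptive field while preserving spatial resolution and parameter count. The layer sizes are arbitrary assumptions chosen only for demonstration.

```python
# Illustration of atrous (dilated) convolution enlarging the receptive field
# without adding parameters; channel and image sizes are arbitrary assumptions.
import torch
import torch.nn as nn

# A 3x3 convolution with dilation d spans a (2d+1) x (2d+1) window,
# so its receptive field grows with d while the weight count stays 3*3*C_in*C_out.
standard = nn.Conv2d(16, 16, kernel_size=3, padding=1, dilation=1)
atrous = nn.Conv2d(16, 16, kernel_size=3, padding=4, dilation=4)

x = torch.randn(1, 16, 64, 64)
print(standard(x).shape, atrous(x).shape)            # both keep the 64x64 resolution
print(sum(p.numel() for p in standard.parameters()),
      sum(p.numel() for p in atrous.parameters()))   # identical parameter counts
```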