Search Results (369)

Search Parameters:
Keywords = saliency maps

22 pages, 3234 KiB  
Article
A Lightweight CNN for Multiclass Retinal Disease Screening with Explainable AI
by Arjun Kumar Bose Arnob, Muhammad Hasibur Rashid Chayon, Fahmid Al Farid, Mohd Nizam Husen and Firoz Ahmed
J. Imaging 2025, 11(8), 275; https://doi.org/10.3390/jimaging11080275 - 15 Aug 2025
Abstract
Timely, balanced, and transparent detection of retinal diseases is essential to avert irreversible vision loss; however, current deep learning screeners are hampered by class imbalance, large models, and opaque reasoning. This paper presents a lightweight attention-augmented convolutional neural network (CNN) that addresses all three barriers. The network combines depthwise separable convolutions, squeeze-and-excitation, and global-context attention, and it incorporates gradient-based class activation mapping (Grad-CAM) and Grad-CAM++ to ensure that every decision is accompanied by pixel-level evidence. A 5335-image ten-class color-fundus dataset from Bangladeshi clinics, which was severely skewed (17–1509 images per class), was equalized using a synthetic minority oversampling technique (SMOTE) and task-specific augmentations. Images were resized to 150×150 px and split 70:15:15. The training used the adaptive moment estimation (Adam) optimizer (initial learning rate of 1×10⁻⁴, reduce-on-plateau, early stopping), ℓ2 regularization, and dual dropout. The 16.6 M parameter network converged in fewer than 50 epochs on a mid-range graphics processing unit (GPU) and reached 87.9% test accuracy, a macro-precision of 0.882, a macro-recall of 0.879, and a macro-F1-score of 0.880, reducing the error by 58% relative to the best ImageNet backbone (Inception-V3, 40.4% accuracy). Eight disorders recorded true-positive rates above 95%; macular scar and central serous chorioretinopathy attained F1-scores of 0.77 and 0.89, respectively. Saliency maps consistently highlighted optic disc margins, subretinal fluid, and other hallmarks. Targeted class re-balancing, lightweight attention, and integrated explainability, therefore, deliver accurate, transparent, and deployable retinal screening suitable for point-of-care ophthalmic triage on resource-limited hardware.
(This article belongs to the Section Medical Imaging)
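For readers who want to reproduce the explainability step, a minimal Grad-CAM sketch follows. It is not the authors' code: a ResNet-18 stands in for their custom lightweight CNN, and the input tensor is a placeholder rather than a fundus image.

```python
# Minimal Grad-CAM sketch; backbone, layer choice, and input are placeholders.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()  # stand-in for the paper's CNN
target_layer = model.layer4                   # last convolutional stage

acts, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)  # placeholder image
logits = model(x)
logits[0, logits.argmax()].backward()                # gradient of top class

w = grads["v"].mean(dim=(2, 3), keepdim=True)        # channel importance weights
cam = F.relu((w * acts["v"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```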

30 pages, 2261 KiB  
Article
Multilayer Perceptron Mapping of Subjective Time Duration onto Mental Imagery Vividness and Underlying Brain Dynamics: A Neural Cognitive Modeling Approach
by Matthew Sheculski and Amedeo D’Angiulli
Mach. Learn. Knowl. Extr. 2025, 7(3), 82; https://doi.org/10.3390/make7030082 - 13 Aug 2025
Viewed by 232
Abstract
According to a recent experimental phenomenology–information processing theory, the sensory strength, or vividness, of visual mental images self-reported by human observers reflects the intensive variation in subjective time duration during the process of generation of said mental imagery. The primary objective of this study was to test the hypothesis that a biologically plausible essential multilayer perceptron (MLP) architecture can validly map the phenomenological categories of subjective time duration onto levels of subjectively self-reported vividness. A secondary objective was to explore whether this type of neural network cognitive modeling approach can give insight into plausible underlying large-scale brain dynamics. To achieve these objectives, vividness self-reports and reaction times from a previously collected database were reanalyzed using multilayered perceptron network models. The input layer consisted of six levels representing vividness self-reports and a reaction time cofactor. A single hidden layer consisted of three nodes representing the salience, task positive, and default mode networks. The output layer consisted of five levels representing Vittorio Benussi’s subjective time categories. Across different models of networks, Benussi’s subjective time categories (Level 1 = very brief, 2 = brief, 3 = present, 4 = long, 5 = very long) were predicted by visual imagery vividness level 1 (=no image) to 5 (=very vivid) with over 90% success in classification accuracy, precision, recall, and F1-score. This accuracy level was maintained after 5-fold cross validation. Linear regressions, Welch’s t-test for independent coefficients, and Pearson’s correlation analysis were applied to the resulting hidden node weight vectors, obtaining evidence for strong correlation and anticorrelation between nodes. This study successfully mapped Benussi’s five levels of subjective time categories onto the activation patterns of a simple MLP, providing a novel computational framework for experimental phenomenology. Our results revealed structured, complex dynamics between the task positive network (TPN), the default mode network (DMN), and the salience network (SN), suggesting that the neural mechanisms underlying temporal consciousness involve flexible network interactions beyond the traditional triple network model.
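The 6-3-5 topology described above is small enough to sketch directly. The following illustration uses synthetic data; the one-hot encoding of vividness and the randomly generated labels are assumptions for demonstration, not the study's database.

```python
# Sketch of a 6-input, 3-hidden-node, 5-class MLP on synthetic stand-in data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
vivid = rng.integers(1, 6, size=(500, 1))        # vividness 1 (no image) .. 5 (very vivid)
rt = rng.normal(1.0, 0.3, size=(500, 1))         # reaction-time cofactor (seconds)
X = np.hstack([OneHotEncoder(sparse_output=False).fit_transform(vivid), rt])  # 6 inputs
y = (vivid.ravel() + rng.integers(-1, 2, 500)).clip(1, 5)  # stand-in Benussi category 1..5

mlp = MLPClassifier(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y), mlp.coefs_[0].shape)      # hidden nodes ~ SN / TPN / DMN roles
```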

20 pages, 6359 KiB  
Article
Symmetry in Explainable AI: A Morphometric Deep Learning Analysis for Skin Lesion Classification
by Rafael Fernandez, Angélica Guzmán-Ponce, Ruben Fernandez-Beltran and Ginés García-Mateos
Symmetry 2025, 17(8), 1264; https://doi.org/10.3390/sym17081264 - 7 Aug 2025
Viewed by 229
Abstract
Deep learning has achieved remarkable performance in skin lesion classification, but its lack of interpretability often remains a critical barrier to clinical adoption. In this study, we investigate the spatial properties of saliency-based model explanations, focusing on symmetry and other morphometric features. We benchmark five deep learning architectures (ResNet-50, EfficientNetV2-S, ConvNeXt-Tiny, Swin-Tiny, and MaxViT-Tiny) on a nine-class skin lesion dataset from the International Skin Imaging Collaboration (ISIC) archive, generating saliency maps with Grad-CAM++ and LayerCAM. The best-performing model, Swin-Tiny, achieved an accuracy of 78.2% and a macro-F1 score of 71.2%. Our morphometric analysis reveals statistically significant differences in the explanation maps between correct and incorrect predictions. Notably, the transformer-based models exhibit highly significant differences (p<0.001) in metrics related to attentional focus (Entropy and Gini), indicating that their correct predictions are associated with more concentrated saliency maps. In contrast, the convolutional models show less consistent differences, significant only at the standard level (p<0.05). These findings suggest that the quantitative morphometric properties of saliency maps could serve as valuable indicators of predictive reliability in medical AI.
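The two focus metrics named in the abstract can be stated concretely. A sketch follows, under the assumption that a saliency map is a nonnegative array treated as a distribution; the paper's exact definitions may differ.

```python
# Entropy and Gini of a saliency map as simple concentration measures.
import numpy as np

def saliency_entropy(sal):
    p = sal.ravel() / (sal.sum() + 1e-12)    # treat the map as a distribution
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def saliency_gini(sal):
    v = np.sort(sal.ravel().astype(np.float64))   # ascending values
    n = v.size
    return float((2 * np.arange(1, n + 1) - n - 1).dot(v) / (n * v.sum() + 1e-12))

sal = np.random.rand(224, 224) ** 4          # peaked map: low entropy, high Gini
print(saliency_entropy(sal), saliency_gini(sal))
```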

27 pages, 5228 KiB  
Article
Detection of Surface Defects in Steel Based on Dual-Backbone Network: MBDNet-Attention-YOLO
by Xinyu Wang, Shuhui Ma, Shiting Wu, Zhaoye Li, Jinrong Cao and Peiquan Xu
Sensors 2025, 25(15), 4817; https://doi.org/10.3390/s25154817 - 5 Aug 2025
Viewed by 494
Abstract
Automated surface defect detection in steel manufacturing is pivotal for ensuring product quality, yet it remains an open challenge owing to the extreme heterogeneity of defect morphologies—ranging from hairline cracks and microscopic pores to elongated scratches and shallow dents. Existing approaches, whether classical vision pipelines or recent deep-learning paradigms, struggle to simultaneously satisfy the stringent demands of industrial scenarios: high accuracy on sub-millimeter flaws, insensitivity to texture-rich backgrounds, and real-time throughput on resource-constrained hardware. Although contemporary detectors have narrowed the gap, they still exhibit pronounced sensitivity–robustness trade-offs, particularly in the presence of scale-varying defects and cluttered surfaces. To address these limitations, we introduce MBY (MBDNet-Attention-YOLO), a lightweight yet powerful framework that synergistically couples the MBDNet backbone with the YOLO detection head. Specifically, the backbone embeds three novel components: (1) HGStem, a hierarchical stem block that enriches low-level representations while suppressing redundant activations; (2) Dynamic Align Fusion (DAF), an adaptive cross-scale fusion mechanism that dynamically re-weights feature contributions according to defect saliency; and (3) C2f-DWR, a depth-wise residual variant that progressively expands receptive fields without incurring prohibitive computational costs. Building upon this enriched feature hierarchy, the neck employs our proposed MultiSEAM module—a cascaded squeeze-and-excitation attention mechanism operating at multiple granularities—to harmonize fine-grained and semantic cues, thereby amplifying weak defect signals against complex textures. Finally, we integrate the Inner-SIoU loss, which refines the geometric alignment between predicted and ground-truth boxes by jointly optimizing center distance, aspect ratio consistency, and IoU overlap, leading to faster convergence and tighter localization. Extensive experiments on two publicly available steel-defect benchmarks—NEU-DET and PVEL-AD—demonstrate the superiority of MBY. Without bells and whistles, our model achieves 85.8% mAP@0.5 on NEU-DET and 75.9% mAP@0.5 on PVEL-AD, surpassing the best-reported results by significant margins while maintaining real-time inference on an NVIDIA Jetson Xavier. Ablation studies corroborate the complementary roles of each component, underscoring MBY’s robustness across defect scales and surface conditions. These results suggest that MBY strikes an appealing balance between accuracy, efficiency, and deployability, offering a pragmatic solution for next-generation industrial quality-control systems.
(This article belongs to the Section Sensing and Imaging)
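MultiSEAM is described as cascaded squeeze-and-excitation attention. The underlying SE primitive, shown here in its vanilla single-granularity form rather than the paper's cascade, looks like this:

```python
# Vanilla squeeze-and-excitation block: global pool, bottleneck MLP, re-weight.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                  # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))             # squeeze: global average pooling
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                       # excitation: channel re-weighting

print(SEBlock(64)(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```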

23 pages, 4728 KiB  
Article
A Web-Deployed, Explainable AI System for Comprehensive Brain Tumor Diagnosis
by Serra Aksoy, Pinar Demircioglu and Ismail Bogrekci
Neurol. Int. 2025, 17(8), 121; https://doi.org/10.3390/neurolint17080121 - 4 Aug 2025
Viewed by 313
Abstract
Background/Objectives: Accurate diagnosis of brain tumors is one of the most important challenges in neuro-oncology, since tumor classification and volumetric segmentation inform treatment planning. Two-dimensional classification and three-dimensional segmentation deep learning models can augment radiological workflows, particularly when paired with explainable AI techniques to improve model interpretability. The objective of this research was to develop a web-based platform for brain tumor segmentation and classification. Methods: A diagnosis system was developed that combines 2D tumor classification and 3D volumetric segmentation. Classification employed a fine-tuned MobileNetV2 model trained on a dataset of glioma, meningioma, pituitary tumor, and normal controls. Segmentation employed a SegResNet model trained on BraTS multi-channel MRI with synthetic no-tumor data. A meta-classifier MLP performed binary tumor detection from volumetric features. Explainability was provided by XRAI maps for 2D predictions and Gaussian overlays for 3D visualizations. The platform was wrapped in a web interface for clinical use. Results: The 2D MobileNetV2 model achieved 98.09% accuracy for tumor classification. The 3D SegResNet obtained Dice coefficients of roughly 68–70% for tumor segmentation. The MLP-based tumor detection module achieved 100% detection accuracy. The explainability modules localized tumor regions, and the saliency and overlay maps were consistent with real pathological features in both 2D and 3D. Conclusions: The deep learning diagnosis system improves brain tumor classification and segmentation and yields interpretable outcomes through XAI techniques. Deployment as a web tool with a user-friendly interface makes it suitable for clinical use in radiology workflows.
(This article belongs to the Section Brain Tumor and Brain Injury)
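The Dice coefficients quoted for the 3D segmentations follow the standard overlap formula. A sketch on binary volumes is below; the paper's evaluation code is not shown, so this is the textbook definition.

```python
# Dice coefficient on binary volumes: 2|A∩B| / (|A| + |B|).
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    return float(2 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum() + eps))

pred = np.zeros((64, 64, 64), bool); pred[20:40, 20:40, 20:40] = True
gt = np.zeros((64, 64, 64), bool);   gt[22:42, 22:42, 22:42] = True
print(round(dice(pred, gt), 3))      # overlap of two slightly offset cubes
```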

25 pages, 4241 KiB  
Article
Deep Learning for Comprehensive Analysis of Retinal Fundus Images: Detection of Systemic and Ocular Conditions
by Mohammad Mahdi Aghabeigi Alooghareh, Mohammad Mohsen Sheikhey, Ali Sahafi, Habibollah Pirnejad and Amin Naemi
Bioengineering 2025, 12(8), 840; https://doi.org/10.3390/bioengineering12080840 - 3 Aug 2025
Viewed by 768
Abstract
The retina offers a unique window into both ocular and systemic health, motivating the development of AI-based tools for disease screening and risk assessment. In this study, we present a comprehensive evaluation of six state-of-the-art deep neural networks, including convolutional neural networks and vision transformer architectures, on the Brazilian Multilabel Ophthalmological Dataset (BRSET), comprising 16,266 fundus images annotated for multiple clinical and demographic labels. We explored seven classification tasks: Diabetes, Diabetic Retinopathy (2-class), Diabetic Retinopathy (3-class), Hypertension, Hypertensive Retinopathy, Drusen, and Sex classification. Models were evaluated using precision, recall, F1-score, accuracy, and AUC. Among all models, the Swin-L generally delivered the best performance across scenarios for Diabetes (AUC = 0.88, weighted F1-score = 0.86), Diabetic Retinopathy (2-class) (AUC = 0.98, weighted F1-score = 0.95), Diabetic Retinopathy (3-class) (macro AUC = 0.98, weighted F1-score = 0.95), Hypertension (AUC = 0.85, weighted F1-score = 0.79), Hypertensive Retinopathy (AUC = 0.81, weighted F1-score = 0.97), Drusen detection (AUC = 0.93, weighted F1-score = 0.90), and Sex classification (AUC = 0.87, weighted F1-score = 0.80). These results reflect excellent to outstanding diagnostic performance. We also employed gradient-based saliency maps to enhance explainability and visualize decision-relevant retinal features. Our findings underscore the potential of deep learning, particularly vision transformer models, to deliver accurate, interpretable, and clinically meaningful screening tools for retinal and systemic disease detection.
(This article belongs to the Special Issue Machine Learning in Chronic Diseases)
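Gradient-based saliency of the kind used here reduces to a single backward pass. A minimal sketch follows, with a placeholder backbone standing in for the paper's Swin-L and random data standing in for fundus images.

```python
# Vanilla gradient saliency: |d score / d pixel|, max over color channels.
import torch
from torchvision import models

model = models.resnet18(weights=None).eval()          # placeholder backbone
x = torch.randn(1, 3, 224, 224, requires_grad=True)   # placeholder fundus image
score = model(x)[0].max()                             # top-class score
score.backward()
saliency = x.grad.abs().max(dim=1)[0]                 # (1, H, W) pixel importance
print(saliency.shape)
```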

18 pages, 2688 KiB  
Article
Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking
by Jie Zhao, Ying Gao, Chunjuan Bo and Dong Wang
Sensors 2025, 25(15), 4691; https://doi.org/10.3390/s25154691 - 29 Jul 2025
Viewed by 163
Abstract
Visual object tracking is one of the core techniques in human-centered artificial intelligence and is very useful for human–machine interaction. State-of-the-art tracking methods have shown their robustness and accuracy on many challenges. However, a large number of videos with precise, dense annotations is required for fully supervised training of their models. Because annotating videos frame-by-frame is a labor- and time-consuming workload, reducing the reliance on manual annotations during tracking-model training is an important problem to be resolved. To make a trade-off between annotation costs and tracking performance, we propose a weakly supervised tracking method based on co-saliency learning, which can be flexibly integrated into various tracking frameworks to reduce annotation costs and further enhance the target representation in current search images. Since our method enables the model to explore valuable visual information from unlabeled frames and calculates co-salient attention maps based on multiple frames, our weakly supervised method obtains competitive performance compared to fully supervised baseline trackers while using only 3.33% of the manual annotations. We integrate our method into two CNN-based trackers and a Transformer-based tracker; extensive experiments on four general tracking benchmarks demonstrate the effectiveness of our method. Furthermore, we also demonstrate the advantages of our method on the egocentric tracking task: our weakly supervised method obtains 0.538 success on TREK-150, surpassing the prior state-of-the-art fully supervised tracker by 7.7%.
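The co-salient attention idea, several frames voting for the appearance they share, can be caricatured in a few lines. This is an illustration of the concept only, not the paper's architecture; features and shapes are placeholders.

```python
# Concept sketch: a shared prototype from multiple frames re-weights features.
import torch
import torch.nn.functional as F

feats = F.normalize(torch.randn(4, 256, 32, 32), dim=1)   # 4 frames of CNN features
proto = F.normalize(feats.mean(dim=(0, 2, 3)), dim=0)     # shared appearance prototype
attn = torch.einsum("c,bchw->bhw", proto, feats)          # cosine similarity per location
attn = attn.clamp(min=0) / (attn.amax(dim=(1, 2), keepdim=True) + 1e-8)
enhanced = feats * attn.unsqueeze(1)                      # re-weight search features
print(attn.shape, enhanced.shape)
```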

21 pages, 2965 KiB  
Article
Inspection Method Enabled by Lightweight Self-Attention for Multi-Fault Detection in Photovoltaic Modules
by Shufeng Meng and Tianxu Xu
Electronics 2025, 14(15), 3019; https://doi.org/10.3390/electronics14153019 - 29 Jul 2025
Viewed by 356
Abstract
Bird-dropping fouling and hotspot anomalies remain the most prevalent and detrimental defects in utility-scale photovoltaic (PV) plants; their co-occurrence on a single module markedly curbs energy yield and accelerates irreversible cell degradation. However, markedly disparate visual–thermal signatures of the two phenomena impede high-fidelity concurrent detection in existing robotic inspection systems, while stringent onboard compute budgets also preclude the adoption of bulky detectors. To resolve this accuracy–efficiency trade-off for dual-defect detection, we present YOLOv8-SG, a lightweight yet powerful framework engineered for mobile PV inspectors. First, a rigorously curated multi-modal dataset—RGB for stains and long-wave infrared for hotspots—is assembled to enforce robust cross-domain representation learning. Second, the HSV color space is leveraged to disentangle chromatic and luminance cues, thereby stabilizing appearance variations across sensors. Third, a single-head self-attention (SHSA) block is embedded in the backbone to harvest long-range dependencies at negligible parameter cost, while a global context (GC) module is grafted onto the detection head to amplify fine-grained semantic cues. Finally, an auxiliary bounding box refinement term is appended to the loss to hasten convergence and tighten localization. Extensive field experiments demonstrate that YOLOv8-SG attains 86.8% mAP@0.5, surpassing the vanilla YOLOv8 by 2.7 pp while trimming 12.6% of parameters (18.8 MB). Grad-CAM saliency maps corroborate that the model’s attention consistently coincides with defect regions, underscoring its interpretability. The proposed method, therefore, furnishes PV operators with a practical low-latency solution for concurrent bird-dropping and hotspot surveillance.
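The HSV step is conventional enough to sketch. The snippet below separates luminance from chroma with OpenCV and normalizes only the former; the frame and the equalization choice are placeholders, not the paper's exact pipeline.

```python
# Split chroma from luminance in HSV and stabilize luminance only.
import cv2
import numpy as np

bgr = np.random.randint(0, 256, (480, 640, 3), np.uint8)   # placeholder PV frame
h, s, v = cv2.split(cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV))
v_eq = cv2.equalizeHist(v)                                 # normalize brightness channel
stabilized = cv2.cvtColor(cv2.merge([h, s, v_eq]), cv2.COLOR_HSV2BGR)
print(stabilized.shape)
```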

21 pages, 2308 KiB  
Article
Forgery-Aware Guided Spatial–Frequency Feature Fusion for Face Image Forgery Detection
by Zhenxiang He, Zhihao Liu and Ziqi Zhao
Symmetry 2025, 17(7), 1148; https://doi.org/10.3390/sym17071148 - 18 Jul 2025
Cited by 1 | Viewed by 413
Abstract
The rapid development of deepfake technologies has led to the widespread proliferation of facial image forgeries, raising significant concerns over identity theft and the spread of misinformation. Although recent dual-domain detection approaches that integrate spatial and frequency features have achieved noticeable progress, they still suffer from limited sensitivity to local forgery regions and inadequate interaction between spatial and frequency information in practical applications. To address these challenges, we propose a novel forgery-aware guided spatial–frequency feature fusion network. A lightweight U-Net is employed to generate pixel-level saliency maps by leveraging structural symmetry and semantic consistency, without relying on ground-truth masks. These maps dynamically guide the fusion of spatial features (from an improved Swin Transformer) and frequency features (via Haar wavelet transforms). Cross-domain attention, channel recalibration, and spatial gating are introduced to enhance feature complementarity and regional discrimination. Extensive experiments conducted on two benchmark face forgery datasets, FaceForensics++ and Celeb-DFv2, show that the proposed method consistently outperforms existing state-of-the-art techniques in terms of detection accuracy and generalization capability. Future work includes improving robustness under compression, incorporating temporal cues, extending to multimodal scenarios, and evaluating model efficiency for real-world deployment.
(This article belongs to the Section Computer)
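The frequency branch rests on a standard 2D Haar transform. A sketch with PyWavelets follows; the sub-band stacking is an assumption about how the features might be packed, not the authors' code.

```python
# 2D Haar decomposition into one approximation and three detail sub-bands.
import numpy as np
import pywt

img = np.random.rand(256, 256)              # placeholder grayscale face crop
ll, (lh, hl, hh) = pywt.dwt2(img, "haar")   # low-freq + horizontal/vertical/diagonal
freq_feats = np.stack([ll, lh, hl, hh])     # (4, 128, 128) frequency channels
print(freq_feats.shape)
```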

16 pages, 6900 KiB  
Article
Infrared Small Target Detection via Modified Fast Saliency and Weighted Guided Image Filtering
by Yi Cui, Tao Lei, Guiting Chen, Yunjing Zhang, Gang Zhang and Xuying Hao
Sensors 2025, 25(14), 4405; https://doi.org/10.3390/s25144405 - 15 Jul 2025
Viewed by 324
Abstract
The robust detection of small targets is crucial in infrared (IR) search and tracking applications. Considering that many state-of-the-art (SOTA) methods are still unable to suppress various edges satisfactorily, especially under complex backgrounds, an effective infrared small target detection algorithm inspired by modified fast saliency and the weighted guided image filter (WGIF) is presented in this paper. Initially, the fast saliency map modulated by the steering kernel (SK) is calculated. Then, a set of edge-preserving smoothed images is produced by WGIF using different filter radii and regularization parameters. After that, utilizing the fuzzy sets technique, the background image is predicted according to the saliency map and the smoothed or non-smoothed images. Finally, the differential image is calculated by subtracting the predicted image from the original one, and IR small targets are detected through simple thresholding. Experimental results on four sequences demonstrate that the proposed method can not only suppress background clutter effectively under strong edge interference but also detect targets accurately with a low false alarm rate.
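The final two steps, background subtraction and thresholding, can be sketched as follows. A plain box filter stands in for the paper's WGIF and fuzzy-set background model, so this shows only the overall shape of the pipeline.

```python
# Background prediction (stand-in smoother), residual, and simple threshold.
import cv2
import numpy as np

ir = np.random.rand(256, 256).astype(np.float32)   # placeholder IR frame
ir[100:103, 100:103] += 1.0                        # inject a small bright "target"
background = cv2.blur(ir, (15, 15))                # stand-in for WGIF smoothing
residual = ir - background                         # differential image
mask = residual > (residual.mean() + 4 * residual.std())  # adaptive threshold
print(np.argwhere(mask)[:3])                       # detected target pixels
```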

26 pages, 1804 KiB  
Article
Dependency-Aware Entity–Attribute Relationship Learning for Text-Based Person Search
by Wei Xia, Wenguang Gan and Xinpan Yuan
Big Data Cogn. Comput. 2025, 9(7), 182; https://doi.org/10.3390/bdcc9070182 - 7 Jul 2025
Viewed by 468
Abstract
Text-based person search (TPS), a critical technology for security and surveillance, aims to retrieve target individuals from image galleries using textual descriptions. The existing methods face two challenges: (1) ambiguous attribute–noun association (AANA), where syntactic ambiguities lead to incorrect associations between attributes and the intended nouns; and (2) textual noise and relevance imbalance (TNRI), where irrelevant or non-discriminative tokens (e.g., ‘wearing’) reduce the saliency of critical visual attributes in the textual description. To address these challenges, we propose the dependency-aware entity–attribute alignment network (DEAAN), a novel framework that explicitly tackles AANA through dependency-guided attention and TNRI via adaptive token filtering. The DEAAN introduces two modules: (1) dependency-assisted implicit reasoning (DAIR) to resolve AANA through syntactic parsing, and (2) relevance-adaptive token selection (RATS) to suppress TNRI by learning token saliency. Experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid demonstrate state-of-the-art performance, with the DEAAN achieving a Rank-1 accuracy of 76.71% and an mAP of 69.07% on CUHK-PEDES, surpassing RDE by 0.77% in Rank-1 and 1.51% in mAP. Ablation studies reveal that DAIR and RATS individually improve Rank-1 by 2.54% and 3.42%, while their combination elevates the performance by 6.35%, validating their synergy. This work bridges structured linguistic analysis with adaptive feature selection, demonstrating practical robustness in surveillance-oriented TPS scenarios.
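The dependency cues DAIR exploits are the kind a standard parser exposes. A spaCy sketch is below; it assumes the en_core_web_sm model is installed, and the paper's parser and attention wiring are more involved than this.

```python
# Dependency parsing links attribute adjectives to the nouns they modify.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("A woman wearing a red jacket and black shoes carries a white bag.")
for tok in doc:
    if tok.dep_ == "amod":                   # adjective -> the noun it modifies
        print(f"{tok.text:>6} --amod--> {tok.head.text}")
# expected: red -> jacket, black -> shoes, white -> bag
```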

24 pages, 16234 KiB  
Article
A Contrast-Enhanced Feature Reconstruction for Fixed PTZ Camera-Based Crack Recognition in Expressways
by Xuezhi Feng and Chunyan Shao
Electronics 2025, 14(13), 2617; https://doi.org/10.3390/electronics14132617 - 28 Jun 2025
Viewed by 176
Abstract
Efficient and accurate recognition of highway pavement cracks is crucial for the timely maintenance and long-term use of expressways. Among the existing crack acquisition methods, human-based approaches are inefficient, whereas carrier-based automated methods are expensive. Additionally, both methods present challenges related to traffic obstruction and safety risks. To address these challenges, we propose a fixed pan-tilt-zoom (PTZ) vision-based highway pavement crack recognition workflow. Pavement cracks often exhibit complex textures with blurred boundaries, low contrast, and discontinuous pixels, leading to missed and false detection. To mitigate these issues, we introduce an algorithm named contrast-enhanced feature reconstruction (CEFR), which consists of three parts: comparison-based pixel transformation, nonlinear stretching, and generating a saliency map. CEFR is an image pre-processing algorithm that enhances crack edges and establishes uniform inner-crack characteristics, thereby increasing the contrast between cracks and the background. Extensive experiments demonstrate that CEFR improves recognition performance, yielding increases of 3.1% in F1-score, 2.6% in mAP@0.5, and 4.6% in mAP@0.5:0.95, compared with the dataset without CEFR. The effectiveness and generalisability of CEFR are validated across multiple models, datasets, and tasks, confirming its applicability for highway maintenance engineering.
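Of CEFR's three parts, the nonlinear stretching is the most self-contained. A gamma-style sketch follows, under the assumption that cracks are darker than the surrounding pavement; the actual algorithm couples this with comparison-based pixel transformation and a saliency map.

```python
# Nonlinear contrast stretch: rescale to [0, 1], then lift dark crack pixels.
import numpy as np

def nonlinear_stretch(img: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    x = (img - img.min()) / (img.max() - img.min() + 1e-8)  # rescale to [0, 1]
    return x ** gamma                     # gamma < 1 boosts low intensities

pavement = np.random.rand(512, 512) * 0.3   # placeholder low-contrast patch
print(nonlinear_stretch(pavement).max())
```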

26 pages, 571 KiB  
Review
Explainable Artificial Intelligence: Advancements and Limitations
by Halil Ibrahim Aysel, Xiaohao Cai and Adam Prugel-Bennett
Appl. Sci. 2025, 15(13), 7261; https://doi.org/10.3390/app15137261 - 27 Jun 2025
Viewed by 812
Abstract
Explainable artificial intelligence (XAI) has emerged as a crucial field for understanding and interpreting the decisions of complex machine learning models, particularly deep neural networks. This review presents a structured overview of XAI methodologies, encompassing a diverse range of techniques designed to provide explainability at different levels of abstraction. We cover pixel-level explanation strategies such as saliency maps, perturbation-based methods and gradient-based visualisations, as well as concept-based approaches that align model behaviour with human-understandable semantics. Additionally, we touch upon the relevance of XAI in the context of weakly supervised semantic segmentation. By synthesising recent developments, this paper aims to clarify the landscape of XAI methods and offer insights into their comparative utility and role in fostering trustworthy AI systems.
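As one concrete instance of the perturbation-based family the review covers, occlusion sensitivity slides a blank patch over the input and records the drop in the class score. A sketch with a placeholder backbone:

```python
# Occlusion sensitivity: score drop per occluded cell approximates importance.
import torch
from torchvision import models

model = models.resnet18(weights=None).eval()      # placeholder backbone
x = torch.randn(1, 3, 224, 224)                   # placeholder input
heat = torch.zeros(7, 7)
with torch.no_grad():
    c = model(x).argmax().item()                  # class under explanation
    base = model(x)[0, c]
    for i in range(7):
        for j in range(7):
            occ = x.clone()
            occ[:, :, i*32:(i+1)*32, j*32:(j+1)*32] = 0   # blank one 32x32 cell
            heat[i, j] = base - model(occ)[0, c]          # larger drop = more important
print(heat.argmax())
```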

20 pages, 1771 KiB  
Article
An Innovative Artificial Intelligence Classification Model for Non-Ischemic Cardiomyopathy Utilizing Cardiac Biomechanics Derived from Magnetic Resonance Imaging
by Liqiang Fu, Peifang Zhang, Liuquan Cheng, Peng Zhi, Jiayu Xu, Xiaolei Liu, Yang Zhang, Ziwen Xu and Kunlun He
Bioengineering 2025, 12(6), 670; https://doi.org/10.3390/bioengineering12060670 - 19 Jun 2025
Viewed by 672
Abstract
Significant challenges persist in diagnosing non-ischemic cardiomyopathies (NICMs) owing to early morphological overlap and subtle functional changes. While cardiac magnetic resonance (CMR) offers gold-standard structural assessment, current morphology-based AI models frequently overlook key biomechanical dysfunctions like diastolic/systolic abnormalities. To address this, we propose a dual-path hybrid deep learning framework based on CNN-LSTM and MLP, integrating anatomical features from cine CMR with biomechanical markers derived from intraventricular pressure gradients (IVPGs), significantly enhancing NICM subtype classification by capturing subtle biomechanical dysfunctions overlooked by traditional morphological models. Our dual-path architecture combines a CNN-LSTM encoder for cine CMR analysis and an MLP encoder for IVPG time-series data, followed by feature fusion and dense classification layers. Trained on a multicenter dataset of 1196 patients and externally validated on 137 patients from a distinct institution, the model achieved superior performance (internal AUC: 0.974; external AUC: 0.962), outperforming ResNet50, VGG16, and radiomics-based SVM. Ablation studies confirmed the significant contribution of IVPGs, while gradient saliency and gradient-weighted class activation mapping (Grad-CAM) visualizations confirmed that the model attends to physiologically relevant cardiac regions and phases. The framework maintained robust generalizability across imaging protocols and institutions with minimal performance degradation. By synergizing biomechanical insights with deep learning, our approach offers an interpretable, data-efficient solution for early NICM detection and subtype differentiation, holding strong translational potential for clinical practice.
(This article belongs to the Special Issue Bioengineering in a Generative AI World)
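The dual-path layout, a CNN-LSTM over cine frames plus an MLP over the IVPG series with late fusion, can be sketched compactly. All shapes and layer sizes below are placeholders, not the paper's configuration.

```python
# Dual-path sketch: CNN-LSTM encodes the image sequence, MLP encodes the
# IVPG time series, and concatenated features feed a classification head.
import torch
import torch.nn as nn

class DualPath(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(1, 16, 3, 2, 1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(16, 32, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(25, 32), nn.ReLU())
        self.head = nn.Linear(32 + 32, n_classes)

    def forward(self, cine, ivpg):               # cine: (B,T,1,H,W), ivpg: (B,25)
        b, t = cine.shape[:2]
        f = self.cnn(cine.flatten(0, 1)).view(b, t, -1)   # per-frame features
        _, (h, _) = self.lstm(f)                  # last hidden state summarizes time
        return self.head(torch.cat([h[-1], self.mlp(ivpg)], dim=1))

print(DualPath()(torch.randn(2, 25, 1, 64, 64), torch.randn(2, 25)).shape)
```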

19 pages, 1563 KiB  
Article
Small Object Tracking in LiDAR Point Clouds: Learning the Target-Awareness Prototype and Fine-Grained Search Region
by Shengjing Tian, Yinan Han, Xiantong Zhao and Xiuping Liu
Sensors 2025, 25(12), 3633; https://doi.org/10.3390/s25123633 - 10 Jun 2025
Viewed by 773
Abstract
Light Detection and Ranging (LiDAR) point clouds are an essential perception modality for artificial intelligence systems like autonomous driving and robotics, where the ubiquity of small objects in real-world scenarios substantially challenges the visual tracking of small targets amidst the vastness of point cloud data. Current methods predominantly focus on developing universal frameworks for general object categories, often sidelining the persistent difficulties associated with small objects. These challenges stem from a scarcity of foreground points and a low tolerance for disturbances. To this end, we propose a deep neural network framework that trains a Siamese network for feature extraction and innovatively incorporates two pivotal modules: the target-awareness prototype mining (TAPM) module and the regional grid subdivision (RGS) module. The TAPM module utilizes the reconstruction mechanism of the masked auto-encoder to distill prototypes within the feature space, thereby enhancing the salience of foreground points and aiding in the precise localization of small objects. To heighten the tolerance of disturbances in feature maps, the RGS module is devised to retrieve detailed features of the search area, capitalizing on Vision Transformer and pixel shuffle technologies. Furthermore, beyond standard experimental configurations, we have meticulously crafted scaling experiments to assess the robustness of various trackers when dealing with small objects. Comprehensive evaluations show our method achieves a mean Success of 64.9% and 60.4% under original and scaled settings, outperforming benchmarks by +3.6% and +5.4%, respectively.
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)
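The pixel-shuffle operation the RGS module leans on is a stock PyTorch layer. A one-step illustration follows; the channel count and upscale factor are arbitrary choices for the demo.

```python
# PixelShuffle rearranges channel blocks into spatial resolution:
# (B, C*r*r, H, W) -> (B, C, H*r, W*r), recovering a finer grid.
import torch
import torch.nn as nn

coarse = torch.randn(1, 256, 16, 16)      # coarse search-region features
up = nn.PixelShuffle(upscale_factor=4)    # trades channels for resolution
fine = up(coarse)
print(fine.shape)                         # torch.Size([1, 16, 64, 64])
```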
