Search Results (3,024)

Search Parameters:
Keywords = multimodal image

38 pages, 1548 KB  
Perspective
RGB-D Cameras and Brain–Computer Interfaces for Human Activity Recognition: An Overview
by Grazia Iadarola, Alessandro Mengarelli, Sabrina Iarlori, Andrea Monteriù and Susanna Spinsante
Sensors 2025, 25(20), 6286; https://doi.org/10.3390/s25206286 - 10 Oct 2025
Abstract
This paper provides a perspective on the use of RGB-D cameras and non-invasive brain–computer interfaces (BCIs) for human activity recognition (HAR), and explores the potential of integrating the two technologies for active and assisted living (AAL). RGB-D cameras can monitor users in their living environments while preserving their privacy, through depth images and skeleton tracking. Concurrently, non-invasive BCIs can provide access to user intent and control by decoding neural signals. The synergy between these technologies may allow a holistic understanding of both the physical context and the cognitive state of users, enhancing personalized assistance inside smart homes. Successful deployment of the integrated technologies requires addressing critical technical hurdles, including the computational demands of real-time multi-modal data processing, and user-acceptance challenges related to data privacy, security, and BCI illiteracy. Continued interdisciplinary research is essential to realize the full potential of RGB-D cameras and BCIs as AAL solutions that improve the quality of life for independent or impaired people. Full article
(This article belongs to the Special Issue Computer Vision-Based Human Activity Recognition)

31 pages, 2953 KB  
Article
A Balanced Multimodal Multi-Task Deep Learning Framework for Robust Patient-Specific Quality Assurance
by Xiaoyang Zeng, Awais Ahmed and Muhammad Hanif Tunio
Diagnostics 2025, 15(20), 2555; https://doi.org/10.3390/diagnostics15202555 - 10 Oct 2025
Abstract
Background: Multimodal deep learning has emerged as a crucial method for automated patient-specific quality assurance (PSQA) in radiotherapy research. Integrating image-based dose matrices with tabular plan complexity metrics enables more accurate prediction of quality indicators, including the Gamma Passing Rate (GPR) and dose difference (DD). However, modality imbalance remains a significant challenge, as tabular encoders often dominate training, suppressing image encoders and reducing model robustness. This issue becomes more pronounced under task heterogeneity, with GPR prediction relying more on tabular data and dose difference prediction (DDP) depending heavily on image features. Methods: We propose BMMQA (Balanced Multi-modal Quality Assurance), a novel framework that achieves modality balance by adjusting modality-specific loss factors to control convergence dynamics. The framework introduces four key innovations: (1) task-specific fusion strategies (softmax-weighted attention for GPR regression and spatial cascading for DD prediction); (2) a balancing mechanism supported by Shapley values to quantify modality contributions; (3) a fast network forward mechanism for efficient computation of different modality combinations; and (4) a modality-contribution-based task weighting scheme for multi-task multimodal learning. A large-scale multimodal dataset comprising 1370 IMRT plans was curated in collaboration with Peking Union Medical College Hospital (PUMCH). Results: Experimental results demonstrate that, under the standard 2%/3 mm GPR criterion, BMMQA outperforms existing fusion baselines. Under the stricter 2%/2 mm criterion, it achieves a 15.7% reduction in mean absolute error (MAE). The framework also enhances robustness in critical failure cases (GPR < 90%) and achieves a peak SSIM of 0.964 in dose distribution prediction. Conclusions: Explicit modality balancing improves predictive accuracy and strengthens clinical trustworthiness by mitigating overreliance on a single modality. This work highlights the importance of addressing modality imbalance for building trustworthy and robust AI systems in PSQA and establishes a pioneering framework for multi-task multimodal learning. Full article
(This article belongs to the Special Issue Deep Learning in Medical and Biomedical Image Processing)
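
The Shapley-value balancing idea lends itself to a compact illustration. Below is a minimal sketch of exact Shapley contributions for a two-modality model (image and tabular), where `value_fn` is assumed to return a validation score for the model restricted to a given modality subset; the function name and the example scores are hypothetical, not taken from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_contributions(value_fn, modalities):
    """Exact Shapley value of each modality, where value_fn(subset)
    returns a performance score for a model evaluated with only that
    subset of modalities enabled (hypothetical interface)."""
    n = len(modalities)
    phi = {}
    for m in modalities:
        others = [x for x in modalities if x != m]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                s = len(subset)
                weight = factorial(s) * factorial(n - s - 1) / factorial(n)
                total += weight * (value_fn(frozenset(subset) | {m})
                                   - value_fn(frozenset(subset)))
        phi[m] = total
    return phi

# Hypothetical validation scores per modality subset (not the paper's data):
scores = {
    frozenset(): 0.0,                      # no modality: trivial baseline
    frozenset({"image"}): 0.61,
    frozenset({"tabular"}): 0.74,
    frozenset({"image", "tabular"}): 0.83,
}
print(shapley_contributions(scores.get, ["image", "tabular"]))
# ≈ {'image': 0.35, 'tabular': 0.48} -> tabular dominates; rebalance losses
```

With only two modalities the computation is exact over four subsets, which is presumably why it pairs naturally with a fast forward mechanism for evaluating modality combinations; note the contributions sum to the joint score (0.83).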

31 pages, 2150 KB  
Review
The Role of MALDI-TOF Mass Spectrometry in Photodynamic Therapy: From Photosensitizer Design to Clinical Applications
by Dorota Bartusik-Aebisher, Kacper Rogóż and David Aebisher
Curr. Issues Mol. Biol. 2025, 47(10), 834; https://doi.org/10.3390/cimb47100834 - 10 Oct 2025
Abstract
Photodynamic therapy (PDT) has evolved considerably over the past decades, progressing from first-generation porphyrins to second- and third-generation photosensitizers, including nanocarrier-based systems with improved selectivity and bioavailability. In parallel, matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) has become a gold standard for the characterisation of complex biomolecules, enabling precise determination of molecular mass, purity and stability. This narrative review explores the intersection of these two fields, focusing on how MALDI-TOF MS supports the development, characterisation and clinical application of photosensitizers used in PDT. Literature searches were performed across PubMed, Web of Science, Scopus and Base-search, followed by targeted retrieval of studies on MALDI and PDT applications. Findings indicate that MALDI-TOF MS plays a crucial role at multiple stages: confirming the synthesis and chemical integrity of novel photosensitizers, monitoring their metabolic stability in biological systems and characterising photodegradation products after PDT. Moreover, MALDI imaging mass spectrometry (MALDI-IMS) enables spatial mapping of photosensitizer distribution in tissues, while rapid pathogen identification by MALDI-TOF supports antimicrobial PDT applications. Collectively, the evidence highlights that MALDI-MS is not only a tool for molecular characterisation but also a versatile analytical platform with a direct translational impact on PDT. Its integration with other omics and multimodal imaging approaches is expected to enhance the personalization and clinical effectiveness of photodynamic therapy. Full article
(This article belongs to the Section Molecular Medicine)

18 pages, 5377 KB  
Article
M3ENet: A Multi-Modal Fusion Network for Efficient Micro-Expression Recognition
by Ke Zhao, Xuanyu Liu and Guangqian Yang
Sensors 2025, 25(20), 6276; https://doi.org/10.3390/s25206276 - 10 Oct 2025
Abstract
Micro-expression recognition (MER) aims to detect brief and subtle facial movements that reveal suppressed emotions, discerning authentic emotional responses in scenarios such as visitor experience analysis in museum settings. However, it remains a highly challenging task due to the fleeting duration, low intensity, and limited availability of annotated data. Most existing approaches rely solely on either appearance or motion cues, thereby restricting their ability to capture expressive information fully. To overcome these limitations, we propose a lightweight multi-modal fusion network, termed M3ENet, which integrates both motion and appearance cues through early-stage feature fusion. Specifically, our model extracts horizontal, vertical, and strain-based optical flow between the onset and apex frames, alongside RGB images from the onset, apex, and offset frames. These inputs are processed by two modality-specific subnetworks, whose features are fused to exploit complementary information for robust classification. To improve generalization in low data regimes, we employ targeted data augmentation and adopt focal loss to mitigate class imbalance. Extensive experiments on five benchmark datasets, including CASME I, CASME II, CAS(ME)2, SAMM, and MMEW, demonstrate that M3ENet achieves state-of-the-art performance with high efficiency. Ablation studies and Grad-CAM visualizations further confirm the effectiveness and interpretability of the proposed architecture. Full article
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems—2nd Edition)
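
The abstract's use of focal loss against class imbalance can be sketched in a few lines of PyTorch; the focusing parameter gamma=2.0 below is the common default from Lin et al. (2017), not necessarily M3ENet's setting.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, weight=None):
    """Multi-class focal loss: down-weights well-classified examples so
    rare micro-expression classes dominate the gradient. `weight` holds
    optional per-class weights; gamma=2.0 is a common default."""
    log_p = F.log_softmax(logits, dim=-1)                      # (N, C)
    ce = F.nll_loss(log_p, targets, weight=weight, reduction="none")
    p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    return ((1.0 - p_t) ** gamma * ce).mean()

# Hypothetical usage with 5 emotion classes:
logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))
print(focal_loss(logits, targets))
```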

21 pages, 14964 KB  
Article
An Automated Framework for Abnormal Target Segmentation in Levee Scenarios Using Fusion of UAV-Based Infrared and Visible Imagery
by Jiyuan Zhang, Zhonggen Wang, Jing Chen, Fei Wang and Lyuzhou Gao
Remote Sens. 2025, 17(20), 3398; https://doi.org/10.3390/rs17203398 - 10 Oct 2025
Abstract
Levees are critical for flood defence, but their integrity is threatened by hazards such as piping and seepage, especially during high-water-level periods. Traditional manual inspections for these hazards and associated emergency response elements, such as personnel and assets, are inefficient and often impractical. While UAV-based remote sensing offers a promising alternative, the effective fusion of multi-modal data and the scarcity of labelled data for supervised model training remain significant challenges. To overcome these limitations, this paper reframes levee monitoring as an unsupervised anomaly detection task. We propose a novel, fully automated framework that unifies geophysical hazards and emergency response elements into a single analytical category of “abnormal targets” for comprehensive situational awareness. The framework consists of three key modules: (1) a state-of-the-art registration algorithm to precisely align infrared and visible images; (2) a generative adversarial network to fuse the thermal information from IR images with the textural details from visible images; and (3) an adaptive, unsupervised segmentation module where a mean-shift clustering algorithm, with its hyperparameters automatically tuned by Bayesian optimization, delineates the targets. We validated our framework on a real-world dataset collected from a levee on the Pajiang River, China. The proposed method demonstrates superior performance over all baselines, achieving an Intersection over Union of 0.348 and a macro F1-Score of 0.479. This work provides a practical, training-free solution for comprehensive levee monitoring and demonstrates the synergistic potential of multi-modal fusion and automated machine learning for disaster management. Full article
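
As a rough illustration of the third module, the sketch below runs mean-shift clustering on the pixels of a fused image and tunes the bandwidth with Bayesian optimization via scikit-optimize; the silhouette-score objective and the search range are assumptions, since the abstract does not specify which hyperparameters are tuned or against what criterion.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.metrics import silhouette_score
from skopt import gp_minimize
from skopt.space import Real

def segment_fused_image(fused, n_calls=15):
    """Cluster pixels of a fused IR/visible image with mean shift,
    tuning the bandwidth by Bayesian optimization. The silhouette
    objective is an illustrative stand-in for the paper's criterion."""
    h, w, c = fused.shape
    X = fused.reshape(-1, c).astype(np.float64)
    idx = np.random.choice(len(X), min(2000, len(X)), replace=False)
    sample = X[idx]                      # subsample to keep tuning cheap

    def objective(params):
        labels = MeanShift(bandwidth=params[0]).fit_predict(sample)
        if len(np.unique(labels)) < 2:
            return 1.0                   # degenerate clustering: worst score
        return -silhouette_score(sample, labels)

    res = gp_minimize(objective, [Real(0.1, 1.0)], n_calls=n_calls,
                      random_state=0)
    return MeanShift(bandwidth=res.x[0]).fit_predict(X).reshape(h, w)

# Hypothetical usage on a normalized 3-channel fused image:
fused = np.random.rand(64, 64, 3)
labels = segment_fused_image(fused)
```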

16 pages, 4268 KB  
Article
Research on the Detection Method of Flight Trainees’ Attention State Based on Multi-Modal Dynamic Depth Network
by Gongpu Wu, Changyuan Wang, Zehui Chen and Guangyi Jiang
Multimodal Technol. Interact. 2025, 9(10), 105; https://doi.org/10.3390/mti9100105 - 10 Oct 2025
Abstract
In aviation safety, pilots must efficiently process dynamic visual information and maintain a high level of attention. Any missed judgment of critical information or delay in decision-making may lead to mission failure or catastrophic consequences. Therefore, accurately detecting pilots’ attention states is a primary prerequisite for improving flight safety and performance. To better detect the attention state of pilots, this paper takes flight trainees as the research object and a simulated flight environment as the experimental setting, and proposes a method for detecting the attention state of flight trainees based on a multi-modal dynamic depth network (M3D-Net). M3D-Net is a lightweight neural network architecture that integrates temporal image features, visual information features, and flight operation data features. It aligns image and text features through an attention mechanism to enhance the semantic association between modalities, and it uses a Depth-wise Separable Convolution and LSTM (DSC-LSTM) module to model temporal information, dynamically capturing contextual dependencies within the sequence and achieving six-level attention state classification. Ablation experiments comparatively analyze the classification performance of the model, and standard evaluation metrics confirm the effectiveness of the proposed method. Experiments show that the proposed architecture reaches a classification accuracy of 97.56% with a model size of 18.6 M. Compared with traditional algorithms, the M3D-Net architecture has better application prospects. Full article
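
A minimal PyTorch sketch of a depthwise-separable-convolution plus LSTM temporal block in the spirit of the DSC-LSTM module is shown below; channel sizes, layer ordering, and the single-layer LSTM are illustrative assumptions rather than the published design.

```python
import torch
import torch.nn as nn

class DSCLSTMBlock(nn.Module):
    """Sketch of a depthwise-separable-conv + LSTM temporal module in the
    spirit of M3D-Net's DSC-LSTM; dimensions are illustrative assumptions."""
    def __init__(self, in_ch, hidden=128, n_classes=6):
        super().__init__()
        self.dsc = nn.Sequential(
            nn.Conv1d(in_ch, in_ch, 3, padding=1, groups=in_ch),  # depthwise
            nn.Conv1d(in_ch, hidden, 1),                          # pointwise
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)   # six attention levels

    def forward(self, x):               # x: (batch, time, features)
        z = self.dsc(x.transpose(1, 2)).transpose(1, 2)
        _, (h, _) = self.lstm(z)        # last hidden state summarizes the clip
        return self.head(h[-1])

logits = DSCLSTMBlock(in_ch=64)(torch.randn(4, 20, 64))  # -> (4, 6)
```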

15 pages, 1797 KB  
Article
Exploring AI’s Potential in Papilledema Diagnosis to Support Dermatological Treatment Decisions in Rural Healthcare
by Jonathan Shapiro, Mor Atlas, Naomi Fridman, Itay Cohen, Ziad Khamaysi, Mahdi Awwad, Naomi Silverstein, Tom Kozlovsky and Idit Maharshak
Diagnostics 2025, 15(19), 2547; https://doi.org/10.3390/diagnostics15192547 - 9 Oct 2025
Abstract
Background: Papilledema, an ophthalmic finding associated with increased intracranial pressure, is often induced by dermatological medications, including corticosteroids, isotretinoin, and tetracyclines. Early detection is crucial for preventing irreversible optic nerve damage, but access to ophthalmologic expertise is often limited in rural settings. Artificial intelligence (AI) may enable the automated and accurate detection of papilledema from fundus images, thereby supporting timely diagnosis and management. Objective: The primary objective of this study was to explore the diagnostic capability of ChatGPT-4o, a general large language model with multimodal input, in identifying papilledema from fundus photographs. For context, its performance was compared with a ResNet-based convolutional neural network (CNN) specifically fine-tuned for ophthalmic imaging, as well as with the assessments of two human ophthalmologists. The focus was on applications relevant to dermatological care in resource-limited environments. Methods: A dataset of 1094 fundus images (295 papilledema, 799 normal) was preprocessed and partitioned into a training set and a test set. The ResNet model was fine-tuned using discriminative learning rates and a one-cycle learning rate policy. GPT-4o and two human evaluators (a senior ophthalmologist and an ophthalmology resident) independently assessed the test images. Diagnostic metrics, including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and Cohen’s Kappa, were calculated for each evaluator. Results: GPT-4o, when applied to papilledema detection, achieved an overall accuracy of 85.9% with substantial agreement beyond chance (Cohen’s Kappa = 0.72), but lower specificity (78.9%) and positive predictive value (73.7%) compared to benchmark models. For context, the ResNet model, fine-tuned for ophthalmic imaging, reached near-perfect accuracy (99.5%, Kappa = 0.99), while two human ophthalmologists achieved accuracies of 96.0% (Kappa ≈ 0.92). Conclusions: This study explored the capability of GPT-4o, a large language model with multimodal input, for detecting papilledema from fundus photographs. GPT-4o achieved moderate diagnostic accuracy and substantial agreement with the ground truth, but it underperformed compared to both a domain-specific ResNet model and human ophthalmologists. These findings underscore the distinction between generalist large language models and specialized diagnostic AI: while GPT-4o is not optimized for ophthalmic imaging, its accessibility, adaptability, and rapid evolution highlight its potential as a future adjunct in clinical screening, particularly in underserved settings. These findings also underscore the need for validation on external datasets and in real-world clinical environments before such tools can be broadly implemented. Full article
(This article belongs to the Special Issue AI in Dermatology)
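
The evaluator comparison rests on standard 2×2 confusion-matrix statistics; the sketch below computes the reported metrics, including Cohen’s Kappa, from hypothetical counts (not the study's data).

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 confusion-matrix metrics as reported in the study."""
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    # Cohen's Kappa: observed agreement corrected for chance agreement
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)   # both say "papilledema"
    p_no = ((fn + tn) / n) * ((fp + tn) / n)    # both say "normal"
    p_e = p_yes + p_no
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": acc,
        "kappa": (acc - p_e) / (1 - p_e),
    }

# Hypothetical counts for a small test split (illustration only):
print(diagnostic_metrics(tp=42, fp=15, fn=5, tn=137))
```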

24 pages, 4488 KB  
Review
Advances in Facial Micro-Expression Detection and Recognition: A Comprehensive Review
by Tian Shuai, Seng Beng, Fatimah Binti Khalid and Rahmita Wirza Bt O. K. Rahmat
Information 2025, 16(10), 876; https://doi.org/10.3390/info16100876 - 9 Oct 2025
Abstract
Micro-expressions are facial movements with extremely short duration and small amplitude, which can reveal an individual’s potential true emotions and have important application value in public safety, medical diagnosis, psychotherapy and business negotiations. Since micro-expressions change rapidly and are difficult to detect, manual recognition is a significant challenge, so the development of automatic recognition systems has become a research hotspot. This paper reviews the development history and research status of micro-expression recognition and systematically analyzes the two main branches of micro-expression analysis: micro-expression detection and micro-expression recognition. In terms of detection, the methods are divided into three categories based on time features, feature changes and deep features according to different feature extraction methods; in terms of recognition, traditional methods based on texture and optical flow features, as well as deep learning-based methods that have emerged in recent years, including motion unit, keyframe and transfer learning strategies, are summarized. This paper also summarizes commonly used micro-expression datasets and facial image preprocessing techniques and evaluates and compares mainstream methods through multiple experimental indicators. Although significant progress has been made in this field in recent years, it still faces challenges such as data scarcity, class imbalance and unstable recognition accuracy. Future research can further combine multimodal emotional information, enhance data generalization capabilities, and optimize deep network structures to promote the widespread application of micro-expression recognition in practical scenarios. Full article

12 pages, 1463 KB  
Article
Retrieval-Augmented Vision–Language Agents for Child-Centered Encyclopedia Learning
by Jing Du, Wenhao Liu, Jingyi Ye, Dibin Zhou and Fuchang Liu
Appl. Sci. 2025, 15(19), 10821; https://doi.org/10.3390/app151910821 - 9 Oct 2025
Abstract
This study introduces an Encyclopedic Agent for children’s learning that integrates multimodal retrieval with retrieval-augmented generation (RAG). To support this framework, we construct a dataset of 9524 Wikipedia pages covering 935 encyclopedia topics, each converted into images with associated topical queries and explanations. Based on this dataset, we fine-tune SigLIP, a vision–language retrieval model, using LoRA adaptation on 8484 training pairs, with 1040 reserved for testing. Experimental results show that the fine-tuned SigLIP significantly outperforms baseline models such as ColPali in both accuracy and latency, enabling efficient and precise document-image retrieval. Combined with GPT-5 for response generation, the Encyclopedic Agent delivers illustrated, interactive Q&A that is more accessible and engaging for children compared to traditional text-only methods. These findings highlight the feasibility of applying multimodal retrieval and RAG to educational agents, offering new possibilities for personalized, child-centered learning in domains such as science, history, and the arts. Full article
(This article belongs to the Special Issue Applications of Digital Technology and AI in Educational Settings)
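
A minimal sketch of the retrieval side is shown below, assuming the publicly available google/siglip-base-patch16-224 checkpoint from Hugging Face transformers and common LoRA target modules via peft; the paper's exact SigLIP variant, adapter placement, and training loop are not specified in the abstract.

```python
import torch
from transformers import AutoProcessor, AutoModel
from peft import LoraConfig, get_peft_model

# Checkpoint and LoRA targets are assumptions for illustration only.
name = "google/siglip-base-patch16-224"
processor = AutoProcessor.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"]))

@torch.no_grad()
def rank_pages(query, page_images):
    """Score encyclopedia page images (a list of PIL images) against a
    child's query; return indices sorted from most to least similar."""
    t = processor(text=[query], return_tensors="pt", padding="max_length")
    i = processor(images=page_images, return_tensors="pt")
    q = model.get_text_features(**t)
    p = model.get_image_features(**i)
    q = q / q.norm(dim=-1, keepdim=True)          # cosine similarity
    p = p / p.norm(dim=-1, keepdim=True)
    return (p @ q.T).squeeze(-1).argsort(descending=True)
```

After LoRA fine-tuning on query/page pairs, the top-ranked page image would be passed to the generation model for an illustrated answer.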

31 pages, 3160 KB  
Article
Multimodal Image Segmentation with Dynamic Adaptive Window and Cross-Scale Fusion for Heterogeneous Data Environments
by Qianping He, Meng Wu, Pengchang Zhang, Lu Wang and Quanbin Shi
Appl. Sci. 2025, 15(19), 10813; https://doi.org/10.3390/app151910813 - 8 Oct 2025
Abstract
Multi-modal image segmentation is a key task in various fields such as urban planning, infrastructure monitoring, and environmental analysis. However, it remains challenging due to complex scenes, varying object scales, and the integration of heterogeneous data sources (such as RGB, depth maps, and infrared). To address these challenges, we propose a novel multi-modal segmentation framework, DyFuseNet, which features dynamic adaptive windows and cross-scale feature fusion capabilities. This framework consists of three key components: (1) Dynamic Window Module (DWM), which uses dynamic partitioning and continuous position bias to adaptively adjust window sizes, thereby improving the representation of irregular and fine-grained objects; (2) Scale Context Attention (SCA), a hierarchical mechanism that associates local details with global semantics in a coarse-to-fine manner, enhancing segmentation accuracy in low-texture or occluded regions; and (3) Hierarchical Adaptive Fusion Architecture (HAFA), which aligns and fuses features from multiple modalities through shallow synchronization and deep channel attention, effectively balancing complementarity and redundancy. Evaluated on benchmark datasets (such as ISPRS Vaihingen and Potsdam), DyFuseNet achieved state-of-the-art performance, with mean Intersection over Union (mIoU) scores of 80.40% and 80.85%, surpassing MFTransNet by 1.91% and 1.77%, respectively. The model also demonstrated strong robustness in challenging scenes (such as building edges and shadowed objects), achieving an average F1 score of 85% while maintaining high efficiency (26.19 GFLOPs, 30.09 FPS), making it suitable for real-time deployment. This work presents a practical, versatile, and computationally efficient solution for multi-modal image analysis, with potential applications beyond remote sensing, including smart monitoring, industrial inspection, and multi-source data fusion tasks. Full article
(This article belongs to the Special Issue Signal and Image Processing: From Theory to Applications: 2nd Edition)
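
The headline numbers are mean Intersection over Union scores; for reference, a minimal sketch of the metric over predicted and ground-truth label maps (using the six ISPRS categories) is given below.

```python
import numpy as np

def mean_iou(pred, gt, n_classes):
    """Mean Intersection over Union across classes, the metric behind
    the reported 80.40% / 80.85% scores."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Hypothetical 6-class label maps (ISPRS uses six categories):
pred = np.random.randint(0, 6, (256, 256))
gt = np.random.randint(0, 6, (256, 256))
print(mean_iou(pred, gt, 6))
```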

19 pages, 3520 KB  
Article
Multifactorial Imaging Analysis as a Platform for Studying Cellular Senescence Phenotypes
by Shatalova Rimma, Larin Ilya and Shevyrev Daniil
J. Imaging 2025, 11(10), 351; https://doi.org/10.3390/jimaging11100351 - 8 Oct 2025
Abstract
Cellular senescence is a heterogeneous and dynamic state characterised by stable proliferation arrest, macromolecular damage and metabolic remodelling. Although markers such as SA-β-galactosidase staining, γH2AX foci and p53 activation are widely used as de facto standards, they are imperfect and differ in terms of sensitivity, specificity and dependence on context. We present a multifactorial imaging platform integrating scanning electron microscopy, flow cytometry and high-resolution confocal microscopy. This allows us to identify senescence phenotypes in three in vitro models: replicative ageing via serial passaging; dose-graded genotoxic stress under serum deprivation; and primary fibroblasts from young and elderly donors. We present a multimodal imaging framework to characterise senescence-associated phenotypes by integrating LysoTracker and MitoTracker microscopy with SA-β-gal/FACS, while p16INK4a immunostaining provides independent confirmation of proliferative arrest. Combined nutrient deprivation and genotoxic challenge elicited the most pronounced and concordant organelle alterations relative to single stressors, aligning with donor-age differences. Our approach integrates structural and functional readouts across modalities, reducing the impact of phenotypic heterogeneity and providing reproducible multiparametric endpoints. Although the framework focuses on a robustly validated panel of phenotypes, it is extensible by nature and sensitive to distributional shifts. This allows both drug-specific redistribution of established markers and the emergence of atypical or transient phenotypes to be detected. This flexibility renders the platform suitable for comparative studies and the screening of senolytics and geroprotectors, as well as for refining the evolving landscape of senescence-associated states. Full article
(This article belongs to the Section Image and Video Processing)

21 pages, 6844 KB  
Article
MMFNet: A Mamba-Based Multimodal Fusion Network for Remote Sensing Image Semantic Segmentation
by Jingting Qiu, Wei Chang, Wei Ren, Shanshan Hou and Ronghao Yang
Sensors 2025, 25(19), 6225; https://doi.org/10.3390/s25196225 - 8 Oct 2025
Abstract
Accurate semantic segmentation of high-resolution remote sensing imagery is challenged by substantial intra-class variability, inter-class similarity, and the limitations of single-modality data. This paper proposes MMFNet, a novel multimodal fusion network that leverages the Mamba architecture to efficiently capture long-range dependencies for semantic segmentation tasks. MMFNet adopts a dual-encoder design, combining ResNet-18 for local detail extraction and VMamba for global contextual modelling, striking a balance between segmentation accuracy and computational efficiency. A Multimodal Feature Fusion Block (MFFB) is introduced to effectively integrate complementary information from optical imagery and digital surface models (DSMs), thereby enhancing multimodal feature interaction and improving segmentation accuracy. Furthermore, a frequency-aware upsampling module (FreqFusion) is incorporated in the decoder to enhance boundary delineation and recover fine spatial details. Extensive experiments on the ISPRS Vaihingen and Potsdam benchmarks demonstrate that MMFNet achieves mean IoU scores of 83.50% and 86.06%, outperforming eight state-of-the-art methods while maintaining relatively low computational complexity. These results highlight MMFNet’s potential for efficient and accurate multimodal semantic segmentation in remote sensing applications. Full article
(This article belongs to the Section Remote Sensors)
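
As a rough picture of what a multimodal fusion block does, the sketch below applies channel attention to concatenated optical and DSM feature maps; it is an illustrative stand-in, not the published MFFB.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Illustrative channel-attention fusion of optical and DSM feature
    maps; the published MFFB's internals may differ."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(            # squeeze-and-excite style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * ch, ch, 1), nn.ReLU(),
            nn.Conv2d(ch, 2 * ch, 1), nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(2 * ch, ch, 1)  # project back to C channels

    def forward(self, opt, dsm):              # each: (B, C, H, W)
        x = torch.cat([opt, dsm], dim=1)      # (B, 2C, H, W)
        return self.proj(x * self.gate(x))

fused = FusionBlock(64)(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```

The gate lets the network re-weight optical versus height channels per location in the feature hierarchy, which is the basic mechanism behind "deep channel attention" fusion designs.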

38 pages, 1954 KB  
Review
Bridge Structural Health Monitoring: A Multi-Dimensional Taxonomy and Evaluation of Anomaly Detection Methods
by Omar S. Sonbul and Muhammad Rashid
Buildings 2025, 15(19), 3603; https://doi.org/10.3390/buildings15193603 - 8 Oct 2025
Abstract
Bridges are critical to national mobility and economic flow, making dependable structural health monitoring (SHM) systems essential for safety and durability. However, the SHM data quality is often affected by sensor faults, transmission noise, and environmental interference. To address these issues, anomaly detection methods are widely adopted. Despite their wide use and variety, there is a lack of systematic evaluation that comprehensively compares these techniques. Existing reviews are often constrained by limited scope, minimal comparative synthesis, and insufficient focus on real-time performance and multivariate analysis. Consequently, this systematic literature review (SLR) analyzes 36 peer-reviewed studies published between 2020 and 2025, sourced from eight reputable databases. Unlike prior reviews, this work presents a novel four-dimensional taxonomy covering real-time capability, multivariate support, analysis domain, and detection methods. Moreover, detection methods are further classified into three categories: distance-based, predictive, and image processing. A comparative evaluation of the reviewed detection methods is performed across five key dimensions: robustness, scalability, real-world deployment feasibility, interpretability, and data dependency. Findings reveal that image-processing methods are the most frequently applied (22 studies), providing high detection accuracy but facing scalability challenges due to computational intensity. Predictive models offer a trade-off between interpretability and performance, whereas distance-based methods remain less common due to their sensitivity to dimensionality and environmental factors. Notably, only 11 studies support real-time anomaly detection, and multivariate analysis is often overlooked. Moreover, time-domain signal processing dominates the field, while frequency and time-frequency domain methods remain rare despite their potential. Finally, this review highlights key challenges such as scalability, interpretability, robustness, and practicality of current models. Further research should focus on developing adaptive and interpretable anomaly detection frameworks that are efficient enough for real-world SHM deployment. These models should combine multi-modal strategies, handle uncertainty, and follow standardized evaluation protocols across varied monitoring environments. Full article
(This article belongs to the Special Issue Structural Health Monitoring Through Advanced Artificial Intelligence)
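
Of the three method families, the distance-based one is the simplest to illustrate; the sketch below flags multivariate sensor windows by Mahalanobis distance from a healthy-state baseline, and its reliance on a well-conditioned covariance matrix hints at the dimensionality sensitivity the review notes.

```python
import numpy as np

def mahalanobis_anomalies(train, test, threshold=3.0):
    """Distance-based anomaly detection on multivariate SHM feature
    windows: flag test windows whose Mahalanobis distance from the
    healthy-state training distribution exceeds a threshold."""
    mu = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)
    cov_inv = np.linalg.pinv(cov)     # pinv guards near-singular covariances
    d = test - mu
    dist = np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))
    return dist > threshold, dist

# Hypothetical 8-dimensional strain/acceleration feature windows:
healthy = np.random.randn(500, 8)
incoming = np.vstack([np.random.randn(50, 8), np.random.randn(5, 8) + 4.0])
flags, dist = mahalanobis_anomalies(healthy, incoming)
```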

19 pages, 6029 KB  
Review
Beyond Nerve Entrapment: A Narrative Review of Muscle–Tendon Pathologies in Deep Gluteal Syndrome
by Yong Hyun Yoon, Ji Hyo Hwang, Ho won Lee, MinJae Lee, Chanwool Park, Jonghyeok Lee, Seungbeom Kim, JaeYoung Lee, Jeimylo C. de Castro, King Hei Stanley Lam, Teinny Suryadi and Kwan Hyun Youn
Diagnostics 2025, 15(19), 2531; https://doi.org/10.3390/diagnostics15192531 - 7 Oct 2025
Abstract
Deep Gluteal Syndrome (DGS) has traditionally been defined as a clinical entity caused by sciatic nerve (SN) entrapment. However, recent anatomical and imaging studies suggest that muscle- and tendon-origin pathologies—including enthesopathy—may also serve as primary pain generators. This narrative review aims to broaden the current understanding of DGS by integrating muscle and tendon pathologies into its diagnostic and therapeutic framework. The literature was selectively reviewed from PubMed, Cochrane Library, Google Scholar, PEDro, and Web of Science to identify clinically relevant studies illustrating evolving concepts in DGS pathophysiology, diagnosis, and management. We review clinical features and diagnostic tools including physical examination, MRI, and dynamic ultrasonography, with special attention to deep external rotator enthesopathy. Treatment strategies are summarized, including conservative therapy, ultrasound-guided injections, hydrodissection, and prolotherapy. This narrative synthesis underscores the importance of recognizing muscle-origin enthesopathy and soft-tissue pathologies as significant contributors to DGS. A pathophysiology-based, multimodal approach is essential for accurate diagnosis and effective treatment. Full article

7 pages, 3652 KB  
Case Report
Transfemoral TAVI in a High-Risk Patient with Porcelain Aorta and Severe Subrenal Abdominal Aortic Stenosis: A Case Report
by Anees Al Jabri, Marcello Ravani, Giuseppe Trianni, Tommaso Gasbarri, Marta Casula and Sergio Berti
J. Cardiovasc. Dev. Dis. 2025, 12(10), 396; https://doi.org/10.3390/jcdd12100396 - 7 Oct 2025
Abstract
Aortic stenosis (AS) is a common degenerative valvular disease in elderly patients, causing obstruction of left ventricular outflow and presenting with symptoms such as angina, syncope, and heart failure. Although surgical aortic valve replacement (SAVR) remains the gold standard, its high perioperative risk in frail patients has led to the adoption of transcatheter aortic valve implantation (TAVI) as a less invasive and effective alternative. The transfemoral (TF) access route is generally preferred, but severe peripheral arterial disease may limit its feasibility. We report the case of a 71-year-old woman with critical AS complicated by multiple comorbidities, including extensive vascular calcifications, a porcelain aorta, and significant subrenal abdominal aortic stenosis. Multimodal imaging, including computed tomography, was essential for procedural planning, revealing complex iliofemoral anatomy unsuitable for conventional device passage without intervention. Intravascular lithotripsy (IVL) was used to disrupt calcific plaques and facilitate safe vascular access. The TAVI procedure was successfully performed under local anesthesia via TF access using an 18-Fr, 65 cm GORE® DRYSEAL Flex Introducer Sheath (W. L. Gore & Associates, Flagstaff, AZ, USA). After balloon valvuloplasty performed over a SAFARI2™ Pre-Shaped TAVI Guidewire, Extra Small Curve (Boston Scientific, Marlborough, MA, USA) in the left ventricle, a self-expanding 26 mm Medtronic Evolut™ FX transcatheter valve (Medtronic, Minneapolis, MN, USA) was implanted. Postoperative imaging confirmed optimal valve function and vascular integrity without complications. This case highlights the role of IVL as an innovative adjunctive technique enabling TF-TAVI in patients with challenging vascular anatomy, thereby expanding treatment options for high-risk individuals with severe AS. Full article
(This article belongs to the Special Issue Transcatheter Aortic Valve Implantation (TAVI): 3rd Edition)