Search Results (2,697)

Search Parameters:
Keywords = multimodal features

50 pages, 12478 KB  
Article
CorbuAI: A Multimodal Artificial Intelligence-Based Architectural Design (AIAD) Framework for Computer-Generated Residential Building Design
by Yafei Zhao, Ziyi Ying, Wanqing Zhao, Pengpeng Zhang, Rong Xia, Xuepeng Shi, Yanfei Ning, Mengdan Zhang, Xiaoju Li and Yanjun Su
Buildings 2026, 16(3), 668; https://doi.org/10.3390/buildings16030668 - 5 Feb 2026
Abstract
Integrating artificial intelligence (AI) into residential architectural design faces challenges due to fragmented workflows and the lack of localized datasets. This study proposes the CorbuAI framework, hypothesizing that a multimodal AI system integrating Pix2pix-GAN and Stable Diffusion (SD) can streamline the transition from floor plan generation to elevation and interior design within a specific regional context. We developed a custom dataset featuring 2335 manually refined Chinese residential floor plans and 1570 elevation images. The methodology employs a specialized U-Net V2.0 generator for functional layout synthesis and an SD-based model for stylistic transfer and elevation rendering. Evaluation was conducted through both subjective professional scoring and objective metrics, including the Perceptual Hash Algorithm (pHash). Results demonstrate that CorbuAI achieves high accuracy in spatial allocation (scoring 0.88/1.0) and high structural consistency in elevation generation (mean pHash similarity of 0.82). The framework significantly reduces design iteration time while maintaining professional aesthetic standards. This research provides a scalable AI-driven methodology for automated residential design, bridging the gap between schematic layouts and visual representation in the Chinese architectural context.
(This article belongs to the Special Issue Data-Driven Intelligence for Sustainable Urban Renewal)
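
The objective metric named above, pHash similarity, is easy to reproduce with the off-the-shelf ImageHash library. The sketch below is a minimal illustration under assumed file names, not the authors' evaluation pipeline:

```python
# Minimal pHash similarity sketch (ImageHash library); file names are placeholders.
from PIL import Image
import imagehash

ref = imagehash.phash(Image.open("reference_elevation.png"))   # 64-bit perceptual hash
gen = imagehash.phash(Image.open("generated_elevation.png"))

hamming = ref - gen                  # ImageHash overloads "-" as Hamming distance
similarity = 1.0 - hamming / 64.0    # normalize to [0, 1]; 1.0 = structurally identical
print(f"pHash similarity: {similarity:.2f}")
```
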
18 pages, 2458 KB  
Article
An Interpretable CPU Scheduling Method Based on a Multiscale Frequency-Domain Convolutional Transformer and a Dendritic Network
by Xiuwei Peng, Honghua Wang, Guohui Zhou, Jun Jiang, Hao Fang, Zhengxing Wu and Xiaohui Li
Electronics 2026, 15(3), 693; https://doi.org/10.3390/electronics15030693 - 5 Feb 2026
Abstract
In modern operating systems, CPU scheduling policy selection and evaluation still rely mainly on heuristic methods, especially at the single-processor level or the abstract ready-queue level, and there is still a lack of systematic modeling and interpretable analysis for complex workload patterns. Traditional approaches are easy to implement and respond quickly in specific scenarios, but they often fail to remain stable under dynamic workloads and high-dimensional features, which can harm generalization. In this work, we build a simulation dataset that covers five typical scheduling policies, redesign a deep learning framework for scheduling policy identification, and propose the MCFCTransformer-DD model. The model extends the standard Transformer with multiscale convolution, frequency-domain augmentation, and cross-attention to capture both low-frequency and high-frequency signals, learn local and global patterns, and model multivariate dependencies. We also introduce a Dendrite Network, or DD, into scheduling policy identification and decision support for the first time, and its gated dendritic structure provides a more transparent nonlinear decision boundary that reduces the black-box nature of deep models and helps mitigate overfitting. Experiments show that MCFCTransformer-DD achieves 94.50% accuracy, a 94.65% F1 score, and an AUROC of 1.00, which indicates strong policy identification performance and strong potential for decision support.
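
To make the "multiscale convolution plus frequency-domain augmentation" idea concrete, here is a hedged PyTorch sketch of a block that runs parallel 1-D convolutions at several kernel sizes alongside an rFFT magnitude branch. Kernel sizes, channel counts, and the two-output split are illustrative assumptions, not the published MCFCTransformer-DD architecture:

```python
import torch
import torch.nn as nn

class MultiScaleFreqBlock(nn.Module):
    """Illustrative block: parallel multi-scale 1-D convolutions plus an rFFT
    magnitude branch. Sizes are assumptions, not the paper's configuration."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in (3, 5, 7)]
        )
        self.freq_proj = nn.LazyLinear(out_ch)  # input width inferred on first call

    def forward(self, x: torch.Tensor):
        # x: (batch, channels, time)
        time_feats = torch.cat([b(x) for b in self.branches], dim=1)  # local/global patterns
        spectrum = torch.fft.rfft(x, dim=-1).abs()                    # low/high-frequency content
        freq_feats = self.freq_proj(spectrum.flatten(1))
        return time_feats, freq_feats

block = MultiScaleFreqBlock(in_ch=4, out_ch=16)
t_f, f_f = block(torch.randn(2, 4, 128))
print(t_f.shape, f_f.shape)  # (2, 48, 128) and (2, 16)
```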

17 pages, 1778 KB  
Article
Differentiating Borderline from Malignant Ovarian-Adnexal Tumours: A Multimodal Predictive Approach Joining Clinical, Analytic, and MRI Parameters
by Lledó Cabedo, Carmen Sebastià, Meritxell Munmany, Adela Saco, Eduardo Gallardo, Olatz Sáenz de Argandoña, Gonzalo Peón, Josep Lluís Carrasco and Carlos Nicolau
Cancers 2026, 18(3), 516; https://doi.org/10.3390/cancers18030516 - 4 Feb 2026
Abstract
Objectives: To improve the differentiation of borderline ovarian-adnexal tumours (BOTs) from malignant ovarian-adnexal masses, most of which fall into the indeterminate O-RADS MRI 4 category, by developing a multimodal predictive model that integrates clinical, analytic, and MRI parameters. Methods: This retrospective, single-centre study included 248 women who underwent standardised MRI for ovarian-adnexal mass characterisation between 2019 and 2024. Of these, 201 had true ovarian-adnexal masses (114 benign, 22 borderline, and 65 malignant), confirmed by histopathology or stability after ≥12-month follow-up. Forty-one clinical, laboratory, and imaging variables were initially assessed, and after a bivariate evaluation, 18 final predictors with clinical relevance were selected for model construction with thresholds learned from the data. A classification and regression tree (CART) model (“Full Model”) was applied as a second-stage tool after O-RADS MRI scoring, using 10-fold cross-validation to prevent overfitting. A pruned “Simplified Model” was also derived to enhance interpretability. Results: O-RADS MRI performed well at the extremes (scores 2–3 and 5) but showed limited discrimination between BOTs and malignancies within category 4 (PPV for borderline = 0.50). The decision-tree models significantly improved diagnostic performance, increasing overall accuracy from 0.856 with O-RADS MRI alone to 0.905 (Simplified Model) and 0.955 (Full Model). The PPV for BOTs within the intermediate O-RADS MRI 4 category increased from 0.49 with O-RADS MRI alone to 0.77 and 0.90 with the simplified and full models, respectively, while maintaining high accuracy for benign and malignant lesions. Conclusions: In this retrospective single-centre cohort, the addition of an interpretable rule-based predictive model as a second-line tool within O-RADS MRI category 4 was associated with improved discrimination between borderline and invasive malignant ovarian-adnexal tumours. These findings suggest that multimodal integration of clinical, laboratory, and MRI features may help refine risk stratification in indeterminate cases; however, external validation in prospective multicentre cohorts is required before clinical implementation.
(This article belongs to the Special Issue Gynecological Cancer: Prevention, Diagnosis, Prognosis and Treatment)
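
The second-stage modeling step is a standard pruned CART; in outline it can be reproduced with scikit-learn as below. The synthetic data, grid of pruning strengths, and scoring choice are placeholders standing in for the study's 18 predictors and protocol:

```python
# Hedged sketch of a pruned CART with 10-fold CV (scikit-learn); data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in for 201 patients x 18 predictors, 3 classes (benign/borderline/malignant)
X, y = make_classification(n_samples=201, n_features=18, n_informative=8,
                           n_classes=3, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(class_weight="balanced", random_state=0),
    param_grid={"ccp_alpha": np.linspace(0.0, 0.02, 11)},  # larger alpha ~ more pruned tree
    cv=10, scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```
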
22 pages, 1982 KB  
Article
Enhanced 3D DenseNet with CDC for Multimodal Brain Tumor Segmentation
by Bekir Berkcan and Temel Kayıkçıoğlu
Appl. Sci. 2026, 16(3), 1572; https://doi.org/10.3390/app16031572 - 4 Feb 2026
Abstract
Precise tumor segmentation in multimodal MRI is crucial for glioma diagnosis and treatment planning; yet, deep learning models still struggle with irregular boundaries and severe class imbalance under computational constraints. An Enhanced 3D DenseNet with CDC architecture was proposed, integrating Central Difference Convolution, attention gates, and Atrous Spatial Pyramid Pooling for brain tumor segmentation on the BraTS 2023-GLI dataset. CDC layers enhance boundary sensitivity by combining intensity-level semantics and gradient-level features. Attention gates selectively emphasize relevant encoder features during skip connections, whereas the ASPP captures the multi-scale context with dilation rates. A hybrid loss function spanning three levels was introduced, consisting of a region-based Dice loss for volumetric overlap, a GPU-native 3D Sobel boundary loss for edge precision, and a class-weighted focal loss for handling class imbalance. The proposed model achieved a mean Dice score of 91.30% (ET: 87.84%, TC: 92.73%, WT: 93.34%) on the test set. Notably, these results were achieved with approximately 3.7 million parameters, representing a 17–76x reduction compared to the 50–200 million parameters required by transformer-based approaches. Enhanced 3D DenseNet with CDC architecture demonstrates that the integration of gradient-sensitive convolutions, attention mechanisms, multi-scale feature extraction, and multi-level loss optimization achieves competitive segmentation performance with significantly reduced computational requirements.
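
Central Difference Convolution has a standard decomposition (a vanilla convolution minus a θ-weighted response of the spatial kernel sum applied to the center voxel), which a short PyTorch sketch makes explicit. The 3-D variant below uses illustrative shapes and a commonly cited default θ, not necessarily the paper's settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDC3d(nn.Module):
    """Sketch of 3-D Central Difference Convolution: a vanilla convolution minus
    a theta-weighted term applying the spatial kernel sum to the center voxel."""
    def __init__(self, in_ch, out_ch, k=3, stride=1, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, k, stride=stride, padding=padding, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        if self.theta == 0.0:
            return out  # reduces to a plain convolution
        # Collapse kernel weights over spatial dims -> equivalent 1x1x1 kernel
        kernel_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
        center = F.conv3d(x, kernel_sum, stride=self.conv.stride, padding=0)
        return out - self.theta * center  # gradient-level term sharpens boundaries

y = CDC3d(4, 8)(torch.randn(1, 4, 16, 16, 16))
print(y.shape)  # torch.Size([1, 8, 16, 16, 16])
```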

17 pages, 784 KB  
Article
A Wideband Oscillation Classification Method Based on Multimodal Feature Fusion
by Yingmin Zhang, Yixiong Liu, Zongsheng Zheng and Shilin Gao
Electronics 2026, 15(3), 682; https://doi.org/10.3390/electronics15030682 - 4 Feb 2026
Abstract
With the increasing penetration of renewable energy sources and power-electronic devices, modern power systems exhibit pronounced wideband oscillation characteristics with large frequency spans, strong modal coupling, and significant time-varying behaviors. Accurate identification and classification of wideband oscillation patterns have therefore become critical challenges for ensuring the secure and stable operation of “dual-high” power systems. Existing methods based on signal processing or single-modality deep-learning models often fail to fully exploit the complementary information embedded in heterogeneous data representations, resulting in limited performance when dealing with complex oscillation patterns. To address these challenges, this paper proposes a multimodal attention-based fusion network for wideband oscillation classification. A dual-branch deep-learning architecture is developed to process Gramian Angular Difference Field images and raw time-series signals in parallel, enabling collaborative extraction of global structural features and local temporal dynamics. An improved Inception module is employed in the image branch to enhance multi-scale spatial feature representation, while a gated recurrent unit network is utilized in the time-series branch to model dynamic evolution characteristics. Furthermore, an attention-based fusion mechanism is introduced to adaptively learn the relative importance of different modalities and perform dynamic feature aggregation. Extensive experiments are conducted using a dataset constructed from mathematical models and engineering-oriented simulations. Comparative studies and ablation studies demonstrate that the proposed method significantly outperforms conventional signal-processing-based approaches and single-modality deep-learning models in terms of classification accuracy, robustness, and generalization capability. The results confirm the effectiveness of multimodal feature fusion and attention mechanisms for accurate wideband oscillation classification, providing a promising solution for advanced power system monitoring and analysis.
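
The Gramian Angular Difference Field transform used in the image branch is available off the shelf in the pyts library; the sketch below converts a synthetic record into a GADF image (signal length and image size are placeholders):

```python
# GADF conversion sketch (pyts); signal length and image size are placeholders.
import numpy as np
from pyts.image import GramianAngularField

X = np.sin(np.linspace(0, 8 * np.pi, 256))[None, :]   # one synthetic oscillation record
gadf = GramianAngularField(image_size=64, method="difference")
images = gadf.fit_transform(X)                         # shape: (1, 64, 64)
print(images.shape)
```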

23 pages, 15011 KB  
Article
Hybrid Mamba–Graph Fusion with Multi-Stage Pseudo-Label Refinement for Semi-Supervised Hyperspectral–LiDAR Classification
by Khanzada Muzammil Hussain, Keyun Zhao, Sachal Perviaz and Ying Li
Sensors 2026, 26(3), 1005; https://doi.org/10.3390/s26031005 - 3 Feb 2026
Abstract
Semi-supervised joint classification of Hyperspectral Images (HSIs) and LiDAR-derived Digital Surface Models (DSMs) remains challenging due to the scarcity of labeled pixels, strong intra-class variability, and the heterogeneous nature of spectral and elevation features. In this work, we propose a Hybrid Mamba–Graph Fusion Network (HMGF-Net) with Multi-Stage Pseudo-Label Refinement (MS-PLR) for semi-supervised hyperspectral–LiDAR classification. The framework employs a spectral–spatial HSI backbone combining 3D–2D convolutions, a compact LiDAR CNN encoder, Mamba-style state-space sequence blocks for long-range spectral and cross-modal dependency modeling, and a graph fusion module that propagates information over a heterogeneous pixel graph. Semi-supervised learning is realized via a three-stage pseudo-labeling pipeline that progressively filters, smooths, and re-weights pseudo-labels based on prediction confidence, spatial–spectral consistency, and graph neighborhood agreement. We validate HMGF-Net on three benchmark hyperspectral–LiDAR datasets. Compared with a set of eight state-of-the-art (SOTA) baselines, including 3D-CNNs, SSRN, HybridSN, transformer-based models such as SpectralFormer, multimodal CNN–GCN fusion networks, and recent semi-supervised methods, the proposed approach delivers consistent gains in overall accuracy, average accuracy, and Cohen’s kappa, especially in low-label regimes (10% labeled pixels). The results highlight that the synergy between sequence modeling and graph reasoning in combination with carefully designed pseudo-label refinement is essential to maximizing the benefit of abundant unlabeled samples in multimodal remote sensing scenarios.
(This article belongs to the Special Issue Progress in LiDAR Technologies and Applications)
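
As a flavor of the first refinement stage, confidence-based filtering of pseudo-labels fits in a few lines of PyTorch. The threshold and shapes are assumptions, and the paper's subsequent smoothing and graph-agreement stages are not shown:

```python
import torch

@torch.no_grad()
def filter_pseudo_labels(logits: torch.Tensor, threshold: float = 0.95):
    """Stage-one sketch: keep only unlabeled pixels whose softmax confidence
    clears a threshold; later stages would smooth and re-weight this set."""
    probs = logits.softmax(dim=1)      # (n_pixels, n_classes)
    conf, pseudo = probs.max(dim=1)
    keep = conf >= threshold
    return pseudo[keep], keep          # retained labels + boolean selection mask

labels, mask = filter_pseudo_labels(torch.randn(1000, 9))
print(mask.float().mean().item())      # fraction of pixels retained
```
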
21 pages, 32717 KB  
Article
Integrative Cross-Modal Fusion of Preoperative MRI and Histopathological Signatures for Improved Survival Prediction in Glioblastoma
by Tianci Liu, Yao Zheng, Chengwei Chen, Jie Wei, Dong Huang, Yuefei Feng and Yang Liu
Bioengineering 2026, 13(2), 179; https://doi.org/10.3390/bioengineering13020179 - 3 Feb 2026
Abstract
Glioblastoma (GBM) is the most common and aggressive primary brain tumor in adults, with a median overall survival of less than 15 months despite standard-of-care treatment. Accurate preoperative prognostication is essential for personalized treatment planning; however, existing approaches rely primarily on magnetic resonance imaging (MRI) and often overlook the rich histopathological information contained in postoperative whole-slide images (WSIs). The inherent spatiotemporal gap between preoperative MRI and postoperative WSIs substantially hinders effective multimodal integration. To address this limitation, we propose a contrastive-learning-based Imaging–Pathology Synergistic Alignment (CL-IPSA) framework that aligns MRI and WSI data within a shared embedding space, thereby establishing robust cross-modal semantic correspondences. We further construct a cross-modal mapping library that enables patients with MRI-only data to obtain proxy pathological representations via nearest-neighbor retrieval for joint survival modeling. Experiments across multiple datasets demonstrate that incorporating proxy WSI features consistently enhances prediction performance: various convolutional neural networks (CNNs) achieve an average AUC improvement of 0.08–0.10 on the validation cohort and two independent test sets, with SEResNet34 yielding the best performance (AUC = 0.836). Our approach enables non-invasive, preoperative integration of radiological and pathological semantics, substantially improving GBM survival prediction without requiring any additional invasive procedures.
(This article belongs to the Special Issue Modern Medical Imaging in Disease Diagnosis Applications)
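
Contrastive alignment of paired embeddings of this kind typically reduces to a symmetric InfoNCE objective. A minimal sketch, assuming rows of the two batches are matched MRI/WSI pairs and a fixed temperature (the paper's exact loss is not reproduced here):

```python
import torch
import torch.nn.functional as F

def symmetric_infonce(mri_emb, wsi_emb, temperature: float = 0.07):
    """CLIP-style alignment sketch: matched MRI/WSI pairs lie on the diagonal
    of the similarity matrix and act as each other's positives."""
    mri = F.normalize(mri_emb, dim=1)
    wsi = F.normalize(wsi_emb, dim=1)
    logits = mri @ wsi.t() / temperature
    targets = torch.arange(mri.size(0), device=mri.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = symmetric_infonce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```

The same shared space then supports the nearest-neighbor retrieval of proxy WSI representations for MRI-only patients described in the abstract.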

23 pages, 1232 KB  
Review
Central Nervous System Involvement in Acute Myeloid Leukemia: From Pathophysiology to Neuroradiologic Features and the Emerging Role of Artificial Intelligence
by Rafail C. Christodoulou, Rafael Pitsillos, Vasileia Petrou, Maria Daniela Sarquis, Platon S. Papageorgiou and Elena E. Solomou
J. Clin. Med. 2026, 15(3), 1187; https://doi.org/10.3390/jcm15031187 - 3 Feb 2026
Abstract
Background/Objectives: Central nervous system (CNS) involvement in acute myeloid leukemia (AML) is a rare but important complication linked to poor outcomes. Diagnosing it is difficult because neurological symptoms are often subtle or nonspecific, and conventional cytology and imaging have limitations. This review summarizes current evidence on the neuroradiologic features of CNS infiltration in AML and explores the growing role of artificial intelligence (AI) in enhancing detection and characterization. Methods: A thorough narrative review was conducted using PubMed, Scopus, and Embase, employing key terms related to AML, CNS involvement, MRI, CT, PET, AI, machine learning, deep learning, and radiomics. Of several thousand records, 138 relevant studies were selected and analyzed across four main areas: neuroradiologic patterns, imaging biomarkers, AI and radiomics applications, and emerging computational trends. Results: Imaging findings in AML mainly include myeloid sarcomas (isointense on T1, hyperintense on T2/FLAIR, restricted diffusion) and leptomeningeal enhancement. Secondary ischemic or hemorrhagic lesions may indicate brain leukocytosis. MRI proved more sensitive than CT, while PET/CT helped detect extramedullary disease. Recent AI and radiomics models showed high tumor classification and prognosis accuracy in similar CNS conditions, indicating significant potential for application in AML-CNS. Conclusions: Combining AI-based image analysis with multimodal neuroimaging could significantly improve diagnostic accuracy and personalized treatment for CNS involvement in AML. Progress is still challenged by the rarity of the condition and the lack of large, annotated datasets.

22 pages, 2309 KB  
Article
Feature Fusion-Based Cross-Modal Proxy Hashing Retrieval
by Yan Zhao and Huaiying Li
Appl. Sci. 2026, 16(3), 1532; https://doi.org/10.3390/app16031532 - 3 Feb 2026
Abstract
Due to its cost-effective and high-efficiency retrieval advantages, deep hashing has attracted extensive attention in the field of cross-modal retrieval. However, despite significant progress in existing deep cross-modal hashing methods, several limitations persist: they struggle to establish consistent mapping relationships across different modalities, fail to effectively bridge the semantic gap between heterogeneous data, and consequently suffer from semantic information loss and incomplete semantic understanding during cross-modal learning. To address these challenges, this paper proposes a Feature Fusion-based Cross-modal Proxy Hashing (FFCPH) retrieval method. This approach integrates multi-modal semantic information through a feature fusion module to generate discriminative and robust fused features. Furthermore, a novel joint loss function, which comprises cross-modal proxy loss, cross-modal irrelevant loss, and cross-modal consistency loss, is designed to preserve inter-sample similarity ranking accuracy and mitigate the semantic gap across modalities. Experimental results on three widely used benchmark datasets demonstrate that the proposed method significantly outperforms state-of-the-art approaches in retrieval performance.
(This article belongs to the Section Computing and Artificial Intelligence)
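
Proxy-based hashing losses in this family usually combine a classification-style pull toward learnable class proxies with a quantization penalty driving code entries toward ±1. The sketch below is a generic instance of that pattern, not FFCPH's published joint loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxyHashLoss(nn.Module):
    """Generic proxy-hashing sketch (not the exact FFCPH objective): codes from
    any modality are pulled toward one learnable proxy per class, while a
    quantization term pushes entries toward +/-1 ahead of binarization."""
    def __init__(self, num_classes: int, code_len: int, temperature: float = 0.1):
        super().__init__()
        self.proxies = nn.Parameter(torch.randn(num_classes, code_len))
        self.temperature = temperature

    def forward(self, codes, labels):   # codes: (B, code_len), tanh-activated
        sims = F.normalize(codes, dim=1) @ F.normalize(self.proxies, dim=1).t()
        proxy_loss = F.cross_entropy(sims / self.temperature, labels)
        quant_loss = (codes.abs() - 1.0).pow(2).mean()
        return proxy_loss + 0.01 * quant_loss

crit = ProxyHashLoss(num_classes=24, code_len=64)
print(crit(torch.tanh(torch.randn(16, 64)), torch.randint(0, 24, (16,))).item())
```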

15 pages, 884 KB  
Article
AI-Driven Typography: A Human-Centered Framework for Generative Font Design Using Large Language Models
by Yuexi Dong and Mingyong Gao
Information 2026, 17(2), 150; https://doi.org/10.3390/info17020150 - 3 Feb 2026
Abstract
This paper presents a human-centered, AI-driven framework for font design that reimagines typography generation as a collaborative process between humans and large language models (LLMs). Unlike conventional pixel- or vector-based approaches, our method introduces a Continuous Style Projector that maps visual features from a pre-trained ResNet encoder into the LLM’s latent space, enabling zero-shot style interpolation and fine-grained control of stroke and serif attributes. To model handwriting trajectories more effectively, we employ a Mixture Density Network (MDN) head, allowing the system to capture multi-modal stroke distributions beyond deterministic regression. Experimental results show that users can interactively explore, mix, and generate new typefaces in real time, making the system accessible for both experts and non-experts. The approach reduces reliance on commercial font licenses and supports a wide range of applications in education, design, and digital communication. Overall, this work demonstrates how LLM-based generative models can enhance creativity, personalization, and cultural expression in typography, contributing to the broader field of AI-assisted design.
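
A Mixture Density Network head is compact enough to show in full. Component count, diagonal covariances, and the omission of pen-lift logits are simplifying assumptions relative to the paper:

```python
import math
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """Sketch of an MDN head predicting a K-component diagonal Gaussian mixture
    over the next 2-D pen offset (pen-up/down logits omitted for brevity)."""
    def __init__(self, hidden: int, k: int = 20):
        super().__init__()
        self.out = nn.Linear(hidden, k * 5)  # per component: mix logit, mu_x, mu_y, log_sx, log_sy

    def forward(self, h):
        mix, mx, my, lsx, lsy = self.out(h).chunk(5, dim=-1)
        return mix.log_softmax(dim=-1), mx, my, lsx, lsy

def mdn_nll(params, target):
    """Negative log-likelihood of observed offsets under the mixture."""
    log_pi, mx, my, lsx, lsy = params
    tx, ty = target[..., :1], target[..., 1:]
    log_n = (-0.5 * (((tx - mx) / lsx.exp()) ** 2 + ((ty - my) / lsy.exp()) ** 2)
             - lsx - lsy - math.log(2.0 * math.pi))
    return -torch.logsumexp(log_pi + log_n, dim=-1).mean()

head = MDNHead(hidden=256)
print(mdn_nll(head(torch.randn(32, 256)), torch.randn(32, 2)).item())
```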

21 pages, 4327 KB  
Article
Engineering-Oriented Ultrasonic Decoding: An End-to-End Deep Learning Framework for Metal Grain Size Distribution Characterization
by Le Dai, Shiyuan Zhou, Yuhan Cheng, Lin Wang, Yuxuan Zhang and Heng Zhi
Sensors 2026, 26(3), 958; https://doi.org/10.3390/s26030958 - 2 Feb 2026
Abstract
Grain size is critical for metallic material performance, yet conventional ultrasonic methods rely on strong model assumptions and exhibit limited adaptability. We propose a deep learning architecture that uses multimodal ultrasonic features with spatial coding to predict the grain size distribution of GH4099. A-scan signals from C-scan measurements are converted to time–frequency representations and fed to an encoder–decoder model that combines a dual convolutional compression network with a fully connected decoder. A thickness-encoding branch enables feature decoupling under physical constraints, and an elliptic spatial fusion strategy refines predictions. Experiments show mean and standard deviation MAEs of 1.08 and 0.84 μm, respectively, with a KL divergence of 0.0031, outperforming attenuation- and velocity-based methods. Input-specificity experiments further indicate that transfer learning calibration quickly restores performance under new conditions. These results demonstrate a practical path for integrating deep learning with ultrasonic inspection for accurate, adaptable grain-size characterization.
(This article belongs to the Special Issue Ultrasonic Sensors and Ultrasonic Signal Processing)
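
The KL divergence reported for distribution quality can be computed from histogrammed grain sizes with SciPy; bin edges and the synthetic samples below are illustrative placeholders:

```python
# KL divergence between reference and predicted grain-size histograms (sketch).
import numpy as np
from scipy.stats import entropy

rng = np.random.default_rng(0)
true_sizes = rng.normal(30.0, 6.0, 500)   # μm; stand-in for metallographic ground truth
pred_sizes = rng.normal(30.5, 6.4, 500)   # stand-in for model predictions

bins = np.linspace(5.0, 60.0, 23)
p_true = np.histogram(true_sizes, bins=bins)[0] + 1e-12   # small floor avoids log(0)
p_pred = np.histogram(pred_sizes, bins=bins)[0] + 1e-12
print(entropy(p_true, p_pred))            # KL(true || pred); entropy() normalizes inputs
```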

26 pages, 2808 KB  
Article
An Automated ECG-PCG Coupling Analysis System with LLM-Assisted Semantic Reporting for Community and Home-Based Cardiac Monitoring
by Yi Tang, Fei Cong, Yi Li and Ping Shi
Algorithms 2026, 19(2), 117; https://doi.org/10.3390/a19020117 - 2 Feb 2026
Abstract
Objective: Cardiac monitoring in community and home environments requires automated operation, cross-state robustness, and interpretable feedback under resource-constrained and uncontrolled conditions. Unlike accuracy-driven ECG–PCG studies focusing on diagnostic performance, this work emphasizes systematic modeling of cardiac electromechanical coupling for long-term monitoring and engineering feasibility validation. Methods: An automated ECG–PCG coupling analysis and semantic reporting framework is proposed, covering signal preprocessing, event detection and calibration, multimodal coupling feature construction, and rule-constrained LLM-assisted interpretation. Electrical events from ECG are used as global temporal references, while multi-stage consistency correction mechanisms are introduced to enhance the stability of mechanical event localization under noise and motion interference. A structured electromechanical feature set is constructed to support fully automated processing. Results: Experimental results demonstrate that the proposed system maintains coherent event sequences and stable coupling parameter extraction across resting, movement, and emotional stress conditions. The incorporated LLM module integrates precomputed multimodal metrics under strict constraints, improving report readability and consistency without performing autonomous medical interpretation. Conclusions: This study demonstrates the methodological feasibility of an ECG–PCG coupling analysis framework for long-term cardiac state monitoring in low-resource environments. By integrating end-to-end automation, electromechanical coupling features, and constrained semantic reporting, the proposed system provides an engineering-oriented reference for continuous cardiac monitoring in community and home settings rather than a clinical diagnostic solution.
(This article belongs to the Special Issue Machine Learning in Medical Signal and Image Processing (4th Edition))
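
One widely used electromechanical coupling feature, the R-peak-to-S1 delay (electromechanical activation time), illustrates the kind of structured feature such a pipeline extracts. The detection parameters below are assumptions, and the paper's exact feature set is not reproduced:

```python
import numpy as np
from scipy.signal import find_peaks

def emat_ms(ecg: np.ndarray, pcg_envelope: np.ndarray, fs: float) -> float:
    """Sketch: mean R-peak -> S1 delay in milliseconds, a common ECG-PCG
    coupling feature. Peak-detection parameters are illustrative assumptions."""
    r_peaks, _ = find_peaks(ecg, distance=int(0.4 * fs), prominence=np.std(ecg))
    delays = []
    for r in r_peaks:
        window = pcg_envelope[r : r + int(0.2 * fs)]   # S1 expected within ~200 ms of R
        if window.size:
            delays.append(np.argmax(window) / fs * 1000.0)
    return float(np.mean(delays)) if delays else float("nan")
```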

18 pages, 3652 KB  
Article
Optimizing Foundation Model to Enhance Surface Water Segmentation with Multi-Modal Remote Sensing Data
by Guochao Hu, Mengmeng Shao, Kaiyuan Li, Xiran Zhou and Xiao Xie
Water 2026, 18(3), 382; https://doi.org/10.3390/w18030382 - 2 Feb 2026
Abstract
Water resources are of critical importance across all ecological, social, and economic realms. Accurate extraction of water bodies is essential for estimating the spatial coverage of water resources and for mitigating water-related disasters. Single-modal remote sensing images are often insufficient for accurate water body extraction due to limitations in spectral information, weather conditions, and speckle noise. Furthermore, state-of-the-art deep learning models may be constrained by data extensibility, feature transferability, model scalability, and task producibility. This manuscript presents an integrated GeoAI framework that enhances foundation models for efficient water body extraction with multi-modal remote sensing images. The proposed framework consists of a data augmentation module tailored for optical and synthetic aperture radar (SAR) remote sensing images, as well as extraction modules augmented by three popular foundation models, namely SAM, SAMRS, and CROMA. Specifically, optical and SAR images are preprocessed and augmented independently, encoded through foundation model backbones, and subsequently decoded to generate water body segmentation masks under single-modal and multi-modal settings.
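
Prompting one of the named foundation models is straightforward with the public segment-anything package. The checkpoint path and seed point below are placeholders, and the framework's adapted decoders and SAR preprocessing are not shown:

```python
# Water-mask prompting sketch with the public segment-anything API.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # placeholder path
predictor = SamPredictor(sam)

rgb = np.zeros((512, 512, 3), dtype=np.uint8)   # stand-in for an optical tile (HWC, RGB)
predictor.set_image(rgb)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),        # a pixel assumed to lie on water
    point_labels=np.array([1]),                 # 1 = foreground prompt
    multimask_output=True,
)
print(masks.shape, scores)                      # (3, 512, 512) candidate masks + IoU estimates
```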

28 pages, 802 KB  
Article
Data-Centric Generative and Adaptive Detection Framework for Abnormal Transaction Prediction
by Yunpeng Gong, Peng Hu, Zihan Zhang, Pengyu Liu, Zhengyang Li, Ruoyun Zhang, Jinghui Yin and Manzhou Li
Electronics 2026, 15(3), 633; https://doi.org/10.3390/electronics15030633 - 2 Feb 2026
Abstract
Anomalous transaction behaviors in cryptocurrency markets exhibit high concealment, substantial diversity, and strong cross-modal coupling, making traditional rule-based or single-feature analytical methods insufficient for reliable detection in real-world environments. To address these challenges, a data-centric multimodal anomaly detection framework integrating generative augmentation, latent distribution modeling, and dual-branch real-time detection is proposed. The method employs a generative adversarial network with feature-consistency constraints to mitigate the scarcity of fraudulent samples, and adopts a multi-domain variational modeling strategy to learn the latent distribution of normal behaviors, enabling stable anomaly scoring. By combining the long-range temporal modeling capability of Transformer architectures with the sensitivity of online clustering to local structural deviations, the system dynamically integrates global and local information through an adaptive risk fusion mechanism, thereby enhancing robustness and real-time detection capability. Experimental results demonstrate that the generative augmentation module yields substantial improvements, increasing the recall from 0.421 to 0.671 and the F1-score to 0.692. In anomaly distribution modeling, the multi-domain VAE achieves an area under the curve (AUC) of 0.854 and an F1-score of 0.660, significantly outperforming traditional One-Class SVM and autoencoder baselines. Multimodal fusion experiments further verify the complementarity of the dual-branch detection structure, with the adaptive fusion model achieving an AUC of 0.884, an F1-score of 0.713, and reducing the false positive rate to 0.087. Ablation studies show that the complete model surpasses any individual module in terms of precision, recall, and F1-score, confirming the synergistic benefits of its integrated components. Overall, the proposed framework achieves high accuracy and high recall in data-scarce, structurally complex, and latency-sensitive cryptocurrency scenarios, providing a scalable and efficient solution for deploying data-centric artificial intelligence in financial security applications.
(This article belongs to the Special Issue Machine Learning in Data Analytics and Prediction)
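
A minimal version of the "learn the latent distribution of normal behaviour, score deviations" idea is a small VAE whose reconstruction and KL terms double as an anomaly score. Dimensions and weighting are illustrative; the paper's multi-domain variant is not reproduced:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TxnVAE(nn.Module):
    """Toy VAE over transaction feature vectors, trained on normal behaviour only.
    Layer sizes are placeholders for the paper's multi-domain architecture."""
    def __init__(self, d_in: int, d_lat: int = 8):
        super().__init__()
        self.enc = nn.Linear(d_in, 2 * d_lat)
        self.dec = nn.Linear(d_lat, d_in)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(z), mu, logvar

def anomaly_score(model, x):
    recon, mu, logvar = model(x)
    rec = F.mse_loss(recon, x, reduction="none").sum(-1)
    kld = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
    return rec + kld   # higher = less consistent with the learned normal regime

print(anomaly_score(TxnVAE(d_in=32), torch.randn(4, 32)))
```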

19 pages, 554 KB  
Article
Multimodal Sample Correction Method Based on Large-Model Instruction Enhancement and Knowledge Guidance
by Zhenyu Chen, Huaguang Yan, Jianguang Du, Meng Xue and Shuai Zhao
Electronics 2026, 15(3), 631; https://doi.org/10.3390/electronics15030631 - 2 Feb 2026
Abstract
With the continuous improvement of power system intelligence, multimodal data generated during distribution network maintenance have grown exponentially. However, existing power multimodal datasets commonly suffer from issues such as low sample quality, frequent factual errors, and inconsistent instruction expressions caused by regional differences. Traditional sample correction methods mainly rely on manual screening or single-feature matching, which suffer from low efficiency and limited adaptability. This paper proposes a multimodal sample correction framework based on large-model instruction enhancement and knowledge guidance, focusing on two critical modalities: temporal data and text documentation. Multimodal sample correction refers to the task of identifying and rectifying errors, inconsistencies, or quality issues in datasets containing multiple data types (temporal sequences and text), with the objective of producing corrected samples that maintain factual accuracy, temporal consistency, and domain-specific compliance. Our proposed framework employs a three-stage processing approach: first, temporal Bidirectional Encoder Representations from Transformers (BERT) models and text BERT models are used to extract and fuse device temporal features and text features, respectively; second, a knowledge-injected assessment mechanism integrated with power knowledge graphs and DeepSeek’s long-chain-of-thought (CoT) capabilities is designed to achieve precise assessment of sample credibility; third, beam search algorithms are employed to generate high-quality corrected text, significantly improving the quality and reliability of multimodal samples in power professional scenarios. Experimental results demonstrate that our method significantly outperforms baseline models across all evaluation metrics (BLEU: 0.361, ROUGE: 0.521, METEOR: 0.443, F1-Score: 0.796), achieving improvements ranging from 21.1% to 73.0% over state-of-the-art methods: specifically, a 21.1% improvement over GECToR in BLEU, 26.5% over GECToR in ROUGE, 30.3% over Deep Edit in METEOR, and 11.8% over Deep Edit in F1-Score, with a reduction of approximately 35% in hallucination rates compared to existing approaches. These improvements provide important technical support for intelligent operation and maintenance of power systems, with implications for improving data quality management, enhancing model reliability in safety-critical applications, and enabling scalable knowledge-guided correction frameworks transferable to other industrial domains requiring high data integrity.
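
The beam-search generation stage is directly exposed by Hugging Face's generate(); the model name and prompt below are placeholders, not the power-domain model the authors trained:

```python
# Beam-search decoding sketch (Hugging Face transformers); model/prompt are placeholders.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tok("correct: transformer oil temperture exceeded 95 degres", return_tensors="pt")
beams = model.generate(**inputs, num_beams=5, num_return_sequences=3,
                       max_new_tokens=48, early_stopping=True)
for cand in tok.batch_decode(beams, skip_special_tokens=True):
    print(cand)
```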
