Search Results (5)

Search Parameters:
Keywords = cross-modal heatmap fusion

35 pages, 1458 KiB  
Article
User Comment-Guided Cross-Modal Attention for Interpretable Multimodal Fake News Detection
by Zepu Yi, Chenxu Tang and Songfeng Lu
Appl. Sci. 2025, 15(14), 7904; https://doi.org/10.3390/app15147904 - 15 Jul 2025
Viewed by 384
Abstract
The proliferation of fake news in the digital age poses a pressing challenge, with profound and harmful effects on societal structures, including the misguidance of public opinion, the erosion of social trust, and the exacerbation of social polarization. Current fake news detection methods are largely limited to superficial text analysis or basic text–image integration and therefore struggle to identify deceptive information accurately. To bridge this gap, we propose the UC-CMAF framework, which comprehensively integrates news text, images, and user comments through an adaptive co-attention fusion mechanism. The UC-CMAF workflow consists of four key subprocesses: multimodal feature extraction; cross-modal adaptive collaborative attention fusion of news text and images; cross-modal attention fusion of user comments with news text and images; and finally, input of the fused features into a fake news detector. Specifically, we introduce multi-head cross-modal attention heatmaps and comment importance visualizations to provide interpretability support for the model’s predictions, revealing the key semantic areas and user perspectives that influence its judgments. Through the cross-modal adaptive collaborative attention mechanism, UC-CMAF achieves deep semantic alignment between news text and images and uses social signals from user comments to build an enhanced credibility evaluation path, offering a new paradigm for interpretable fake information detection. Experimental results demonstrate that UC-CMAF consistently outperforms 15 baseline models across two benchmark datasets, achieving F1 scores of 0.894 and 0.909. These results validate the effectiveness of its adaptive cross-modal attention mechanism and the incorporation of user comments in enhancing both detection accuracy and interpretability.
(This article belongs to the Special Issue Explainable Artificial Intelligence Technology and Its Applications)
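As a rough illustration of the co-attention idea described in this abstract, the sketch below fuses text token features and image region features with two multi-head attention blocks and an adaptive gate, and exposes the attention weights that could be rendered as cross-modal heatmaps. The dimensions, mean pooling, and gating layer are assumptions for illustration, not the authors' exact UC-CMAF architecture.

```python
# Minimal sketch of cross-modal multi-head attention fusion, in the spirit of
# UC-CMAF's text-image co-attention. Sizes, pooling, and gating are assumptions.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Adaptive gate weighting the two attended streams (illustrative choice).
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))

    def forward(self, text_feats, image_feats):
        # text_feats: (B, T, D) token embeddings; image_feats: (B, R, D) region embeddings.
        t_att, t_map = self.txt2img(text_feats, image_feats, image_feats)  # text queries image
        i_att, i_map = self.img2txt(image_feats, text_feats, text_feats)   # image queries text
        t_vec, i_vec = t_att.mean(dim=1), i_att.mean(dim=1)                # pool to (B, D)
        w = self.gate(torch.cat([t_vec, i_vec], dim=-1))                   # (B, 2) fusion weights
        fused = w[:, :1] * t_vec + w[:, 1:] * i_vec                        # (B, D) fused feature
        # t_map / i_map are the attention weights that can be rendered as heatmaps.
        return fused, t_map, i_map

fusion = CrossModalFusion()
fused, txt_img_map, img_txt_map = fusion(torch.randn(2, 32, 256), torch.randn(2, 49, 256))
print(fused.shape, txt_img_map.shape)  # torch.Size([2, 256]) torch.Size([2, 32, 49])
```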

19 pages, 1918 KiB  
Article
3D Human Pose Estimation Based on Wearable IMUs and Multiple Camera Views
by Mingliang Chen and Guangxing Tan
Electronics 2024, 13(15), 2926; https://doi.org/10.3390/electronics13152926 - 24 Jul 2024
Cited by 7 | Viewed by 2855
Abstract
The problem of 3D human pose estimation (HPE) has been a focus of research in recent years, yet precise estimation remains an under-explored challenge. In this paper, the merits of both multiview images and wearable IMUs are combined to enhance 3D HPE. We build upon a state-of-the-art baseline while introducing three novelties. First, we improve the precision of keypoint localization by substituting Gaussian kernels with Laplacian kernels when generating the target heatmaps. Second, we incorporate an orientation regularized network (ORN), which enhances cross-modal heatmap fusion by taking a weighted average of the top-scored values instead of relying solely on the maximum value; this not only improves robustness to outliers but also leads to higher pose estimation accuracy. Finally, we modify the limb length constraint in the conventional orientation regularized pictorial structure model (ORPSM) to improve the estimation of joint positions. Specifically, we devise a soft-coded binary term for the limb length constraint, imposing a flexible, smoothed penalization and reducing sensitivity to hyperparameters. Experimental results on the TotalCapture dataset show a significant improvement over the baseline: a 10.3% increase in PCKh accuracy at the one-twelfth threshold and a 3.9 mm reduction in MPJPE error.
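Two of the ideas in this abstract are straightforward to sketch: supervising keypoints with a Laplacian rather than a Gaussian target heatmap, and extracting a peak as a weighted average of the top-scored locations instead of a hard argmax. The kernel scale, distance metric, and top-k value below are illustrative assumptions, not the paper's exact settings.

```python
# Hypothetical sketch: Laplacian vs. Gaussian target heatmaps, plus an
# ORN-style weighted average over the top-scored cells. Scales and k are assumed.
import numpy as np

def target_heatmap(h, w, cx, cy, scale=2.0, kind="laplacian"):
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    d = np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2)
    if kind == "laplacian":
        return np.exp(-d / scale)                  # sharper peak, heavier tails
    return np.exp(-(d ** 2) / (2.0 * scale ** 2))  # standard Gaussian target

def soft_peak(heatmap, k=5):
    # Weighted average of the k highest-scoring cells instead of a hard argmax.
    flat = heatmap.ravel()
    idx = np.argpartition(flat, -k)[-k:]
    w = flat[idx] / flat[idx].sum()
    ys, xs = np.unravel_index(idx, heatmap.shape)
    return float((xs * w).sum()), float((ys * w).sum())

hm = target_heatmap(64, 64, cx=20.0, cy=30.0)
print(soft_peak(hm, k=5))   # approximately (20.0, 30.0)
```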

19 pages, 2447 KiB  
Article
Center-Aware 3D Object Detection with Attention Mechanism Based on Roadside LiDAR
by Haobo Shi, Dezao Hou and Xiyao Li
Sustainability 2023, 15(3), 2628; https://doi.org/10.3390/su15032628 - 1 Feb 2023
Cited by 10 | Viewed by 3885
Abstract
Infrastructure 3D object detection is a pivotal component of Vehicle-Infrastructure Cooperated Autonomous Driving (VICAD). Because turning objects account for a high proportion of traffic at intersections, an anchor-free representation in the bird’s-eye view (BEV) is more suitable for roadside 3D detection. In this work, we propose CetrRoad, a simple yet effective center-aware detector with a transformer-based detection head for roadside 3D object detection with a single LiDAR (Light Detection and Ranging) sensor. CetrRoad first applies a voxel-based roadside LiDAR feature encoder that voxelizes and projects the raw point cloud into a dense BEV feature representation, followed by a one-stage center proposal module that initializes object center candidates from the top N points of the BEV target heatmap with an unnormalized 2D Gaussian. Then, taking the attended center proposals as query embeddings, a detection head with multi-head self-attention and multi-scale multi-head deformable cross-attention refines and predicts 3D bounding boxes for the different classes moving or parked at the intersection. Extensive experiments and analyses demonstrate that our method achieves state-of-the-art performance on the DAIR-V2X-I benchmark with an acceptable training time cost, especially for Car and Cyclist, and reaches results comparable to a multi-modal fusion method for Pedestrian. An ablation study demonstrates that center-aware queries as input provide denser supervision than a purified feature map in the attention-based detection head. Moreover, in complex traffic environments our model produces more accurate 3D detection results than the compared methods, with fewer false positives, which benefits downstream VICAD tasks.
(This article belongs to the Special Issue Sustainable Transportation and Urban Planning)
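A minimal sketch of the center proposal step described above: given a class-wise BEV center heatmap, keep only local maxima and take the top-N cells as proposal queries. The 3×3 local-maximum filter, N, and heatmap size are assumptions for illustration rather than CetrRoad's exact configuration.

```python
# Hypothetical sketch of selecting top-N center proposals from a BEV heatmap.
import torch
import torch.nn.functional as F

def center_proposals(bev_heatmap, top_n=100):
    # bev_heatmap: (B, C, H, W) per-class center scores in the bird's-eye view.
    B, C, H, W = bev_heatmap.shape
    # Keep only 3x3 local maxima to avoid duplicate proposals for one object.
    keep = (bev_heatmap == F.max_pool2d(bev_heatmap, 3, stride=1, padding=1)).float()
    scores, idx = (bev_heatmap * keep).view(B, -1).topk(top_n, dim=1)
    cls = torch.div(idx, H * W, rounding_mode="floor")       # proposal class
    ys = torch.div(idx % (H * W), W, rounding_mode="floor")  # BEV row
    xs = idx % W                                             # BEV column
    return scores, cls, ys, xs

scores, cls, ys, xs = center_proposals(torch.rand(1, 3, 200, 176), top_n=50)
print(scores.shape, cls.shape)   # torch.Size([1, 50]) torch.Size([1, 50])
```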

15 pages, 2436 KiB  
Article
An Open-Source AI Framework for the Analysis of Single Cells in Whole-Slide Images with a Note on CD276 in Glioblastoma
by Islam Alzoubi, Guoqing Bao, Rong Zhang, Christina Loh, Yuqi Zheng, Svetlana Cherepanoff, Gary Gracie, Maggie Lee, Michael Kuligowski, Kimberley L. Alexander, Michael E. Buckland, Xiuying Wang and Manuel B. Graeber
Cancers 2022, 14(14), 3441; https://doi.org/10.3390/cancers14143441 - 15 Jul 2022
Cited by 10 | Viewed by 3042
Abstract
Routine examination of entire histological slides at cellular resolution poses a significant if not insurmountable challenge to human observers. However, high-resolution data such as the cellular distribution of proteins in tissues, e.g., those obtained following immunochemical staining, are highly desirable. Our present study extends the applicability of the PathoFusion framework to the cellular level. We illustrate our approach using the detection of CD276-immunoreactive cells in glioblastoma as an example. Following automatic identification by PathoFusion’s bifocal convolutional neural network (BCNN) model, individual cells are automatically profiled and counted. Only discriminable cells selected through data filtering and thresholding were segmented for cell-level analysis. Subsequently, we converted the detection signals into heatmaps visualizing the distribution of the detected cells across entire whole-slide images of adjacent H&E-stained sections using the Discrete Wavelet Transform (DWT). Our results demonstrate that PathoFusion autonomously detects and counts individual immunochemically labelled cells with high prediction performance (0.992 AUC, 97.7% accuracy). The data can be used for whole-slide cross-modality analyses, e.g., of relationships between immunochemical signals and anaplastic histological features. PathoFusion has the potential to be applied to additional problems that seek to correlate heterogeneous data streams and to serve as a clinically applicable, weakly supervised system for histological image analyses in (neuro)pathology.
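A rough sketch of the heatmap-generation step: per-cell detection coordinates are binned into a whole-slide grid and smoothed by suppressing the finest DWT detail coefficients before reconstruction. The tile size, wavelet, and number of suppressed levels are assumptions for illustration; the abstract does not specify PathoFusion's exact DWT procedure.

```python
# Hypothetical sketch: cell detections -> whole-slide density heatmap with
# DWT-based smoothing. Tile size, wavelet, and levels are assumptions.
import numpy as np
import pywt

def detections_to_heatmap(coords, slide_shape, cell=64, wavelet="haar", keep_levels=2):
    # coords: (N, 2) array of (x, y) detection centres in slide pixels.
    h, w = slide_shape[0] // cell, slide_shape[1] // cell
    grid = np.zeros((h, w), dtype=np.float32)
    xs = np.clip(coords[:, 0] // cell, 0, w - 1).astype(int)
    ys = np.clip(coords[:, 1] // cell, 0, h - 1).astype(int)
    np.add.at(grid, (ys, xs), 1.0)                      # per-tile cell counts
    # Smooth by discarding the finest wavelet detail coefficients.
    coeffs = pywt.wavedec2(grid, wavelet, level=keep_levels + 1)
    coeffs[-1] = tuple(np.zeros_like(c) for c in coeffs[-1])
    smooth = pywt.waverec2(coeffs, wavelet)
    return smooth / max(smooth.max(), 1e-6)             # normalised heatmap

rng = np.random.default_rng(0)
hm = detections_to_heatmap(rng.uniform(0, 4096, size=(500, 2)), slide_shape=(4096, 4096))
print(hm.shape, hm.max())   # (64, 64) 1.0
```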

18 pages, 4008 KiB  
Article
CrossFuNet: RGB and Depth Cross-Fusion Network for Hand Pose Estimation
by Xiaojing Sun, Bin Wang, Longxiang Huang, Qian Zhang, Sulei Zhu and Yan Ma
Sensors 2021, 21(18), 6095; https://doi.org/10.3390/s21186095 - 11 Sep 2021
Cited by 6 | Viewed by 4005
Abstract
Despite recent successes in hand pose estimation from RGB images or depth maps, inherent challenges remain. RGB-based methods suffer from heavy self-occlusion and depth ambiguity, while depth sensors are sensitive to distance and can generally be used only indoors, which limits the practical application of depth-based methods. These challenges inspired us to combine the two modalities so that each offsets the shortcomings of the other. In this paper, we propose CrossFuNet, a novel RGB and depth information fusion network that improves the accuracy of 3D hand pose estimation. Specifically, the RGB image and the paired depth map are fed into two separate subnetworks, and their feature maps are combined in a fusion module in which we propose a completely new approach to merging information from the two modalities. The 3D keypoints are then regressed from heatmaps in the standard manner. We validate our model on two public datasets, and the results show that it outperforms state-of-the-art methods.
(This article belongs to the Section Optical Sensors)
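The "standard manner" of regressing keypoints from heatmaps is commonly implemented as a soft-argmax over each predicted heatmap; the 2D sketch below illustrates the idea (a 3D variant would add a depth axis or a separate depth head). The temperature and the 21-keypoint hand layout are illustrative assumptions, not CrossFuNet's exact head.

```python
# Minimal sketch of soft-argmax keypoint regression from predicted heatmaps.
import torch

def soft_argmax_2d(heatmaps, temperature=1.0):
    # heatmaps: (B, K, H, W), one heatmap per keypoint.
    B, K, H, W = heatmaps.shape
    probs = torch.softmax(heatmaps.view(B, K, -1) / temperature, dim=-1).view(B, K, H, W)
    ys = torch.linspace(0, H - 1, H, device=heatmaps.device)
    xs = torch.linspace(0, W - 1, W, device=heatmaps.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)    # marginalise rows, expectation over columns
    y = (probs.sum(dim=3) * ys).sum(dim=-1)    # marginalise columns, expectation over rows
    return torch.stack([x, y], dim=-1)         # (B, K, 2) sub-pixel coordinates

coords = soft_argmax_2d(torch.randn(2, 21, 64, 64))
print(coords.shape)   # torch.Size([2, 21, 2])
```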
