
Multi-Modal Data Sensing and Processing

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 5 March 2025 | Viewed by 20222

Special Issue Editors


Guest Editor
School of Computer Science, China University of Geosciences, Wuhan 430074, China
Interests: information fusion; multi-view clustering; multiple kernel clustering; multi-modality fusion
School of Software, Jiangxi Normal University, Nanchang 330022, China
Interests: artificial intelligence; computer vision; machine learning; medical imaging processing
School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China
Interests: RGB-D data acquisition and processing; 3D vision; structured light 3D reconstruction

Special Issue Information

Dear Colleagues,

With advances in sensing technology, increasing amounts of multi-modal data can be easily collected from different sources, describing objects and scenes from different aspects. For example, in visual data processing, various sensors can capture data from a variety of domains or modalities, such as RGB and depth data, during video/image acquisition. In medical data analysis, Magnetic Resonance Imaging (MRI), positron emission tomography (PET) and Single-Nucleotide Polymorphisms (SNPs) are often combined for reliable decisions. For the classification of glioblastoma multiforme subtypes, three different modalities, namely miRNA expression (miRNA), mRNA expression (mRNA) and DNA methylation (DNA), are commonly used. In multi-modal data, different modalities often contain both complementary and consensus information. Fully exploiting multi-modality information is essential for enhancing specific tasks, which has driven the development of many multi-modality learning methods.

During the past few decades, various models and networks have been put forward for multi-modality learning and have achieved great success, yet many unsolved issues remain to be investigated. For example, missing or noisy values often occur in multi-modal data due to sensor failures, and the big data era also raises scalability problems when handling large-scale multi-modal data. It is therefore important to regularly bring together high-quality research and innovative work covering multi-modal data sensing, processing, fusion and multi-modality learning models.

Thus, this Special Issue focuses on providing such a platform, which is fully within the scope of this journal.

Prof. Dr. Chang Tang
Dr. Yugen Yi
Dr. Sen Xiang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)


Research


22 pages, 2375 KiB  
Article
Incorporating Multimodal Directional Interpersonal Synchrony into Empathetic Response Generation
by Jingyu Quan, Yoshihiro Miyake and Takayuki Nozawa
Sensors 2025, 25(2), 434; https://doi.org/10.3390/s25020434 - 13 Jan 2025
Viewed by 405
Abstract
This study investigates how interpersonal (speaker–partner) synchrony contributes to empathetic response generation in communication scenarios. To perform this investigation, we propose a model that incorporates multimodal directional (positive and negative) interpersonal synchrony, operationalized using the cosine similarity measure, into empathetic response generation. We evaluate how incorporating specific synchrony affects the generated responses at the language and empathy levels. Based on comparison experiments, models with multimodal synchrony generate responses that are closer to ground truth responses and more diverse than models without synchrony. This demonstrates that these features are successfully integrated into the models. Additionally, we find that positive synchrony is linked to enhanced emotional reactions, reduced exploration, and improved interpretation. Negative synchrony is associated with reduced exploration and increased interpretation. These findings shed light on the connections between multimodal directional interpersonal synchrony and empathy’s emotional and cognitive aspects in artificial intelligence applications.
(This article belongs to the Special Issue Multi-Modal Data Sensing and Processing)
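The directional synchrony feature described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name and the split into positive/negative channels via clipping are assumptions about how signed cosine similarity might be operationalized.

```python
import numpy as np

def directional_synchrony(speaker_feats, partner_feats):
    """Cosine similarity between time-aligned speaker and partner feature
    vectors; the sign gives the direction (positive/negative synchrony)."""
    num = np.sum(speaker_feats * partner_feats, axis=-1)
    denom = (np.linalg.norm(speaker_feats, axis=-1)
             * np.linalg.norm(partner_feats, axis=-1)) + 1e-8
    sync = num / denom
    positive = np.clip(sync, 0, None)   # positive-synchrony channel
    negative = np.clip(-sync, 0, None)  # negative-synchrony channel
    return positive, negative

# Toy example: identical feature vectors yield positive synchrony close to 1.
a = np.array([[1.0, 2.0, 3.0]])
b = np.array([[1.0, 2.0, 3.0]])
pos, neg = directional_synchrony(a, b)
```

In practice the inputs would be per-frame multimodal feature vectors (e.g., facial or acoustic features) for the two interlocutors.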

19 pages, 12227 KiB  
Article
Local-Peak Scale-Invariant Feature Transform for Fast and Random Image Stitching
by Hao Li, Lipo Wang, Tianyun Zhao and Wei Zhao
Sensors 2024, 24(17), 5759; https://doi.org/10.3390/s24175759 - 4 Sep 2024
Viewed by 1451
Abstract
Image stitching aims to construct a wide field of view with high spatial resolution, which cannot be achieved in a single exposure. Typically, conventional image stitching techniques, other than deep learning, require complex computation and are thus computationally expensive, especially for stitching large raw images. In this study, inspired by the multiscale feature of fluid turbulence, we developed a fast feature point detection algorithm named local-peak scale-invariant feature transform (LP-SIFT), based on multiscale local peaks and the scale-invariant feature transform method. By combining LP-SIFT and RANSAC in image stitching, the stitching speed can be improved by orders of magnitude compared with the original SIFT method. Benefiting from the adjustable size of the interrogation window, the LP-SIFT algorithm achieves stitching times comparable to or even shorter than other commonly used algorithms, while achieving comparable or even better stitching results. Nine large images (over 2600 × 1600 pixels), arranged randomly without prior knowledge, can be stitched within 158.94 s. The algorithm is highly practical for applications requiring a wide field of view in diverse application scenes, e.g., terrain mapping, biological analysis, and even criminal investigation.
(This article belongs to the Special Issue Multi-Modal Data Sensing and Processing)
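The "local peak within an interrogation window" idea behind LP-SIFT can be illustrated with a toy detector. This is a hedged sketch under assumed simplifications (single scale, non-overlapping windows, plain argmax), not the paper's algorithm:

```python
import numpy as np

def local_peaks(img, win=8):
    """Mark the pixel with the maximum intensity in each non-overlapping
    win x win interrogation window: a cheap stand-in for dense keypoint
    detection on large images. `win` controls detector density."""
    h, w = img.shape
    peaks = []
    for r in range(0, h - win + 1, win):
        for c in range(0, w - win + 1, win):
            block = img[r:r + win, c:c + win]
            dr, dc = np.unravel_index(np.argmax(block), block.shape)
            peaks.append((r + dr, c + dc))
    return peaks

rng = np.random.default_rng(0)
img = rng.random((32, 32))
pts = local_peaks(img)  # one candidate keypoint per window
```

In the full pipeline, such candidates would be described with SIFT descriptors and the matches filtered with RANSAC before estimating the stitching transform.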

23 pages, 6371 KiB  
Article
Fall Detection Method for Infrared Videos Based on Spatial-Temporal Graph Convolutional Network
by Junkai Yang, Yuqing He, Jingxuan Zhu, Zitao Lv and Weiqi Jin
Sensors 2024, 24(14), 4647; https://doi.org/10.3390/s24144647 - 17 Jul 2024
Viewed by 1111
Abstract
The timely detection of falls and alerting medical aid is critical for health monitoring in elderly individuals living alone. This paper mainly focuses on issues such as poor adaptability, privacy infringement, and low recognition accuracy associated with traditional visual sensor-based fall detection. We propose an infrared video-based fall detection method utilizing spatial-temporal graph convolutional networks (ST-GCNs) to address these challenges. Our method uses fine-tuned AlphaPose to extract 2D human skeleton sequences from infrared videos. Subsequently, the skeleton data are represented in Cartesian and polar coordinates and processed through a two-stream ST-GCN to recognize fall behaviors promptly. To enhance the network’s recognition capability for fall actions, we improved the adjacency matrix of the graph convolutional units and introduced multi-scale temporal graph convolution units. To facilitate practical deployment, we optimized the time window and network depth of the ST-GCN, striking a balance between model accuracy and speed. Experimental results on a proprietary infrared human action recognition dataset demonstrated that our proposed algorithm accurately identifies fall behaviors with an accuracy of up to 96%. Moreover, our algorithm performed robustly, identifying falls in both near-infrared and thermal-infrared videos.
(This article belongs to the Special Issue Multi-Modal Data Sensing and Processing)
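The dual-coordinate skeleton representation feeding the two-stream ST-GCN can be sketched as a simple Cartesian-to-polar conversion. The reference joint choice (`center_idx`) is an assumption for illustration; the paper's exact normalization may differ:

```python
import numpy as np

def to_polar(joints, center_idx=0):
    """Convert 2D joint coordinates (N x 2) to polar coordinates
    (radius, angle) about a reference joint, e.g. the hip or neck."""
    centered = joints - joints[center_idx]
    radius = np.hypot(centered[:, 0], centered[:, 1])
    angle = np.arctan2(centered[:, 1], centered[:, 0])
    return np.stack([radius, angle], axis=1)

# Three joints: reference at origin, one to the right, one above it.
skeleton = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
polar = to_polar(skeleton)
```

One stream of the network would then consume the raw Cartesian sequence and the other the polar sequence, letting the model pick up rotation-sensitive cues that distinguish a fall from, say, sitting down.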

14 pages, 4558 KiB  
Article
Dual-Signal Feature Spaces Map Protein Subcellular Locations Based on Immunohistochemistry Image and Protein Sequence
by Kai Zou, Simeng Wang, Ziqian Wang, Hongliang Zou and Fan Yang
Sensors 2023, 23(22), 9014; https://doi.org/10.3390/s23229014 - 7 Nov 2023
Cited by 1 | Viewed by 1811
Abstract
Protein is one of the primary biochemical macromolecular regulators in the compartmental cellular structure, and the subcellular locations of proteins can therefore provide information on the function of subcellular structures and physiological environments. Recently, data-driven systems have been developed to predict the subcellular location of proteins based on protein sequence, immunohistochemistry (IHC) images, or immunofluorescence (IF) images. However, the fusion of multiple protein signals has received little research attention. In this study, we developed a dual-signal computational protocol that incorporates IHC images into protein sequences to learn protein subcellular localization. The protocol comprises three major steps: first, a benchmark database was constructed that includes 281 proteins sorted out from 4722 proteins of the Human Protein Atlas (HPA) and Swiss-Prot databases, covering the endoplasmic reticulum (ER), Golgi apparatus, cytosol, and nucleoplasm; second, discriminative feature operators were employed to quantitate protein image-sequence samples comprising IHC images and protein sequences; finally, the feature subspaces of the different protein signals were absorbed to construct multiple sub-classifiers via dimensionality reduction and binary relevance (BR), and the multiple confidences derived from these sub-classifiers were adopted to decide the subcellular location through a centralized voting mechanism at the decision layer. The experimental results indicated that the dual-signal model embedding IHC images and protein sequences outperformed the single-signal models, with accuracy, precision, and recall of 75.41%, 80.38%, and 74.38%, respectively. These findings are enlightening for further research on protein subcellular location prediction under multi-signal fusion.
(This article belongs to the Special Issue Multi-Modal Data Sensing and Processing)
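The decision-layer voting described above can be illustrated with a confidence-weighted vote across sub-classifiers. The function, the label names, and the confidence values below are hypothetical stand-ins, not values from the paper:

```python
from collections import Counter

def vote(predictions, confidences):
    """Centralized voting: each sub-classifier casts its predicted label
    weighted by its confidence; the label with the highest total wins."""
    totals = Counter()
    for label, conf in zip(predictions, confidences):
        totals[label] += conf
    return totals.most_common(1)[0][0]

# Three hypothetical sub-classifiers (image-based, sequence-based, fused):
labels = ["nucleoplasm", "cytosol", "nucleoplasm"]
confs = [0.7, 0.9, 0.6]
decision = vote(labels, confs)
```

Because proteins can reside in several compartments, a real binary-relevance setup would run one such vote per candidate location rather than picking a single winner.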

15 pages, 4788 KiB  
Article
Deep-Learning-Aided Evaluation of Spondylolysis Imaged with Ultrashort Echo Time Magnetic Resonance Imaging
by Suraj Achar, Dosik Hwang, Tim Finkenstaedt, Vadim Malis and Won C. Bae
Sensors 2023, 23(18), 8001; https://doi.org/10.3390/s23188001 - 21 Sep 2023
Cited by 5 | Viewed by 2107
Abstract
Isthmic spondylolysis results in fracture of the pars interarticularis of the lumbar spine, found in as many as half of adolescent athletes with persistent low back pain. While computed tomography (CT) is the gold standard for the diagnosis of spondylolysis, the use of ionizing radiation near reproductive organs in young subjects is undesirable. While magnetic resonance imaging (MRI) is preferable, it has lower sensitivity for detecting the condition. Recently, it has been shown that ultrashort echo time (UTE) MRI can provide markedly improved bone contrast compared to conventional MRI. To take UTE MRI further, we developed supervised deep learning tools to generate (1) CT-like images and (2) saliency maps of fracture probability from UTE MRI, using ex vivo preparations of cadaveric spines. We further compared quantitative metrics of the contrast-to-noise ratio (CNR), mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) between UTE MRI (inverted to make the appearance similar to CT) and CT, and between CT-like images and CT. Qualitative results demonstrated the feasibility of successfully generating CT-like images from UTE MRI to provide easier interpretability for bone fractures thanks to improved image contrast and CNR. Quantitatively, the mean CNR of bone against defect-filled tissue was 35, 97, and 146 for UTE MRI, CT-like, and CT images, respectively, being significantly higher for CT-like than UTE MRI images. For the image similarity metrics using the CT image as the reference, CT-like images provided a significantly lower mean MSE (0.038 vs. 0.0528), higher mean PSNR (28.6 vs. 16.5), and higher SSIM (0.73 vs. 0.68) compared to UTE MRI images. Additionally, the saliency maps enabled quick detection of the location of a probable pars fracture by providing visual cues to the reader. This proof-of-concept study is limited to data from ex vivo samples, and additional work in human subjects with spondylolysis would be necessary to refine the models for clinical use. Nonetheless, this study shows that UTE MRI and deep learning tools could be highly useful for the evaluation of isthmic spondylolysis.
(This article belongs to the Special Issue Multi-Modal Data Sensing and Processing)
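The MSE, PSNR, and CNR metrics reported in this abstract have standard definitions that can be written out directly. The snippet below is a generic sketch (the small epsilon and the pooled-variance form of the noise term are implementation choices, not necessarily the paper's):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of equal shape."""
    return float(np.mean((a - b) ** 2))

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    m = mse(a, b)
    return float(10 * np.log10(peak ** 2 / m)) if m > 0 else float("inf")

def cnr(roi_a, roi_b):
    """Contrast-to-noise ratio between two regions of interest
    (e.g. bone vs. defect-filled tissue), using pooled noise."""
    noise = np.sqrt((roi_a.var() + roi_b.var()) / 2) + 1e-12
    return float(abs(roi_a.mean() - roi_b.mean()) / noise)

ref = np.zeros((4, 4))
test_img = ref + 0.1  # uniform 0.1 error -> MSE 0.01, PSNR 20 dB
```

SSIM is omitted here because its windowed computation is more involved; libraries such as scikit-image provide a reference implementation.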

16 pages, 3488 KiB  
Article
Graph Sampling-Based Multi-Stream Enhancement Network for Visible-Infrared Person Re-Identification
by Jinhua Jiang, Junjie Xiao, Renlin Wang, Tiansong Li, Wenfeng Zhang, Ruisheng Ran and Sen Xiang
Sensors 2023, 23(18), 7948; https://doi.org/10.3390/s23187948 - 18 Sep 2023
Viewed by 1356
Abstract
With the increasing demand for person re-identification (Re-ID) tasks, the need for all-day retrieval has become an inevitable trend. Nevertheless, single-modal Re-ID is no longer sufficient to meet this requirement, making multi-modal data crucial in Re-ID. Consequently, the Visible-Infrared Person Re-Identification (VI Re-ID) task has been proposed, which aims to match pairs of person images from the visible and infrared modalities. The significant discrepancy between the modalities poses a major challenge. Existing VI Re-ID methods focus on cross-modal feature learning and modal transformation to alleviate the discrepancy but overlook the impact of person contour information. Contours exhibit modality invariance, which is vital for learning effective identity representations and cross-modal matching. In addition, due to the low intra-modal diversity in the visible modality, it is difficult to distinguish the boundaries between some hard samples. To address these issues, we propose the Graph Sampling-based Multi-stream Enhancement Network (GSMEN). Firstly, the Contour Expansion Module (CEM) incorporates the contour information of a person into the original samples, further reducing the modality discrepancy and leading to improved matching stability between image pairs of different modalities. Additionally, to better distinguish cross-modal hard sample pairs during training, an innovative Cross-modality Graph Sampler (CGS) is designed for sample selection before training. The CGS calculates the feature distance between samples from different modalities and groups similar samples into the same batch, effectively exploring the boundary relationships between hard classes in the cross-modal setting. Experiments conducted on the SYSU-MM01 and RegDB datasets demonstrate the superiority of our proposed method. Specifically, in the VIS→IR task, the experimental results on the RegDB dataset achieve 93.69% for Rank-1 and 92.56% for mAP.
(This article belongs to the Special Issue Multi-Modal Data Sensing and Processing)
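The sampler idea, grouping cross-modal samples by feature distance so that hard pairs share a batch, can be sketched as follows. This is a hypothetical simplification (Euclidean distance, per-anchor nearest neighbours); the paper's CGS may differ in its distance measure and batching rules:

```python
import numpy as np

def cross_modal_batches(vis_feats, ir_feats, batch_size=2):
    """For each visible-modality anchor, pick its nearest infrared
    neighbours by Euclidean distance so hard cross-modal pairs land
    in the same training batch."""
    # Pairwise distance matrix: (num_visible, num_infrared)
    d = np.linalg.norm(vis_feats[:, None, :] - ir_feats[None, :, :], axis=-1)
    batches = []
    for i in range(len(vis_feats)):
        nearest = np.argsort(d[i])[:batch_size]
        batches.append((i, nearest.tolist()))
    return batches

# Two visible anchors and three infrared samples (toy 2-D features).
vis = np.array([[0.0, 0.0], [10.0, 10.0]])
ir = np.array([[0.1, 0.0], [9.9, 10.0], [5.0, 5.0]])
pairs = cross_modal_batches(vis, ir)
```

Grouping nearest cross-modal neighbours this way concentrates the hardest positive/negative candidates where the loss can act on them.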

19 pages, 9285 KiB  
Article
CLIP-Based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-Modal Hashing Retrieval
by Yewen Li, Mingyuan Ge, Mingyong Li, Tiansong Li and Sen Xiang
Sensors 2023, 23(7), 3439; https://doi.org/10.3390/s23073439 - 24 Mar 2023
Cited by 8 | Viewed by 3190
Abstract
With the proliferation of multi-modal data generated by various sensors, unsupervised multi-modal hashing retrieval has been extensively studied due to its advantages in storage, retrieval efficiency, and label independence. However, two obstacles remain for existing unsupervised methods: (1) because they cannot fully capture the complementary and co-occurrence information of multi-modal data, existing methods suffer from inaccurate similarity measures; (2) existing methods suffer from unbalanced multi-modal learning, and the semantic structure of the data is corrupted during hash code binarization. To address these obstacles, we devise an effective CLIP-based Adaptive Graph Attention Network (CAGAN) for large-scale unsupervised multi-modal hashing retrieval. Firstly, we use the multi-modal model CLIP to extract fine-grained semantic features, mine similar information from different perspectives of the multi-modal data, and perform similarity fusion and enhancement. In addition, this paper proposes an adaptive graph attention network to assist the learning of hash codes, which uses an attention mechanism to learn adaptive graph similarity across modalities. It further aggregates the intrinsic neighborhood information of neighboring data nodes through a graph convolutional network to generate more discriminative hash codes. Finally, this paper employs an iterative approximate optimization strategy to mitigate the information loss in the binarization process. Extensive experiments on three benchmark datasets demonstrate that the proposed method significantly outperforms several representative hashing methods in unsupervised multi-modal retrieval tasks.
(This article belongs to the Special Issue Multi-Modal Data Sensing and Processing)
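The binarization step whose information loss the paper mitigates is, at its core, a sign operation on continuous hash representations. The sketch below shows only that plain sign step; the paper's iterative approximate optimization, which it replaces, is deliberately not reproduced here:

```python
import numpy as np

def binarize(h):
    """Map continuous hash representations to +/-1 binary codes.
    The plain sign step is where semantic structure can be lost,
    which motivates the paper's iterative approximation strategy."""
    b = np.sign(h)
    b[b == 0] = 1  # break exact-zero ties consistently
    return b

codes = binarize(np.array([[0.3, -1.2, 0.0], [-0.5, 2.0, 0.7]]))
```

Retrieval then reduces to Hamming-distance comparisons between such codes, which is what makes hashing attractive for large-scale multi-modal search.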

Review


42 pages, 12665 KiB  
Review
Computer Vision Technology for Monitoring of Indoor and Outdoor Environments and HVAC Equipment: A Review
by Bin Yang, Shuang Yang, Xin Zhu, Min Qi, He Li, Zhihan Lv, Xiaogang Cheng and Faming Wang
Sensors 2023, 23(13), 6186; https://doi.org/10.3390/s23136186 - 6 Jul 2023
Cited by 10 | Viewed by 5889
Abstract
Artificial intelligence technologies such as computer vision (CV), machine learning, the Internet of Things (IoT), and robotics have advanced rapidly in recent years. These new technologies provide non-contact measurements in three areas: indoor environmental monitoring, outdoor environmental monitoring, and equipment monitoring. This paper summarizes specific applications of non-contact measurement based on infrared and visible images in the areas of personnel skin temperature, position and posture, the urban physical environment, building construction safety, and equipment operation status. At the same time, the challenges and opportunities associated with the application of CV technology are anticipated.
(This article belongs to the Special Issue Multi-Modal Data Sensing and Processing)
