Search Results (134)

Search Parameters:
Keywords = siamese convolutional neural network

22 pages, 13053 KB  
Article
Lightweight Complex-Valued Siamese Network for Few-Shot PolSAR Image Classification
by Yinyin Jiang, Rongzhen Du, Wanying Song, Peng Zhang, Lei Liu and Zhenxi Zhang
Remote Sens. 2026, 18(2), 344; https://doi.org/10.3390/rs18020344 - 20 Jan 2026
Viewed by 112
Abstract
Complex-valued convolutional neural networks (CVCNNs) have demonstrated strong capabilities for polarimetric synthetic aperture radar (PolSAR) image classification by effectively integrating both amplitude and phase information inherent in polarimetric data. However, their practical deployment faces significant challenges due to high computational costs and performance degradation caused by extremely limited labeled samples. To address these challenges, a lightweight CV Siamese network (LCVSNet) is proposed for few-shot PolSAR image classification. Considering the constraints of limited hardware resources in practical applications, simple one-dimensional (1D) CV convolutions along the scattering dimension are combined with two-dimensional (2D) lightweight CV convolutions. In this way, the inter-element dependencies of the polarimetric coherency matrix and the spatial correlations between neighboring units can be captured effectively, while simultaneously reducing computational costs. Furthermore, LCVSNet incorporates a contrastive learning (CL) projection head to explicitly optimize the feature space. This optimization can effectively enhance the feature discriminability, leading to accurate classification with a limited number of labeled samples. Experiments on three real PolSAR datasets demonstrate the effectiveness and practical utility of LCVSNet for PolSAR image classification with a small number of labeled samples.
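The contrastive objective that LCVSNet's projection head optimizes is not spelled out in the abstract; a minimal NumPy sketch of the standard margin-based contrastive loss for Siamese embedding pairs (the function name, batch shapes, and margin value are illustrative, not from the paper):

```python
import numpy as np

def contrastive_loss(za, zb, same, margin=1.0):
    """Margin-based contrastive loss for a batch of embedding pairs.

    za, zb : (N, D) arrays of paired embeddings
    same   : (N,) array, 1 if the pair shares a class, 0 otherwise
    """
    d = np.linalg.norm(za - zb, axis=1)                 # Euclidean pair distances
    pos = same * d**2                                   # pull similar pairs together
    neg = (1 - same) * np.maximum(0.0, margin - d)**2   # push dissimilar pairs apart
    return np.mean(pos + neg)

# An identical positive pair and a well-separated negative pair both cost 0.
za = np.array([[0.0, 0.0], [0.0, 0.0]])
zb = np.array([[0.0, 0.0], [2.0, 0.0]])
same = np.array([1, 0])
print(contrastive_loss(za, zb, same))  # → 0.0
```

Minimizing this loss shapes the embedding space so that a nearest-neighbor decision works even with very few labeled samples per class.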

16 pages, 2345 KB  
Article
Vehicular Re-Identification from Uncontrolled Multiple Views
by Sally Ghanem, John H. Holliman and Ryan A. Kerekes
Future Transp. 2025, 5(4), 202; https://doi.org/10.3390/futuretransp5040202 - 18 Dec 2025
Viewed by 367
Abstract
Vehicle re-identification (re-ID) across disparate sensing modalities remains a fundamental challenge for transportation research. In this work, we introduce a deep multi-view vehicle re-ID framework that leverages Siamese networks to compare pairs of vehicle images and produce matching scores, enabling robust association across drastically different viewpoints such as those from UAVs, surveillance cameras, and ground sensors. The model exploits convolutional neural networks to learn features that remain discriminative under changes in angle, distance, and illumination, supporting more generalizable re-ID performance. As part of this effort, we also developed an automated pipeline to synchronize roadside and UAV video streams, producing a multi-perspective dataset that complements preexisting real collections and a synthetic dataset generated in this study. Together, these contributions advance the capability to re-identify vehicles across wide viewing baselines; establish a foundation for scalable, reproducible research in vehicle re-ID; and open pathways for future applications, such as inferring routine behaviors, movement patterns, and daily habits of the individual associated with the vehicle.

15 pages, 1875 KB  
Article
MS-Detector: A Hierarchical Deep Learning Method to Detect Muscle Strain Using Bilateral Symmetric Ultrasound Images of the Body
by Le Zhu, Yifu Xiong, Huachao Wu, Li Zhu, Zihan Tang, Wenbin Pei, Jing Zhou and Zhidong Xue
Diagnostics 2025, 15(23), 3087; https://doi.org/10.3390/diagnostics15233087 - 4 Dec 2025
Viewed by 471
Abstract
Background/Objectives: Muscle strain impairs mobility and quality of life, yet ultrasound diagnosis remains dependent on subjective expert interpretation, which can lead to variability in lesion detection. This study aimed to develop and evaluate MS-detector, a symmetry-aware, two-stage deep learning model that leverages bilateral B-mode ultrasound images to automatically detect muscle strain and provide clinicians with a consistent second-reader decision-support tool in routine practice. Methods: A YOLOv5-based detector proposes candidate regions, and a Siamese convolutional neural network (CNN) compares contralateral regions to filter false positives. The dataset comprised 559 bilateral pairs from 86 patients with consensus labels. All splits were enforced at the patient level. A fixed, independent hold-out test set of 32 pairs was never used for training, tuning, or threshold selection. Five-fold cross-validation (CV) on the remaining development set was used for model selection. The operating point was pre-specified at T1 = 0.01 and T2 = 0.20. Results: The detector achieved mAP = 0.4006 (five-fold CV mean). On the hold-out set at the pre-specified operating point, MS-detector attained recall = 0.826 and precision = 0.486, improving F1/F2 over the YOLOv5 baseline by increasing precision with an acceptable recall trade-off. A representative figure illustrates the reduction in low-confidence false positives after filtering; this example is illustrative rather than aggregate. Conclusions: Leveraging contralateral symmetry in a hierarchical scheme improves detection precision while maintaining clinically acceptable recall, supporting MS-detector as a decision-support tool. Future work will evaluate generalizability across scanners and centers and assess calibrated probabilistic fusion and lesion grading.
(This article belongs to the Special Issue 3rd Edition: AI/ML-Based Medical Image Processing and Analysis)
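The two-stage scheme above (detector proposals gated by the confidence floor T1, then a Siamese symmetry check at T2) can be sketched generically. The similarity callable and the direction of the T2 comparison are assumptions for illustration, not the paper's trained models:

```python
def filter_detections(proposals, similarity, t1=0.01, t2=0.20):
    """Two-stage filtering of detector proposals.

    proposals  : list of (box_id, confidence) tuples from the stage-1 detector
    similarity : callable box_id -> similarity to the mirrored contralateral
                 region (hypothetical stand-in for the Siamese CNN score)

    Assumed convention: a candidate region that looks just like the healthy
    contralateral side (similarity >= t2) is treated as a false positive.
    """
    kept = []
    for box, conf in proposals:
        if conf < t1:
            continue                 # stage 1: discard low-confidence boxes
        if similarity(box) >= t2:
            continue                 # stage 2: symmetric region => likely FP
        kept.append((box, conf))
    return kept

# Box 2 resembles its mirror and is dropped; box 3 fails the confidence floor.
print(filter_detections([(1, 0.5), (2, 0.5), (3, 0.005)],
                        lambda b: {1: 0.1, 2: 0.5}.get(b, 0.0)))  # → [(1, 0.5)]
```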

15 pages, 43296 KB  
Article
NCIVISION: A Siamese Neural Network for Molecular Similarity Prediction Using MEP and RDG Images
by Rafael Campos Vieira, Letícia de A. Nascimento, Arthur Alves Nascimento, Nicolas Ricardo de Melo Alves, Érica C. M. Nascimento and João B. L. Martins
Molecules 2025, 30(23), 4589; https://doi.org/10.3390/molecules30234589 - 28 Nov 2025
Viewed by 482
Abstract
Artificial neural networks in drug discovery have shown remarkable potential in various areas, including molecular similarity assessment and virtual screening. This study presents a novel multimodal Siamese neural network architecture. The aim was to join molecular electrostatic potential (MEP) images with the texture features derived from reduced density gradient (RDG) diagrams for enhanced molecular similarity prediction. On one side, the proposed model is combined with a convolutional neural network (CNN) for processing MEP visual information. This data is added to the multilayer perceptron (MLP) that extracts texture features from gray-level co-occurrence matrices (GLCM) computed from RDG diagrams. Both representations converge through a multimodal projector into a shared embedding space, which was trained using triplet loss to learn similarity and dissimilarity patterns. Limitations associated with the use of purely structural descriptors were overcome by incorporating non-covalent interaction information through RDG profiles, which enables the identification of bioisosteric relationships needed for rational drug design. Three datasets were used to evaluate the performance of the developed model: tyrosine kinase inhibitors (TKIs) targeting the mutant T315I BCR-ABL receptor for the treatment of chronic myeloid leukemia, acetylcholinesterase inhibitors (AChEIs) for Alzheimer’s disease therapy, and heterodimeric AChEI candidates for cross-validation. The visual and texture features of the Siamese architecture help in the capture of molecular similarities based on electrostatic and non-covalent interaction profiles. Therefore, the developed protocol offers a suitable approach in computational drug discovery, being a promising framework for virtual screening, drug repositioning, and the identification of novel therapeutic candidates.
(This article belongs to the Section Computational and Theoretical Chemistry)
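The GLCM texture features fed to the MLP branch have a compact definition; this NumPy sketch covers one pixel offset and the Haralick contrast statistic (the quantization level count and offset are illustrative, not taken from the paper):

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0, symmetric=True):
    """Gray-level co-occurrence matrix for a single pixel offset (dx, dy).

    img : 2-D integer array with values in [0, levels).
    Returns a (levels, levels) matrix of normalized co-occurrence frequencies.
    """
    h, w = img.shape
    m = np.zeros((levels, levels))
    for y in range(h - dy):
        for x in range(w - dx):
            m[img[y, x], img[y + dy, x + dx]] += 1   # count grey-level pairs
    if symmetric:
        m = m + m.T                                  # count both directions
    return m / m.sum()

def glcm_contrast(m):
    """Haralick contrast: sum over (i, j) of (i - j)^2 * p(i, j)."""
    i, j = np.indices(m.shape)
    return float(((i - j) ** 2 * m).sum())
```

Statistics such as contrast, energy, and homogeneity computed from `m` form a fixed-length texture vector suitable as MLP input.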

25 pages, 12760 KB  
Article
Intelligent Face Recognition: Comprehensive Feature Extraction Methods for Holistic Face Analysis and Modalities
by Thoalfeqar G. Jarullah, Ahmad Saeed Mohammad, Musab T. S. Al-Kaltakchi and Jabir Alshehabi Al-Ani
Signals 2025, 6(3), 49; https://doi.org/10.3390/signals6030049 - 19 Sep 2025
Viewed by 2313
Abstract
Face recognition technology utilizes unique facial features to analyze and compare individuals for identification and verification purposes. This technology is crucial for several reasons, such as improving security and authentication, effectively verifying identities, providing personalized user experiences, and automating various operations, including attendance monitoring, access management, and law enforcement activities. In this paper, comprehensive evaluations are conducted using different face detection and modality segmentation methods, feature extraction methods, and classifiers to improve system performance. As for face detection, four methods are proposed: OpenCV’s Haar Cascade classifier, Dlib’s HOG + SVM frontal face detector, Dlib’s CNN face detector, and Mediapipe’s face detector. Additionally, two types of feature extraction techniques are proposed: hand-crafted features (traditional methods: global and local features) and deep learning features. Three global features were extracted: Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Global Image Structure (GIST). Likewise, the following local feature methods are utilized: Local Binary Pattern (LBP), Weber Local Descriptor (WLD), and Histogram of Oriented Gradients (HOG). On the other hand, the deep learning-based features fall into two categories: convolutional neural networks (CNNs), including VGG16, VGG19, and VGG-Face, and Siamese neural networks (SNNs), which generate face embeddings. For classification, three methods are employed: Support Vector Machine (SVM), a one-class SVM variant, and Multilayer Perceptron (MLP). The system is evaluated on three datasets: in-house, Labelled Faces in the Wild (LFW), and the Pins dataset (sourced from Pinterest), providing comprehensive benchmark comparisons for facial recognition research. The best accuracy among the ten proposed feature extraction methods on the in-house database was 99.8%, achieved by the VGG16 model combined with the SVM classifier.
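Among the hand-crafted local descriptors listed, LBP is the simplest to sketch. A basic 3×3 variant in NumPy is shown below; the paper's exact LBP parameterization (radius, number of samples, uniform-pattern mapping) is not given in the abstract:

```python
import numpy as np

def lbp_basic(img):
    """Basic 3x3 Local Binary Pattern: each interior pixel is encoded by
    thresholding its 8 neighbours against the centre pixel, clockwise from
    the top-left, yielding an 8-bit code in [0, 255]."""
    # Clockwise neighbour offsets starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros((h - 2, w - 2), dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= center).astype(int) << (7 - bit)  # set one bit per neighbour
    return codes
```

A histogram of these codes over the image (or over cells of the image) is the LBP feature vector handed to the classifier.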

25 pages, 3109 KB  
Article
Radio Frequency Fingerprinting Authentication for IoT Networks Using Siamese Networks
by Raju Dhakal, Laxima Niure Kandel and Prashant Shekhar
IoT 2025, 6(3), 47; https://doi.org/10.3390/iot6030047 - 22 Aug 2025
Cited by 1 | Viewed by 3680
Abstract
As IoT (internet of things) devices grow in prominence, safeguarding them from cyberattacks is becoming a pressing challenge. To bootstrap IoT security, device identification or authentication is crucial for establishing trusted connections among devices without prior trust. In this regard, radio frequency fingerprinting (RFF) is gaining attention because it is more efficient and requires fewer computational resources compared to resource-intensive cryptographic methods, such as digital signatures. RFF works by identifying unique manufacturing defects in the radio circuitry of IoT devices by analyzing over-the-air signals that embed these imperfections, allowing for the identification of the transmitting hardware. Recent studies on RFF often leverage advanced classification models, including classical machine learning techniques such as K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), as well as modern deep learning architectures like Convolutional Neural Network (CNN). In particular, CNNs are well-suited as they use multidimensional mapping to detect and extract reliable fingerprints during the learning process. However, a significant limitation of these approaches is that they require large datasets and necessitate retraining when new devices not included in the initial training set are added. This retraining can cause service interruptions and is costly, especially in large-scale IoT networks. In this paper, we propose a novel solution to this problem: RFF using Siamese networks, which eliminates the need for retraining and allows for seamless authentication in IoT deployments. The proposed Siamese network is trained using in-phase and quadrature (I/Q) samples from 10 different Software-Defined Radios (SDRs). Additionally, we present a new algorithm, Similarity-Based Embedding Classification (SBEC), for RFF. Experimental results demonstrate that the Siamese network effectively distinguishes between malicious and trusted devices with a remarkable 98% identification accuracy.
(This article belongs to the Special Issue Cybersecurity in the Age of the Internet of Things)
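The SBEC algorithm itself is not detailed in the abstract. One plausible reading of "similarity-based embedding classification" is nearest-centroid matching in the Siamese embedding space with a rejection threshold for unknown transmitters; the sketch below is a hedged guess under that assumption, with an arbitrary cosine-similarity threshold:

```python
import numpy as np

def classify_embedding(query, centroids, threshold=0.8):
    """Match a query embedding against enrolled device centroids by cosine
    similarity; return (device_id, similarity), or (None, similarity) when no
    centroid is similar enough (treated as an untrusted/unknown transmitter).

    centroids : dict mapping device id -> (D,) reference embedding
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    best_id, best_sim = None, -1.0
    for dev, c in centroids.items():
        s = cos(query, c)
        if s > best_sim:
            best_id, best_sim = dev, s
    return (best_id, best_sim) if best_sim >= threshold else (None, best_sim)

enrolled = {"dev_a": np.array([1.0, 0.0]), "dev_b": np.array([0.0, 1.0])}
print(classify_embedding(np.array([1.0, 0.0]), enrolled))  # → ('dev_a', 1.0)
```

Enrolling a new device only requires computing its centroid from a few embeddings, which is what lets the scheme avoid retraining the network.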

21 pages, 3406 KB  
Article
ResNet-SE-CBAM Siamese Networks for Few-Shot and Imbalanced PCB Defect Classification
by Chao-Hsiang Hsiao, Huan-Che Su, Yin-Tien Wang, Min-Jie Hsu and Chen-Chien Hsu
Sensors 2025, 25(13), 4233; https://doi.org/10.3390/s25134233 - 7 Jul 2025
Cited by 3 | Viewed by 1836
Abstract
Defect detection in mass production lines often involves small and imbalanced datasets, necessitating the use of few-shot learning methods. Traditional deep learning-based approaches typically rely on large datasets, limiting their applicability in real-world scenarios. This study explores few-shot learning models for detecting product defects using limited data, enhancing model generalization and stability. Unlike previous deep learning models that require extensive datasets, our approach effectively performs defect detection with minimal data. We propose a Siamese network that integrates Residual blocks, Squeeze and Excitation blocks, and Convolution Block Attention Modules (ResNet-SE-CBAM Siamese network) for feature extraction, optimized through triplet loss for embedding learning. The ResNet-SE-CBAM Siamese network incorporates two primary features: attention mechanisms and metric learning. The recently developed attention mechanisms enhance the convolutional neural network operations and significantly improve feature extraction performance. Meanwhile, metric learning allows for the addition or removal of feature classes without the need to retrain the model, improving its applicability in industrial production lines with limited defect samples. To further improve training efficiency with imbalanced datasets, we introduce a sample selection method based on the Structural Similarity Index Measure (SSIM). Additionally, a high defect rate training strategy is utilized to reduce the False Negative Rate (FNR) and ensure no missed defect detections. At the classification stage, a K-Nearest Neighbor (KNN) classifier is employed to mitigate overfitting risks and enhance stability in few-shot conditions. The experimental results demonstrate that with a good-to-defect ratio of 20:40, the proposed system achieves a classification accuracy of 94% and an FNR of 2%. Furthermore, when the number of defective samples increases to 80, the system achieves zero false negatives (FNR = 0%). The proposed metric learning approach outperforms traditional deep learning models, such as parametric YOLO-series models, in defect detection, achieving higher accuracy and lower miss rates, highlighting its potential for high-reliability industrial deployment.
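The triplet loss used above for embedding learning has a standard form: the anchor must sit at least a margin closer to the positive than to the negative. A NumPy sketch (the margin value is illustrative, not from the paper):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss over a batch of (N, D) embeddings: penalize
    triplets where the anchor-positive distance is not at least `margin`
    smaller than the anchor-negative distance."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)   # same-class distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)   # different-class distance
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))
```

A KNN classifier can then operate directly on the learned embeddings, which is what lets classes be added or removed without retraining the feature extractor.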

19 pages, 6772 KB  
Article
A Cross-Mamba Interaction Network for UAV-to-Satellite Geolocalization
by Lingyun Tian, Qiang Shen, Yang Gao, Simiao Wang, Yunan Liu and Zilong Deng
Drones 2025, 9(6), 427; https://doi.org/10.3390/drones9060427 - 12 Jun 2025
Cited by 2 | Viewed by 1689
Abstract
The geolocalization of unmanned aerial vehicles (UAVs) in satellite-denied environments has emerged as a key research focus. Recent advancements in this area have been largely driven by learning-based frameworks that utilize convolutional neural networks (CNNs) and Transformers. However, both CNNs and Transformers face challenges in capturing global feature dependencies due to their restricted receptive fields. Inspired by state-space models (SSMs), which have demonstrated efficacy in modeling long sequences, we propose a pure Mamba-based method called the Cross-Mamba Interaction Network (CMIN) for UAV geolocalization. CMIN consists of three key components: feature extraction, information interaction, and feature fusion. It leverages Mamba’s strengths in global information modeling to effectively capture feature correlations between UAV and satellite images over a larger receptive field. For feature extraction, we design a Siamese Feature Extraction Module (SFEM) based on two basic vision Mamba blocks, enabling the model to capture the correlation between UAV and satellite image features. In terms of information interaction, we introduce a Local Cross-Attention Module (LCAM) to fuse cross-Mamba features, providing a solution for feature matching via deep learning. By aggregating features from various layers of SFEMs, we generate heatmaps for the satellite image that help determine the UAV’s geographical coordinates. Additionally, we propose a Center Masking strategy for data augmentation, which promotes the model’s ability to learn richer contextual information from UAV images. Experimental results on benchmark datasets show that our method achieves state-of-the-art performance. Ablation studies further validate the effectiveness of each component of CMIN.
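The state-space models underlying Mamba reduce, at their core, to a linear recurrence scanned over the sequence; Mamba additionally makes the parameters input-dependent ("selective"). A minimal NumPy sketch of the plain scan, with illustrative shapes and values:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run a discrete linear state-space model over a 1-D input sequence:

        h_t = A @ h_{t-1} + B * x_t,     y_t = C @ h_t

    Each output y_t depends on the entire input history through the hidden
    state h, giving a global receptive field at linear cost in sequence length.
    """
    n = A.shape[0]
    h = np.zeros(n)                # hidden state
    ys = []
    for xt in x:
        h = A @ h + B * xt         # state update
        ys.append(C @ h)           # readout
    return np.array(ys)

# With A = [[1]], B = C = [1], the scan computes a running sum.
print(ssm_scan(np.array([1.0, 2.0, 3.0]),
               np.array([[1.0]]), np.array([1.0]), np.array([1.0])))  # → [1. 3. 6.]
```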

21 pages, 7179 KB  
Article
Structural Similarity-Guided Siamese U-Net Model for Detecting Changes in Snow Water Equivalent
by Karim Malik and Colin Robertson
Remote Sens. 2025, 17(9), 1631; https://doi.org/10.3390/rs17091631 - 4 May 2025
Cited by 1 | Viewed by 1363
Abstract
Snow water equivalent (SWE), the amount of water generated when a snowpack melts, has been used to study the impacts of climate change on cryosphere processes and snow cover dynamics during the winter season. In most analyses, high-temporal-resolution SWE and snow depth (SD) data are aggregated into monthly and yearly averages to detect and characterize changes. Aggregating snow measurements, however, can magnify the modifiable areal unit problem, resulting in differing snow trends at different temporal resolutions. Time series analysis of gridded SWE data holds the potential to unravel the impacts of climate change and global warming on daily, weekly, and monthly changes in snow during the winter season. Consequently, this research presents a high-temporal-resolution analysis of changes in the SWE across the cold regions of Canada. A Siamese UNet (Si-UNet) was developed by modifying the model’s last layer to incorporate the structural similarity (SSIM) index. The similarity values from the SSIM index are passed to a contrastive loss function, where the optimization process maximizes SSIM index values for pairs of similar SWE images and minimizes the values for pairs of dissimilar SWE images. A comparison of different model architectures, loss functions, and similarity metrics revealed that the SSIM index and the contrastive loss improved the Si-UNet’s accuracy by 16%. Using our Si-UNet, we found that interannual SWE declined steadily from 1979 to 2018, with March being the month in which the most significant changes occurred (R2 = 0.1, p-value < 0.05). We conclude with a discussion on the implications of the findings from our study of snow dynamics and climate variables using gridded SWE data, computer vision metrics, and fully convolutional deep neural networks.
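The SSIM index incorporated in the Si-UNet's last layer has a closed form; a single-window (global) NumPy version, using the standard constants C1 = (0.01·L)² and C2 = (0.03·L)², is shown below. Treat this as the formula only: the paper's layer presumably operates on feature maps, and windowed SSIM adds Gaussian weighting.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Global (single-window) SSIM between two same-shaped arrays."""
    c1 = (0.01 * data_range) ** 2          # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2          # stabilizes the contrast term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()     # covariance
    return float(((2 * mx * my + c1) * (2 * cxy + c2)) /
                 ((mx**2 + my**2 + c1) * (vx + vy + c2)))
```

Feeding `1 - ssim_global(a, b)` style dissimilarities into a contrastive loss, as the abstract describes, makes the training signal structural rather than purely pixel-wise.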

13 pages, 3572 KB  
Article
Explainable Siamese Neural Networks for Detection of High Fall Risk Older Adults in the Community Based on Gait Analysis
by Christos Kokkotis, Kyriakos Apostolidis, Dimitrios Menychtas, Ioannis Kansizoglou, Evangeli Karampina, Maria Karageorgopoulou, Athanasios Gkrekidis, Serafeim Moustakidis, Evangelos Karakasis, Erasmia Giannakou, Maria Michalopoulou, Georgios Ch Sirakoulis and Nikolaos Aggelousis
J. Funct. Morphol. Kinesiol. 2025, 10(1), 73; https://doi.org/10.3390/jfmk10010073 - 22 Feb 2025
Viewed by 1331
Abstract
Background/Objectives: Falls among the older adult population represent a significant public health concern, often leading to diminished quality of life and serious injuries that escalate healthcare costs, and they may even prove fatal. Accurate fall risk prediction is therefore crucial for implementing timely preventive measures. However, to date, there is no definitive metric to identify individuals with high risk of experiencing a fall. To address this, the present study proposes a novel approach that transforms biomechanical time-series data, derived from gait analysis, into visual representations to facilitate the application of deep learning (DL) methods for fall risk assessment. Methods: By leveraging convolutional neural networks (CNNs) and Siamese neural networks (SNNs), the proposed framework effectively addresses the challenges of limited datasets and delivers robust predictive capabilities. Results: Through the extraction of distinctive gait-related features and the generation of class-discriminative activation maps using Grad-CAM, the random forest (RF) machine learning (ML) model not only achieves commendable accuracy (83.29%) but also enhances explainability. Conclusions: Ultimately, this study underscores the potential of advanced computational tools and machine learning algorithms to improve fall risk prediction, reduce healthcare burdens, and promote greater independence and well-being among older adults.

21 pages, 7041 KB  
Article
Synergy of Internet of Things and Software Engineering Approach for Enhanced Copy–Move Image Forgery Detection Model
by Mohammed Assiri
Electronics 2025, 14(4), 692; https://doi.org/10.3390/electronics14040692 - 11 Feb 2025
Cited by 2 | Viewed by 1176
Abstract
The fast development of digital images and the improvement required for security measures have recently increased the demand for innovative image analysis methods. Image analysis identifies, classifies, and monitors people, events, or objects in images or videos. Image analysis significantly improves security by identifying and preventing attacks on security applications through digital images. It is crucial in diverse security fields, comprising video analysis, anomaly detection, biometrics, object recognition, surveillance, and forensic investigations. By integrating advanced software engineering models with IoT capabilities, this technique revolutionizes copy–move image forgery detection. IoT devices collect and transmit real-world data, improving software solutions to detect and analyze image tampering with exceptional accuracy and efficiency. This combination enhances detection abilities and provides scalable and adaptive solutions to counter cutting-edge forgery techniques. Copy–move forgery detection (CMFD) has become a major active research domain in blind image forensics. Among existing approaches, most depend on block-based and key-point methods, or a combination of the two. A few deep convolutional neural network (DCNN) techniques have been applied to image hashing, image forensics, image retrieval, image classification, etc., performing better than conventional methods. To accomplish robust CMFD, this study develops a fusion of soft computing with a deep learning-based CMFD approach (FSCDL-CMFDA) to secure digital images. The FSCDL-CMFDA approach aims to integrate the benefits of metaheuristics with the DL model for an enhanced CMFD process. In the FSCDL-CMFDA method, histogram equalization is initially performed to improve the image quality. Furthermore, the Siamese convolutional neural network (SCNN) model is used to learn complex features from pre-processed images. Its hyperparameters are chosen by the golden jackal optimization (GJO) model. For the CMFD process, the FSCDL-CMFDA technique employs the regularized extreme learning machine (RELM) classifier. Finally, the detection performance of the RELM method is improved by the beluga whale optimization (BWO) technique. To demonstrate the enhanced performance of the FSCDL-CMFDA method, a comprehensive outcome analysis is conducted using the MNIST and CIFAR datasets. The experimental validation of the FSCDL-CMFDA method portrayed a superior accuracy value of 98.12% over existing models.
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)
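The histogram-equalization pre-processing step mentioned above has a standard CDF-remapping form. A NumPy sketch for integer grayscale images (assumes the image is not constant; the paper's exact pre-processing parameters are not stated):

```python
import numpy as np

def hist_equalize(img, levels=256):
    """Histogram equalization for an integer grayscale image: remap grey
    levels through the normalized cumulative histogram so intensities
    spread over the full [0, levels-1] range."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                     # first occupied grey level
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1))
    return lut.astype(np.uint8)[img]              # apply the lookup table
```

After equalization, low-contrast tampered regions become easier for the downstream feature extractor to separate from their surroundings.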

26 pages, 5609 KB  
Article
DSiam-CnK: A CBAM- and KCF-Enabled Deep Siamese Region Proposal Network for Human Tracking in Dynamic and Occluded Scenes
by Xiangpeng Liu, Jianjiao Han, Yulin Peng, Qiao Liang, Kang An, Fengqin He and Yuhua Cheng
Sensors 2024, 24(24), 8176; https://doi.org/10.3390/s24248176 - 21 Dec 2024
Viewed by 1300
Abstract
Despite the accuracy and robustness attained in the field of object tracking, algorithms based on Siamese neural networks often over-rely on information from the initial frame, neglecting necessary updates to the template; furthermore, in prolonged tracking situations, such methodologies encounter challenges in efficiently addressing issues such as complete occlusion or instances where the target exits the frame. To tackle these issues, this study enhances the SiamRPN algorithm by integrating the convolutional block attention module (CBAM), which enhances spatial channel attention. Additionally, it integrates the kernelized correlation filters (KCFs) for enhanced feature template representation. Building on this, we present DSiam-CnK, a Siamese neural network with dynamic template updating capabilities, facilitating adaptive adjustments in tracking strategy. The proposed algorithm is tailored to elevate the Siamese neural network’s accuracy and robustness for prolonged tracking, all the while preserving its tracking velocity. In our research, we assessed the performance on the OTB2015, VOT2018, and LaSOT datasets. Our method, when benchmarked against established trackers, including SiamRPN on OTB2015, achieved a success rate of 92.1% and a precision rate of 90.9%. On the VOT2018 dataset, it excelled, with a VOT-A (accuracy) of 46.7%, a VOT-R (robustness) of 135.3%, and a VOT-EAO (expected average overlap) of 26.4%, leading in all categories. On the LaSOT dataset, it achieved a precision of 35.3%, a normalized precision of 34.4%, and a success rate of 39%. The findings demonstrate enhanced precision in tracking performance and a notable increase in robustness with our method.
(This article belongs to the Section Intelligent Sensors)

16 pages, 952 KB  
Article
SiCRNN: A Siamese Approach for Sleep Apnea Identification via Tracheal Microphone Signals
by Davide Lillini, Carlo Aironi, Lucia Migliorelli, Leonardo Gabrielli and Stefano Squartini
Sensors 2024, 24(23), 7782; https://doi.org/10.3390/s24237782 - 5 Dec 2024
Cited by 2 | Viewed by 1778
Abstract
Sleep apnea syndrome (SAS) affects about 3–7% of the global population, but is often undiagnosed. It involves pauses in breathing during sleep, for at least 10 s, due to partial or total airway blockage. The current gold standard for diagnosing SAS is polysomnography [...] Read more.
Sleep apnea syndrome (SAS) affects about 3–7% of the global population, but is often undiagnosed. It involves pauses in breathing during sleep, for at least 10 s, due to partial or total airway blockage. The current gold standard for diagnosing SAS is polysomnography (PSG), an intrusive procedure that depends on subjective assessment by expert clinicians. To address the limitations of PSG, we propose a decision support system, which uses a tracheal microphone for data collection and a deep learning (DL) approach—namely SiCRNN—to detect apnea events during overnight sleep recordings. Our proposed SiCRNN processes Mel spectrograms using a Siamese approach, integrating a convolutional neural network (CNN) backbone and a bidirectional gated recurrent unit (GRU). The final detection of apnea events is performed using an unsupervised clustering algorithm, specifically k-means. Multiple experimental runs were carried out to determine the optimal network configuration and the most suitable type and frequency range for the input data. Tests with data from eight patients showed that our method can achieve a Recall score of up to 95% for apnea events. We also compared the proposed approach to a fully convolutional baseline, recently introduced in the literature, highlighting the effectiveness of the Siamese training paradigm in improving the identification of SAS. Full article
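The final unsupervised step described above, grouping SiCRNN embeddings into apnea and non-apnea events with k-means, can be sketched as follows. This is a plain-NumPy Lloyd's-algorithm sketch under the assumption that embeddings of the two event types form separable clusters; it is not the authors' implementation.

```python
import numpy as np

def kmeans(X, k=2, iters=50, seed=0):
    """Minimal Lloyd's k-means: returns per-sample labels and final centers.

    X: (N, D) array of embedding vectors; k: number of clusters.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each embedding to its nearest center (squared Euclidean)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # recompute centers; keep the old one if a cluster emptied out
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers
```

With k=2, one cluster would correspond to apnea segments and the other to normal breathing; which label maps to which class must still be resolved, e.g. from a handful of annotated segments.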
24 pages, 10105 KB  
Article
SiamRhic: Improved Cross-Correlation and Ranking Head-Based Siamese Network for Object Tracking in Remote Sensing Videos
by Afeng Yang, Zhuolin Yang and Wenqing Feng
Remote Sens. 2024, 16(23), 4549; https://doi.org/10.3390/rs16234549 - 4 Dec 2024
Cited by 1 | Viewed by 2311
Abstract
Object tracking in remote sensing videos is a challenging task in computer vision. Recent advances in deep learning have sparked significant interest in tracking algorithms based on Siamese neural networks. However, many existing algorithms fail to deliver satisfactory performance in complex scenarios due to challenging conditions and limited computational resources. Thus, enhancing tracking efficiency and improving algorithm responsiveness in complex scenarios are crucial. To address tracking drift caused by similar objects and background interference in remote sensing image tracking, we propose an enhanced Siamese network based on the SiamRhic architecture, incorporating a cross-correlation and ranking head for improved object tracking. We first use convolutional neural networks for feature extraction and integrate the CBAM (Convolutional Block Attention Module) to enhance the tracker’s representational capacity, allowing it to focus more effectively on the objects. Additionally, we replace the original depth-wise cross-correlation operation with asymmetric convolution, enhancing both speed and performance. We also introduce a ranking loss to reduce the classification confidence of interfering objects, addressing the mismatch between classification and regression. We validate the proposed algorithm through experiments on the OTB100, UAV123, and OOTB remote sensing datasets. Specifically, SiamRhic achieves success, normalized precision, and precision rates of 0.533, 0.786, and 0.812, respectively, on the OOTB benchmark. On the OTB100 benchmark, it achieves a success rate of 0.670 and a precision rate of 0.892; on UAV123, a success rate of 0.621 and a precision rate of 0.823. These results demonstrate the algorithm’s high precision and success rates, highlighting its practical value. Full article
(This article belongs to the Section Remote Sensing Image Processing)
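The depth-wise cross-correlation that the abstract above says SiamRhic replaces with asymmetric convolution can be sketched as follows. This is a naive NumPy loop for clarity only; real trackers implement it as a grouped convolution on the GPU, and the tensor shapes here are illustrative.

```python
import numpy as np

def depthwise_xcorr(search, template):
    """Depth-wise (per-channel) valid cross-correlation, as in Siamese trackers.

    search: (C, Hs, Ws) search-region features; template: (C, Ht, Wt) exemplar
    features. Returns a (C, Hs-Ht+1, Ws-Wt+1) per-channel response map whose
    peak indicates the most likely target location.
    """
    C, Hs, Ws = search.shape
    _, Ht, Wt = template.shape
    out = np.zeros((C, Hs - Ht + 1, Ws - Wt + 1))
    for c in range(C):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(search[c, i:i + Ht, j:j + Wt] * template[c])
    return out
```

Sliding the exemplar over the search region channel by channel keeps per-channel semantics, which is why depth-wise correlation became the standard fusion step in SiamRPN-style trackers before variants like the asymmetric convolution used here.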
19 pages, 2872 KB  
Article
Channel and Spatial Attention in Chest X-Ray Radiographs: Advancing Person Identification and Verification with Self-Residual Attention Network
by Hazem Farah, Akram Bennour, Neesrin Ali Kurdi, Samir Hammami and Mohammed Al-Sarem
Diagnostics 2024, 14(23), 2655; https://doi.org/10.3390/diagnostics14232655 - 25 Nov 2024
Cited by 2 | Viewed by 1555
Abstract
Background/Objectives: In contrast to traditional biometric modalities, such as facial recognition, fingerprints, and iris scans or even DNA, research attention has turned toward chest X-ray recognition, spurred by its remarkable recognition rates. Capturing the intricate anatomical nuances of an individual’s skeletal structure, the ribcage of the chest, lungs, and heart, chest X-rays have emerged as a focal point for identification and verification, especially in the forensic field, even in scenarios where the human body is damaged or disfigured. Discriminative feature embedding is essential for large-scale image verification, especially when applying chest X-ray radiographs to identity identification and verification. This study introduced a self-residual attention-based convolutional neural network (SRAN) aimed at effective feature embedding, capturing long-range dependencies and emphasizing critical spatial features in chest X-rays. This method offers a novel approach to person identification and verification through chest X-ray categorization, relevant for biometric applications and patient care, particularly when traditional biometric modalities are ineffective. Method: The SRAN architecture integrated a self-channel and self-spatial attention module to minimize channel redundancy and enhance significant spatial elements. The attention modules worked by dynamically aggregating feature maps across channel and spatial dimensions to enhance feature differentiation. For the network backbone, a self-residual attention block (SRAB) was implemented within a ResNet50 framework, forming a Siamese network trained with triplet loss to improve feature embedding for identity identification and verification. Results: By leveraging the NIH ChestX-ray14 and CheXpert datasets, our method demonstrated notable improvements in accuracy for identity verification and identification based on chest X-ray images. 
This approach effectively captured the detailed anatomical characteristics of individuals, including skeletal structure, ribcage, lungs, and heart, highlighting chest X-rays as a viable biometric tool even in cases of body damage or disfigurement. Conclusions: The proposed SRAN with self-residual attention provided a promising solution for biometric identification through chest X-ray imaging, showcasing its potential for accurate and reliable identity verification where traditional biometric approaches may fall short, especially in postmortem cases or forensic investigations. This methodology could play a transformative role in both biometric security and healthcare applications, offering a robust alternative modality for identity verification. Full article
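The triplet loss used to train the Siamese network described above can be sketched as follows. This is a squared-Euclidean batch formulation in NumPy; the margin value of 0.2 is illustrative, not taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Per-triplet hinge loss max(0, d(a, p) - d(a, n) + margin).

    anchor, positive, negative: (N, D) embedding batches; anchor and positive
    share an identity, negative does not. Returns an (N,) loss vector.
    """
    d_ap = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive distance
    d_an = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative distance
    return np.maximum(0.0, d_ap - d_an + margin)
```

Minimizing this pulls chest X-ray embeddings of the same person together and pushes different identities at least `margin` apart, which is what makes nearest-neighbour verification on the learned embeddings work.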