Search Results (244)

Search Parameters:
Keywords = siamese neural network

19 pages, 4399 KB  
Article
Privacy-Preserving Synthetic Mammograms: A Generative Model Approach to Privacy-Preserving Breast Imaging Datasets
by Damir Shodiev, Egor Ushakov, Arsenii Litvinov and Yury Markin
Informatics 2025, 12(4), 112; https://doi.org/10.3390/informatics12040112 - 18 Oct 2025
Viewed by 388
Abstract
Background: Significant progress has been made in the field of machine learning, enabling the development of methods for automatic interpretation of medical images that provide high-quality diagnostics. However, most of these methods require access to confidential data, making them difficult to apply under strict privacy requirements. Existing privacy-preserving approaches, such as federated learning and dataset distillation, have limitations related to data access, visual interpretability, etc. Methods: This study explores the use of generative models to create synthetic medical data that preserves the statistical properties of the original data while ensuring privacy. The research is carried out on the VinDr-Mammo dataset of digital mammography images. A conditional generative method using Latent Diffusion Models (LDMs) is proposed with conditioning on diagnostic labels and lesion information. Diagnostic utility and privacy robustness are assessed via cancer classification tasks and re-identification tasks using Siamese neural networks and membership inference. Results: The generated synthetic data achieved a Fréchet Inception Distance (FID) of 5.8, preserving diagnostic features. A model trained solely on synthetic data achieved comparable performance to one trained on real data (ROC-AUC: 0.77 vs. 0.82). Visual assessments showed that synthetic images are indistinguishable from real ones. Privacy evaluations demonstrated a low re-identification risk (e.g., mAP@R = 0.0051 on the test set), confirming the effectiveness of the privacy-preserving approach. Conclusions: The study demonstrates that privacy-preserving generative models can produce synthetic medical images with sufficient quality for diagnostic tasks while significantly reducing the risk of patient re-identification. This approach enables secure data sharing and model training in privacy-sensitive domains such as medical imaging. Full article
(This article belongs to the Special Issue Health Data Management in the Age of AI)
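The re-identification test described in this abstract can be illustrated with a minimal sketch: rank gallery embeddings by distance to a query embedding, as a Siamese network's verification step would. The toy vectors below are hypothetical stand-ins for real embeddings, not the authors' pipeline.

```python
import numpy as np

def reidentification_ranking(query_emb, gallery_embs):
    """Rank gallery images by L2 distance to a query embedding.

    A synthetic image whose embedding lands very close to a real
    patient's embedding would indicate re-identification risk.
    """
    d = np.linalg.norm(gallery_embs - query_emb, axis=1)
    return np.argsort(d)  # gallery indices, nearest first

# Toy 2-D embeddings standing in for Siamese-network outputs.
gallery = np.array([[0.0, 1.0], [5.0, 5.0], [0.1, 0.9]])
query = np.array([0.0, 1.0])
ranking = reidentification_ranking(query, gallery)  # index 0 is nearest
```

Metrics such as mAP@R are then computed over these rankings across many queries.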

16 pages, 2489 KB  
Article
Sentence-Level Silent Speech Recognition Using a Wearable EMG/EEG Sensor System with AI-Driven Sensor Fusion and Language Model
by Nicholas Satterlee, Xiaowei Zuo, Kee Moon, Sung Q. Lee, Matthew Peterson and John S. Kang
Sensors 2025, 25(19), 6168; https://doi.org/10.3390/s25196168 - 5 Oct 2025
Viewed by 925
Abstract
Silent speech recognition (SSR) enables communication without vocalization by interpreting biosignals such as electromyography (EMG) and electroencephalography (EEG). Most existing SSR systems rely on high-density, non-wearable sensors and focus primarily on isolated word recognition, limiting their practical usability. This study presents a wearable SSR system capable of accurate sentence-level recognition using single-channel EMG and EEG sensors with real-time wireless transmission. A moving window-based few-shot learning model, implemented with a Siamese neural network, segments and classifies words from continuous biosignals without requiring pauses or manual segmentation between word signals. A novel sensor fusion model integrates both EMG and EEG modalities, enhancing classification accuracy. To further improve sentence-level recognition, a statistical language model (LM) is applied as post-processing to correct syntactic and lexical errors. The system was evaluated on a dataset of four military command sentences containing ten unique words, achieving 95.25% sentence-level recognition accuracy. These results demonstrate the feasibility of sentence-level SSR using wearable sensors through a window-based few-shot learning model, sensor fusion, and an LM applied to limited simultaneous EMG and EEG signals. Full article
(This article belongs to the Special Issue Advanced Sensing Techniques in Biomedical Signal Processing)
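The moving-window, few-shot idea from this abstract can be sketched as follows, with plain feature distance standing in for the Siamese embedding; the window width, step size, and word prototypes are illustrative assumptions.

```python
import numpy as np

def sliding_windows(signal, width, step):
    """Segment a continuous 1-D biosignal into windows, so words can be
    classified without explicit pauses between them."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]

def classify_window(window, prototypes):
    """Few-shot nearest-prototype decision: pick the word whose support-set
    prototype is closest (plain distance stands in for a Siamese embedding)."""
    dists = {word: np.linalg.norm(window - p) for word, p in prototypes.items()}
    return min(dists, key=dists.get)

# Toy signal: two 4-sample "words" back to back, no pause between them.
signal = np.array([1, 1, 1, 1, 9, 9, 9, 9], dtype=float)
prototypes = {"halt": np.ones(4), "advance": np.full(4, 9.0)}
words = [classify_window(w, prototypes) for w in sliding_windows(signal, 4, 4)]
```

A language model would then post-process the decoded word sequence into a valid command sentence.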

25 pages, 12760 KB  
Article
Intelligent Face Recognition: Comprehensive Feature Extraction Methods for Holistic Face Analysis and Modalities
by Thoalfeqar G. Jarullah, Ahmad Saeed Mohammad, Musab T. S. Al-Kaltakchi and Jabir Alshehabi Al-Ani
Signals 2025, 6(3), 49; https://doi.org/10.3390/signals6030049 - 19 Sep 2025
Viewed by 1008
Abstract
Face recognition technology utilizes unique facial features to analyze and compare individuals for identification and verification purposes. This technology is crucial for several reasons, such as improving security and authentication, effectively verifying identities, providing personalized user experiences, and automating various operations, including attendance monitoring, access management, and law enforcement activities. In this paper, comprehensive evaluations are conducted using different face detection and modality segmentation methods, feature extraction methods, and classifiers to improve system performance. As for face detection, four methods are proposed: OpenCV’s Haar Cascade classifier, Dlib’s HOG + SVM frontal face detector, Dlib’s CNN face detector, and Mediapipe’s face detector. Additionally, two types of feature extraction techniques are proposed: hand-crafted features (traditional methods: global and local features) and deep learning features. Three global features were extracted: Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Global Image Structure (GIST). Likewise, the following local feature methods are utilized: Local Binary Pattern (LBP), Weber local descriptor (WLD), and Histogram of Oriented Gradients (HOG). On the other hand, the deep learning-based features fall into two categories: convolutional neural networks (CNNs), including VGG16, VGG19, and VGG-Face, and Siamese neural networks (SNNs), which generate face embeddings. For classification, three methods are employed: Support Vector Machine (SVM), a one-class SVM variant, and Multilayer Perceptron (MLP). The system is evaluated on three datasets: in-house, Labelled Faces in the Wild (LFW), and the Pins dataset (sourced from Pinterest), providing comprehensive benchmark comparisons for facial recognition research. 
The best accuracy among the ten proposed feature extraction methods on the in-house database, in the context of the facial recognition task, was 99.8%, achieved by the VGG16 model combined with the SVM classifier. Full article

27 pages, 1902 KB  
Article
Few-Shot Breast Cancer Diagnosis Using a Siamese Neural Network Framework and Triplet-Based Loss
by Tea Marasović and Vladan Papić
Algorithms 2025, 18(9), 567; https://doi.org/10.3390/a18090567 - 8 Sep 2025
Viewed by 599
Abstract
Breast cancer is one of the leading causes of death among women of all ages and backgrounds globally. In recent years, the growing deficit of expert radiologists—particularly in underdeveloped countries—alongside a surge in the number of images for analysis, has negatively affected the ability to secure timely and precise diagnostic results in breast cancer screening. AI technologies offer powerful tools that allow for effective diagnosis and survival forecasting, reducing the dependency on human cognitive input. Towards this aim, this research introduces a deep meta-learning framework for swift analysis of mammography images—combining a Siamese network model with a triplet-based loss function—to facilitate automatic screening (recognition) of potentially suspicious breast cancer cases. Three pre-trained deep CNN architectures, namely GoogLeNet, ResNet50, and MobileNetV3, are fine-tuned and scrutinized for their effectiveness in transforming input mammograms to a suitable embedding space. The proposed framework undergoes a comprehensive evaluation through a rigorous series of experiments, utilizing two different, publicly accessible, and widely used datasets of digital X-ray mammograms: INbreast and CBIS-DDSM. The experimental results demonstrate the framework’s strong performance in differentiating between tumorous and normal images, even with a very limited number of training samples, on both datasets. Full article
(This article belongs to the Special Issue Machine Learning for Pattern Recognition (3rd Edition))
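The triplet-based loss named in this abstract has a standard form: pull an anchor embedding toward a same-class positive and push it from a different-class negative by at least a margin. A minimal sketch with toy 2-D embeddings (the vectors and margin are illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    d_pos = np.linalg.norm(anchor - positive)  # same-class distance
    d_neg = np.linalg.norm(anchor - negative)  # different-class distance
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])   # anchor mammogram embedding
p = np.array([0.1, 0.0])   # same class: already close
n = np.array([3.0, 0.0])   # different class: already far
loss = triplet_loss(a, p, n)  # constraint satisfied, so loss is zero
```

Training minimizes this quantity over many (anchor, positive, negative) triplets so that class membership becomes a simple distance comparison in the embedding space.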

18 pages, 3709 KB  
Article
AI-Based Response Classification After Anti-VEGF Loading in Neovascular Age-Related Macular Degeneration
by Murat Fırat, İlknur Tuncer Fırat, Ziynet Fadıllıoğlu Üstündağ, Emrah Öztürk and Taner Tuncer
Diagnostics 2025, 15(17), 2253; https://doi.org/10.3390/diagnostics15172253 - 5 Sep 2025
Viewed by 829
Abstract
Background/Objectives: Wet age-related macular degeneration (AMD) is a progressive retinal disease characterized by macular neovascularization (MNV). Currently, the standard treatment for wet AMD is intravitreal anti-VEGF administration, which aims to control disease activity by suppressing neovascularization. In clinical practice, the decision to continue or discontinue treatment is largely based on the presence of fluid on optical coherence tomography (OCT) and changes in visual acuity. However, discrepancies between anatomic and functional responses can occur during these assessments. Methods: This article presents an artificial intelligence (AI)-based classification model developed to objectively assess the response to anti-VEGF treatment in patients with AMD at 3 months. This retrospective study included 120 patients (144 eyes) who received intravitreal bevacizumab treatment. After bevacizumab loading treatment, the presence of subretinal/intraretinal fluid (SRF/IRF) on OCT images and changes in visual acuity (logMAR) were evaluated. Patients were divided into three groups: Class 0, active disease (persistent SRF/IRF); Class 1, good response (no SRF/IRF and ≥0.1 logMAR improvement); and Class 2, limited response (no SRF/IRF but with <0.1 logMAR improvement). Pre-treatment and 3-month post-treatment OCT image pairs were used for training and testing the artificial intelligence model. Based on this grouping, classification was performed with a Siamese neural network (ResNet-18-based) model. Results: The model achieved 95.4% accuracy. The macro precision, macro recall, and macro F1 scores for the classes were 0.948, 0.949, and 0.948, respectively. Layer Class Activation Map (LayerCAM) heat maps and Shapley Additive Explanations (SHAP) overlays confirmed that the model focused on pathology-related regions. Conclusions: In conclusion, the model classifies post-loading response by predicting both anatomic disease activity and visual prognosis from OCT images. 
Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

25 pages, 3109 KB  
Article
Radio Frequency Fingerprinting Authentication for IoT Networks Using Siamese Networks
by Raju Dhakal, Laxima Niure Kandel and Prashant Shekhar
IoT 2025, 6(3), 47; https://doi.org/10.3390/iot6030047 - 22 Aug 2025
Viewed by 1561
Abstract
As IoT (internet of things) devices grow in prominence, safeguarding them from cyberattacks is becoming a pressing challenge. To bootstrap IoT security, device identification or authentication is crucial for establishing trusted connections among devices without prior trust. In this regard, radio frequency fingerprinting (RFF) is gaining attention because it is more efficient and requires fewer computational resources compared to resource-intensive cryptographic methods, such as digital signatures. RFF works by identifying unique manufacturing defects in the radio circuitry of IoT devices by analyzing over-the-air signals that embed these imperfections, allowing for the identification of the transmitting hardware. Recent studies on RFF often leverage advanced classification models, including classical machine learning techniques such as K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), as well as modern deep learning architectures like Convolutional Neural Network (CNN). In particular, CNNs are well-suited as they use multidimensional mapping to detect and extract reliable fingerprints during the learning process. However, a significant limitation of these approaches is that they require large datasets and necessitate retraining when new devices not included in the initial training set are added. This retraining can cause service interruptions and is costly, especially in large-scale IoT networks. In this paper, we propose a novel solution to this problem: RFF using Siamese networks, which eliminates the need for retraining and allows for seamless authentication in IoT deployments. The proposed Siamese network is trained using in-phase and quadrature (I/Q) samples from 10 different Software-Defined Radios (SDRs). Additionally, we present a new algorithm, the Similarity-Based Embedding Classification (SBEC) for RFF. 
We present experimental results that demonstrate that the Siamese network effectively distinguishes between malicious and trusted devices with a remarkable 98% identification accuracy. Full article
(This article belongs to the Special Issue Cybersecurity in the Age of the Internet of Things)
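The abstract does not spell out the SBEC algorithm, so the following is only a generic similarity-based embedding classification sketch under assumed details: cosine similarity against enrolled-device centroids, with a rejection threshold so that unseen devices are flagged without retraining. The device names, embeddings, and threshold are all hypothetical.

```python
import numpy as np

def authenticate(emb, centroids, threshold=0.9):
    """Match a device's RF-fingerprint embedding to the most similar
    enrolled centroid; reject below-threshold matches as unknown, so
    rogue devices need no retraining to be flagged."""
    best_dev, best_sim = None, -1.0
    for dev, c in centroids.items():
        sim = float(emb @ c / (np.linalg.norm(emb) * np.linalg.norm(c)))
        if sim > best_sim:
            best_dev, best_sim = dev, sim
    return best_dev if best_sim >= threshold else "unknown"

# Toy centroids for two enrolled SDR transmitters.
centroids = {"sdr_1": np.array([1.0, 0.0]), "sdr_2": np.array([0.0, 1.0])}
trusted = authenticate(np.array([0.95, 0.05]), centroids)  # near sdr_1
rogue = authenticate(np.array([1.0, 1.0]), centroids)      # matches neither
```

The key property is that enrolling a new device only adds a centroid; the Siamese embedding network itself stays fixed.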

21 pages, 3406 KB  
Article
ResNet-SE-CBAM Siamese Networks for Few-Shot and Imbalanced PCB Defect Classification
by Chao-Hsiang Hsiao, Huan-Che Su, Yin-Tien Wang, Min-Jie Hsu and Chen-Chien Hsu
Sensors 2025, 25(13), 4233; https://doi.org/10.3390/s25134233 - 7 Jul 2025
Cited by 2 | Viewed by 1133
Abstract
Defect detection in mass production lines often involves small and imbalanced datasets, necessitating the use of few-shot learning methods. Traditional deep learning-based approaches typically rely on large datasets, limiting their applicability in real-world scenarios. This study explores few-shot learning models for detecting product defects using limited data, enhancing model generalization and stability. Unlike previous deep learning models that require extensive datasets, our approach effectively performs defect detection with minimal data. We propose a Siamese network that integrates Residual blocks, Squeeze and Excitation blocks, and Convolution Block Attention Modules (ResNet-SE-CBAM Siamese network) for feature extraction, optimized through triplet loss for embedding learning. The ResNet-SE-CBAM Siamese network incorporates two primary features: attention mechanisms and metric learning. The recently developed attention mechanisms enhance the convolutional neural network operations and significantly improve feature extraction performance. Meanwhile, metric learning allows for the addition or removal of feature classes without the need to retrain the model, improving its applicability in industrial production lines with limited defect samples. To further improve training efficiency with imbalanced datasets, we introduce a sample selection method based on the Structural Similarity Index Measure (SSIM). Additionally, a high defect rate training strategy is utilized to reduce the False Negative Rate (FNR) and ensure no missed defect detections. At the classification stage, a K-Nearest Neighbor (KNN) classifier is employed to mitigate overfitting risks and enhance stability in few-shot conditions. The experimental results demonstrate that with a good-to-defect ratio of 20:40, the proposed system achieves a classification accuracy of 94% and an FNR of 2%. 
Furthermore, when the number of defective samples increases to 80, the system achieves zero false negatives (FNR = 0%). The proposed metric learning approach outperforms traditional deep learning models, such as parametric-based YOLO series models in defect detection, achieving higher accuracy and lower miss rates, highlighting its potential for high-reliability industrial deployment. Full article
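The KNN classification stage described above can be sketched in a few lines: label a query embedding by majority vote over its k nearest support embeddings, with no parametric classifier to overfit when defect samples are scarce. The toy embeddings and labels are illustrative, not the paper's data.

```python
import numpy as np
from collections import Counter

def knn_classify(emb, support_embs, support_labels, k=3):
    """Majority vote over the k nearest support embeddings
    (Euclidean distance in the Siamese embedding space)."""
    d = np.linalg.norm(support_embs - emb, axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(support_labels[i] for i in nearest).most_common(1)[0][0]

# Toy 2-D embeddings of labeled PCB patches.
support = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
labels = ["good", "good", "good", "defect", "defect"]
pred = knn_classify(np.array([0.05, 0.05]), support, labels)
```

Because the vote happens in embedding space, adding a new defect class only requires adding its samples to the support set.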

23 pages, 1523 KB  
Article
Deep One-Directional Neural Semantic Siamese Network for High-Accuracy Fact Verification
by Muchammad Naseer, Jauzak Hussaini Windiatmaja, Muhamad Asvial and Riri Fitri Sari
Big Data Cogn. Comput. 2025, 9(7), 172; https://doi.org/10.3390/bdcc9070172 - 30 Jun 2025
Viewed by 1005
Abstract
Fake news has eroded trust in credible news sources, driving the need for tools to verify the accuracy of circulating information. Fact verification addresses this issue by classifying claims as Supports (S), Refutes (R), or Not Enough Info (NEI) based on evidence. Neural Semantic Matching Networks (NSMN) is an algorithm designed for this purpose, but its reliance on BiLSTM has shown limitations, particularly overfitting. This study aims to enhance NSMN for fact verification through a structured framework comprising encoding, alignment, matching, and output layers. The proposed approach employed Siamese MaLSTM in the matching layer and introduced the Manhattan Fact Relatedness Score (MFRS) in the output layer, culminating in a novel algorithm called Deep One-Directional Neural Semantic Siamese Network (DOD–NSSN). Performance evaluation compared DOD–NSSN with NSMN and transformer-based algorithms (BERT, RoBERTa, XLM, XL-Net). Results demonstrated that DOD–NSSN achieved 91.86% accuracy and consistently outperformed other models, achieving over 95% accuracy across diverse topics, including sports, government, politics, health, and industry. The findings highlight the DOD–NSSN model’s capability to generalize effectively across various domains, providing a robust tool for automated fact verification. Full article
(This article belongs to the Special Issue Machine Learning and AI Technology for Sustainable Development)
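The Siamese MaLSTM matching layer scores relatedness with a Manhattan-distance similarity, exp(-||h1 - h2||_1), between the two branches' final hidden states. A minimal sketch with toy hidden-state vectors (this illustrates the MaLSTM similarity only, not the MFRS score introduced in the paper):

```python
import numpy as np

def manhattan_similarity(h1, h2):
    """MaLSTM-style relatedness: exp(-L1 distance) between the final
    hidden states of the two Siamese branches, giving a score in (0, 1]."""
    return float(np.exp(-np.sum(np.abs(h1 - h2))))

claim_vec = np.array([0.2, 0.5, 0.1])      # toy claim encoding
evidence_vec = np.array([0.2, 0.5, 0.1])   # identical: maximal relatedness
unrelated_vec = np.array([2.0, -1.0, 3.0]) # distant: near-zero relatedness
s_match = manhattan_similarity(claim_vec, evidence_vec)
s_miss = manhattan_similarity(claim_vec, unrelated_vec)
```

Thresholding such a score is one way a claim could be routed to Supports, Refutes, or Not Enough Info against a piece of evidence.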

19 pages, 6772 KB  
Article
A Cross-Mamba Interaction Network for UAV-to-Satellite Geolocalization
by Lingyun Tian, Qiang Shen, Yang Gao, Simiao Wang, Yunan Liu and Zilong Deng
Drones 2025, 9(6), 427; https://doi.org/10.3390/drones9060427 - 12 Jun 2025
Cited by 1 | Viewed by 1323
Abstract
The geolocalization of unmanned aerial vehicles (UAVs) in satellite-denied environments has emerged as a key research focus. Recent advancements in this area have been largely driven by learning-based frameworks that utilize convolutional neural networks (CNNs) and Transformers. However, both CNNs and Transformers face challenges in capturing global feature dependencies due to their restricted receptive fields. Inspired by state-space models (SSMs), which have demonstrated efficacy in modeling long sequences, we propose a pure Mamba-based method called the Cross-Mamba Interaction Network (CMIN) for UAV geolocalization. CMIN consists of three key components: feature extraction, information interaction, and feature fusion. It leverages Mamba’s strengths in global information modeling to effectively capture feature correlations between UAV and satellite images over a larger receptive field. For feature extraction, we design a Siamese Feature Extraction Module (SFEM) based on two basic vision Mamba blocks, enabling the model to capture the correlation between UAV and satellite image features. In terms of information interaction, we introduce a Local Cross-Attention Module (LCAM) to fuse cross-Mamba features, providing a solution for feature matching via deep learning. By aggregating features from various layers of SFEMs, we generate heatmaps for the satellite image that help determine the UAV’s geographical coordinates. Additionally, we propose a Center Masking strategy for data augmentation, which promotes the model’s ability to learn richer contextual information from UAV images. Experimental results on benchmark datasets show that our method achieves state-of-the-art performance. Ablation studies further validate the effectiveness of each component of CMIN. Full article

19 pages, 1563 KB  
Article
Small Object Tracking in LiDAR Point Clouds: Learning the Target-Awareness Prototype and Fine-Grained Search Region
by Shengjing Tian, Yinan Han, Xiantong Zhao and Xiuping Liu
Sensors 2025, 25(12), 3633; https://doi.org/10.3390/s25123633 - 10 Jun 2025
Viewed by 1305
Abstract
Light Detection and Ranging (LiDAR) point clouds are an essential perception modality for artificial intelligence systems like autonomous driving and robotics, where the ubiquity of small objects in real-world scenarios substantially challenges the visual tracking of small targets amidst the vastness of point cloud data. Current methods predominantly focus on developing universal frameworks for general object categories, often sidelining the persistent difficulties associated with small objects. These challenges stem from a scarcity of foreground points and a low tolerance for disturbances. To this end, we propose a deep neural network framework that trains a Siamese network for feature extraction and innovatively incorporates two pivotal modules: the target-awareness prototype mining (TAPM) module and the regional grid subdivision (RGS) module. The TAPM module utilizes the reconstruction mechanism of the masked auto-encoder to distill prototypes within the feature space, thereby enhancing the salience of foreground points and aiding in the precise localization of small objects. To heighten the tolerance of disturbances in feature maps, the RGS module is devised to retrieve detailed features of the search area, capitalizing on Vision Transformer and pixel shuffle technologies. Furthermore, beyond standard experimental configurations, we have meticulously crafted scaling experiments to assess the robustness of various trackers when dealing with small objects. Comprehensive evaluations show our method achieves a mean Success of 64.9% and 60.4% under original and scaled settings, outperforming benchmarks by +3.6% and +5.4%, respectively. Full article
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)

18 pages, 3721 KB  
Article
Haptic–Vision Fusion for Accurate Position Identification in Robotic Multiple Peg-in-Hole Assembly
by Jinlong Chen, Deming Luo, Zhigang Xiao, Minghao Yang, Xingguo Qin and Yongsong Zhan
Electronics 2025, 14(11), 2163; https://doi.org/10.3390/electronics14112163 - 26 May 2025
Viewed by 1133
Abstract
Multi-peg-hole assembly is a fundamental process in robotic manufacturing, particularly for circular aviation electrical connectors (CAECs) that require precise axial alignment. However, CAEC assembly poses significant challenges due to small apertures, posture disturbances, and the need for high error tolerance. This paper proposes a dual-stream Siamese network (DSSN) framework that fuses visual and tactile modalities to achieve accurate position identification in six-degree-of-freedom robotic connector assembly tasks. The DSSN employs ConvNeXt for visual feature extraction and SE-ResNet-50 with integrated attention mechanisms for tactile feature extraction, while a gated attention module adaptively fuses multimodal features. A bidirectional long short-term memory (Bi-LSTM) recurrent neural network is introduced to jointly model spatiotemporal deviations in position and orientation. Compared with state-of-the-art methods, the proposed DSSN achieves improvements of approximately 7.4%, 5.7%, and 5.4% in assembly success rates after 1, 5, and 10 buckling iterations, respectively. Experimental results validate that the integration of multimodal adaptive fusion and sequential spatiotemporal learning enables robust and precise robotic connectors assembly under high-tolerance conditions. Full article

28 pages, 5257 KB  
Article
Comparative Evaluation of Sequential Neural Network (GRU, LSTM, Transformer) Within Siamese Networks for Enhanced Job–Candidate Matching in Applied Recruitment Systems
by Mateusz Łępicki, Tomasz Latkowski, Izabella Antoniuk, Michał Bukowski, Bartosz Świderski, Grzegorz Baranik, Bogusz Nowak, Robert Zakowicz, Łukasz Dobrakowski, Bogdan Act and Jarosław Kurek
Appl. Sci. 2025, 15(11), 5988; https://doi.org/10.3390/app15115988 - 26 May 2025
Viewed by 1770
Abstract
Job–candidate matching is pivotal in recruitment, yet traditional manual or keyword-based methods can be laborious and prone to missing qualified candidates. In this study, we introduce the first Siamese framework that systematically contrasts GRU, LSTM, and Transformer sequential heads on top of a multilingual Sentence Transformer backbone, which is trained end-to-end with triplet loss on real-world recruitment data. This combination captures both long-range dependencies across document segments and global semantics, representing a substantial advance over approaches that rely solely on static embeddings. We compare the three heads using ranking metrics such as Top-K accuracy and Mean Reciprocal Rank (MRR). The Transformer-based model yields the best overall performance, with an MRR of 0.979 and a Top-100 accuracy of 87.20% on the test set. Visualization of learned embeddings (t-SNE) shows that self-attention more effectively clusters matching texts and separates them from irrelevant ones. These findings underscore the potential of combining multilingual base embeddings with specialized sequential layers to reduce manual screening efforts and improve recruitment efficiency. Full article
(This article belongs to the Special Issue Innovations in Artificial Neural Network Applications)
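Mean Reciprocal Rank, one of the ranking metrics used here, averages the reciprocal rank of the first relevant candidate across queries. A short sketch (the candidate IDs are hypothetical):

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """MRR over queries: mean of 1/rank of the first relevant candidate
    in each ranked list (ranks are 1-based)."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        for rank, cand in enumerate(ranking, start=1):
            if cand == rel:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Two job postings, with candidate CVs ranked by embedding similarity.
rankings = [["cv_7", "cv_2", "cv_9"], ["cv_4", "cv_1", "cv_6"]]
true_matches = ["cv_7", "cv_1"]  # rank 1 and rank 2
mrr = mean_reciprocal_rank(rankings, true_matches)  # (1 + 1/2) / 2 = 0.75
```

Top-K accuracy is computed analogously: the fraction of queries whose true match appears within the first K ranked candidates.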

21 pages, 7179 KB  
Article
Structural Similarity-Guided Siamese U-Net Model for Detecting Changes in Snow Water Equivalent
by Karim Malik and Colin Robertson
Remote Sens. 2025, 17(9), 1631; https://doi.org/10.3390/rs17091631 - 4 May 2025
Cited by 1 | Viewed by 967
Abstract
Snow water equivalent (SWE), the amount of water generated when a snowpack melts, has been used to study the impacts of climate change on cryosphere processes and snow cover dynamics during the winter season. In most analyses, high-temporal-resolution SWE and SD data are aggregated into monthly and yearly averages to detect and characterize changes. Aggregating snow measurements, however, can magnify the modifiable areal unit problem, resulting in differing snow trends at different temporal resolutions. Time series analysis of gridded SWE data holds the potential to unravel the impacts of climate change and global warming on daily, weekly, and monthly changes in snow during the winter season. Consequently, this research presents a high-temporal-resolution analysis of changes in the SWE across the cold regions of Canada. A Siamese UNet (Si-UNet) was developed by modifying the model’s last layer to incorporate the structural similarity (SSIM) index. The similarity values from the SSIM index are passed to a contrastive loss function, where the optimization process maximizes SSIM index values for pairs of similar SWE images and minimizes the values for pairs of dissimilar SWE images. A comparison of different model architectures, loss functions, and similarity metrics revealed that the SSIM index and the contrastive loss improved the Si-UNet’s accuracy by 16%. Using our Si-UNet, we found that interannual SWE declined steadily from 1979 to 2018, with March being the month in which the most significant changes occurred (R2 = 0.1, p-value < 0.05). We conclude with a discussion on the implications of the findings from our study of snow dynamics and climate variables using gridded SWE data, computer vision metrics, and fully convolutional deep neural networks. Full article
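The SSIM-guided contrastive objective described above can be sketched as follows, treating the SSIM index as a similarity score in [0, 1]: unchanged image pairs are pushed toward high similarity, changed pairs toward low similarity. The margin value and quadratic form below are illustrative assumptions, not the paper's exact loss.

```python
def contrastive_loss(sim, same_pair, margin=0.5):
    """Contrastive objective on a similarity score in [0, 1] (e.g., the
    SSIM index of an SWE image pair): reward high similarity for
    unchanged pairs, penalize similarity above `margin` for changed pairs."""
    if same_pair:
        return (1.0 - sim) ** 2
    return max(sim - margin, 0.0) ** 2

# An unchanged pair with high SSIM incurs little loss ...
loss_same = contrastive_loss(0.9, same_pair=True)
# ... while a changed pair with the same SSIM is penalized.
loss_diff = contrastive_loss(0.9, same_pair=False)
```

In practice the SSIM values would come from comparing the two Si-UNet branch outputs rather than being supplied directly.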
13 pages, 354 KB  
Article
Enhanced Cleft Lip and Palate Classification Using SigLIP 2: A Comparative Study with Vision Transformers and Siamese Networks
by Oraphan Nantha, Benjaporn Sathanarugsawait and Prasong Praneetpolgrang
Appl. Sci. 2025, 15(9), 4766; https://doi.org/10.3390/app15094766 - 25 Apr 2025
Viewed by 3223
Abstract
This paper extends our previous work on cleft lip and/or palate (CL/P) classification, which employed vision transformers (ViTs) and Siamese neural networks. We now integrate SigLIP 2, a state-of-the-art multilingual vision–language model, for feature extraction, replacing the previously utilized BiomedCLIP. SigLIP 2 offers enhanced semantic understanding, improved localization capabilities, and multilingual support, potentially leading to more robust feature representations for CL/P classification. We hypothesize that SigLIP 2’s superior feature extraction will improve the classification accuracy of CL/P types (bilateral, unilateral, and palate-only) from the UltraSuite CLEFT dataset, a collection of ultrasound video sequences capturing tongue movements during speech with synchronized audio recordings. A comparative analysis is conducted, evaluating the performance of our original ViT-Siamese network model (using BiomedCLIP) against a new model leveraging SigLIP 2 for feature extraction. Performance is assessed using accuracy, precision, recall, and F1 score, demonstrating the impact of SigLIP 2 on CL/P classification. The new model achieves statistically significant improvements in overall accuracy (86.6% vs. 82.76%) and F1 scores for all cleft types. We discuss the computational efficiency and practical implications of employing SigLIP 2 in a clinical setting, highlighting its potential for earlier and more accurate diagnosis, personalized treatment planning, and broader applicability across diverse populations. The results demonstrate the significant potential of advanced vision–language models, such as SigLIP 2, to enhance AI-powered medical diagnostics. Full article
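The embedding-based classification step can be illustrated with a minimal prototype-matching sketch. This assumes feature embeddings have already been extracted by a backbone such as SigLIP 2; the paper's Siamese network learns the comparison, whereas this sketch uses fixed cosine similarity against per-class mean embeddings for illustration.

```python
def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    num = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return num / (na * nb)

def classify(embedding, prototypes):
    # prototypes: class name -> mean embedding of that class's training samples.
    # Predict the class whose prototype is most similar to the input embedding.
    return max(prototypes, key=lambda c: cosine(embedding, prototypes[c]))
```

With prototypes for the three cleft types (bilateral, unilateral, palate-only), a new embedding is assigned to the class with the highest cosine similarity.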
16 pages, 1939 KB  
Article
Auto-Probabilistic Mining Method for Siamese Neural Network Training
by Arseniy Mokin, Alexander Sheshkus and Vladimir L. Arlazarov
Mathematics 2025, 13(8), 1270; https://doi.org/10.3390/math13081270 - 12 Apr 2025
Viewed by 660
Abstract
Training deep learning models for classification with limited data and computational resources remains a challenge when the number of classes is large. Metric learning offers an effective solution to this problem. However, it has its own shortcomings due to the known imperfections of widely used loss functions such as contrastive loss and triplet loss, as well as sample mining methods. This paper addresses these issues by proposing a novel mining method and metric loss function. Firstly, this paper presents an auto-probabilistic mining method designed to automatically select the most informative training samples for Siamese neural networks. Combined with a previously proposed auto-clustering technique, the method improves model training by optimizing the utilization of available data and reducing computational overhead. Secondly, this paper proposes a novel cluster-aware triplet-based metric loss function that addresses the limitations of contrastive and triplet loss, enhancing the overall training process. To evaluate the proposed methods, experiments were conducted on the optical character recognition task using the PHD08 and Omniglot datasets. The proposed loss function with the random-mining method achieved 82.6% classification accuracy on the PHD08 dataset with full training on the Korean alphabet, surpassing the known baseline. The same experiment, using a reduced training alphabet, set a new baseline of 88.6% on the PHD08 dataset. The application of the novel mining method further enhanced the accuracy to 90.6% (+2.0%) and, combined with auto-clustering, achieved 92.3% (+3.7%) compared with the new baseline. On the Omniglot dataset, the proposed mining method reached 92.32%, rising to 93.17% with auto-clustering. These findings highlight the potential effectiveness of the developed loss function and mining method in addressing a wide range of pattern recognition challenges. Full article
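The standard triplet loss and hard-negative mining that the proposed auto-probabilistic method improves upon can be sketched as follows. This is an illustrative baseline only, not the paper's auto-probabilistic procedure; function names and the margin are ours.

```python
def dist(a, b):
    # Euclidean distance between two embedding vectors.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Penalize triplets where the negative is not at least `margin`
    # farther from the anchor than the positive is.
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

def mine_hardest_negative(anchor, negatives):
    # The negative closest to the anchor yields the most informative
    # (hardest) triplet; easy negatives produce zero loss and no gradient.
    return min(negatives, key=lambda n: dist(anchor, n))
```

Hard mining of this kind is what makes triplet training effective but also unstable, which motivates replacing it with a probabilistic selection scheme as the paper proposes.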
(This article belongs to the Special Issue Artificial Intelligence: Deep Learning and Computer Vision)
