Search Results (48)

Search Parameters:
Keywords = graph-based image representation and analysis

24 pages, 4915 KB  
Article
Semantic-Guided Matching of Heterogeneous UAV Imagery and Mobile LiDAR Data Using Deep Learning and Graph Neural Networks
by Tee-Ann Teo, Hao Yu and Pei-Cheng Chen
Drones 2026, 10(3), 185; https://doi.org/10.3390/drones10030185 - 8 Mar 2026
Viewed by 326
Abstract
The integration of heterogeneous geospatial data, specifically low-cost unmanned aerial vehicle (UAV) imagery and mobile light detection and ranging (LiDAR) point clouds, presents a significant challenge due to the substantial radiometric and structural discrepancies between the two modalities. This study proposes a novel air-to-ground semantic feature matching framework that achieves precise geometric registration between these data sources by incorporating semantic-constrained, deep learning-based matching. The methodology transformed the cross-sensor alignment challenge into a robust two-dimensional image matching problem: YOLOv11 was first used for semantic segmentation of common road markings in both the UAV orthoimage and the converted LiDAR intensity image to generate highly consistent feature references. Subsequently, the SuperPoint detector and a graph neural network matcher, SuperGlue, were applied to these semantic images to establish reliable correspondence points. Experimental results confirmed that this semantic-guided strategy consistently outperformed traditional feature-based matching (scale-invariant feature transform + fast library for approximate nearest neighbors), particularly by converting the noisy LiDAR intensity image into a stabilized semantic representation. The explicit application of semantic constraints further proved effective in eliminating false matches between geometrically similar but semantically distinct objects. The final object-specific analysis demonstrated that features with clear, complex geometric structures (e.g., pedestrian crossings and directional arrows) provide the most robust matching control. In summary, the proposed framework successfully leverages semantic context to overcome cross-sensor heterogeneity, offering an automated and precise solution for the geometric alignment of UAV imagery and mobile LiDAR data.
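The SIFT + FLANN baseline that the framework is compared against is straightforward to reproduce with OpenCV. Below is a minimal sketch, assuming the UAV orthoimage and the LiDAR intensity image have already been rendered as 2D grayscale files (the file names are hypothetical):

```python
import cv2

# Hypothetical inputs: UAV orthoimage and LiDAR intensity rendered as grayscale images.
uav = cv2.imread("uav_ortho.png", cv2.IMREAD_GRAYSCALE)
lidar = cv2.imread("lidar_intensity.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(uav, None)
kp2, des2 = sift.detectAndCompute(lidar, None)

# FLANN with a KD-tree index (algorithm=1), as in the SIFT+FLANN baseline.
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test prunes ambiguous nearest-neighbor matches.
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])
print(f"{len(good)} putative correspondences")
```

The 0.7 ratio threshold is the conventional choice before geometric verification; it is not taken from the paper.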

43 pages, 4725 KB  
Article
Graph-FEM/ML Framework for Inverse Load Identification in Thick-Walled Hyperelastic Pressure Vessels
by Nasser Firouzi, Ramy M. Hafez, Kareem N. Salloomi, Mohamed A. Abdelkawy and Raja Rizwan Hussain
Symmetry 2025, 17(12), 2021; https://doi.org/10.3390/sym17122021 - 23 Nov 2025
Cited by 2 | Viewed by 1000
Abstract
The accurate identification of internal and external pressures in thick-walled hyperelastic vessels is a challenging inverse problem with significant implications for structural health monitoring, biomedical devices, and soft robotics. Conventional analytical and numerical approaches address the forward problem effectively but offer limited means for recovering unknown load conditions from observable deformations. In this study, we introduce a Graph-FEM/ML framework that couples high-fidelity finite element simulations with machine learning models to infer normalized internal and external pressures from measurable boundary deformations. A dataset of 1386 valid samples was generated through Latin Hypercube Sampling of geometric and loading parameters and simulated using finite element analysis with a Neo-Hookean constitutive model. Two complementary neural architectures were explored: graph neural networks (GNNs), which operate directly on resampled and feature-enriched boundary data, and convolutional neural networks (CNNs), which process image-based representations of undeformed and deformed cross-sections. The GNN models consistently achieved low root-mean-square errors (≈0.021) and stable correlations across training, validation, and test sets, particularly when augmented with displacement and directional features. In contrast, CNN models exhibited limited predictive accuracy: quarter-section inputs regressed toward mean values, while full-ring and filled-section inputs improved after Bayesian optimization but remained inferior to GNNs, with higher RMSEs (0.023–0.030) and modest R² correlations. To the best of our knowledge, this is the first work to combine boundary deformation observations with graph-based learning for inverse load identification in hyperelastic vessels. The results highlight the advantages of boundary-informed GNNs over CNNs and establish a reproducible dataset and methodology for future investigations. This framework represents an initial step toward a new direction in mechanics-informed machine learning, with the expectation that future research will refine and extend the approach to improve accuracy, robustness, and applicability in broader engineering and biomedical contexts.
(This article belongs to the Special Issue Symmetries in Machine Learning and Artificial Intelligence)
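The Latin Hypercube Sampling step maps directly onto SciPy's quasi-Monte Carlo module. A minimal sketch, with illustrative parameter bounds (the actual geometric and loading ranges are not given in the abstract):

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical parameter ranges: inner radius, wall thickness,
# normalized internal and external pressures (illustrative bounds only).
l_bounds = [0.05, 0.005, 0.0, 0.0]
u_bounds = [0.20, 0.050, 1.0, 1.0]

sampler = qmc.LatinHypercube(d=4, seed=0)
unit_samples = sampler.random(n=1386)  # 1386 samples, matching the paper's dataset size
samples = qmc.scale(unit_samples, l_bounds, u_bounds)
print(samples.shape)  # (1386, 4)
```

Each row would then parameterize one finite element run; invalid configurations would be filtered out before training, which is presumably why the paper reports "valid" samples.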

17 pages, 4146 KB  
Article
Sentiment Analysis of Meme Images Using Deep Neural Network Based on Keypoint Representation
by Endah Asmawati, Ahmad Saikhu and Daniel O. Siahaan
Informatics 2025, 12(4), 118; https://doi.org/10.3390/informatics12040118 - 28 Oct 2025
Cited by 2 | Viewed by 2018
Abstract
Meme image sentiment analysis is the task of examining public opinion based on meme images posted on social media. In various fields, stakeholders often need to quickly and accurately determine the sentiment of memes from large amounts of available data. Innovation is therefore needed in image pre-processing to improve performance metrics, especially accuracy, in meme image sentiment classification, since sentiment classification on human face datasets yields higher accuracy than on meme images. This research aims to develop a sentiment analysis model for meme images based on key points. The analyzed meme images contain human faces, and the facial features extracted as key points are the eyebrows, eyes, and mouth. In the proposed method, key points of facial features are represented as graphs, specifically directed graphs, weighted graphs, or weighted directed graphs. These graph representations of key points are then used to build a sentiment analysis model based on a Deep Neural Network (DNN) with three hidden layers (i = 64, j = 64, k = 90). This study makes several contributions: developing a human facial sentiment detection model using key points, representing key points as various graphs, and constructing a meme dataset with Indonesian text. The proposed model is evaluated using several metrics, namely accuracy, precision, recall, and F1 score. Furthermore, a comparative analysis is conducted to evaluate the performance of the proposed model against existing approaches. The experimental results show that the proposed model, using the directed graph representation of key points, obtained the highest accuracy (83%) and F1 score (81%).
(This article belongs to the Special Issue Practical Applications of Sentiment Analysis)
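One plausible reading of the keypoint-to-graph encoding is sketched below: landmarks along each facial feature are linked into a distance-weighted directed chain, and the flattened adjacency matrix feeds a DNN with the 64/64/90 hidden layers named in the abstract. The keypoint count, chaining scheme, and three-class head are all assumptions for illustration, not the authors' exact design:

```python
import numpy as np
import torch
import torch.nn as nn

def directed_graph_features(keypoints: np.ndarray) -> np.ndarray:
    """Encode (N, 2) facial keypoints as a weighted directed graph:
    an edge i -> i+1 along the contour, weighted by Euclidean distance."""
    n = len(keypoints)
    adj = np.zeros((n, n), dtype=np.float32)
    for i in range(n - 1):
        adj[i, i + 1] = np.linalg.norm(keypoints[i + 1] - keypoints[i])
    return adj.flatten()  # flattened adjacency as the DNN input vector

n_kp = 20  # hypothetical number of eyebrow/eye/mouth keypoints
model = nn.Sequential(                 # three hidden layers (64, 64, 90) as in the abstract
    nn.Linear(n_kp * n_kp, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 90), nn.ReLU(),
    nn.Linear(90, 3),                  # hypothetical 3-way sentiment head
)
x = torch.from_numpy(directed_graph_features(np.random.rand(n_kp, 2).astype(np.float32)))
logits = model(x)
```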

17 pages, 6650 KB  
Article
DAGMNet: Dual-Branch Attention-Pruned Graph Neural Network for Multimodal sMRI and fMRI Fusion in Autism Prediction
by Lanlan Wang, Xinyu Li, Jialu Yuan and Yinghao Chen
Biomedicines 2025, 13(9), 2168; https://doi.org/10.3390/biomedicines13092168 - 5 Sep 2025
Cited by 1 | Viewed by 1282
Abstract
Background: Accurate and early diagnosis of autism spectrum disorder (ASD) is essential for timely intervention. Structural magnetic resonance imaging (sMRI) and functional magnetic resonance imaging (fMRI) provide complementary insights into brain structure and function. Most deep learning approaches rely on a single modality, limiting their ability to capture cross-modal relationships. Methods: We propose DAGMNet, a dual-branch attention-pruned graph neural network for ASD prediction that integrates sMRI, fMRI, and phenotypic data. The framework employs modality-specific feature extraction to preserve unique structural and functional characteristics, an attention-based cross-modal fusion module to model inter-modality complementarity, and a phenotype-pruned dynamic graph learning module with adaptive graph construction for personalized diagnosis. Results: Evaluated on the ABIDE-I dataset, DAGMNet achieves an accuracy of 91.59% and an AUC of 96.80%, outperforming several state-of-the-art baselines. To assess generalizability, we also evaluated the method on ADNI datasets covering other neurodegenerative diseases, where it likewise achieved good results. Conclusions: By effectively fusing multimodal neuroimaging and phenotypic information, DAGMNet enhances cross-modal representation learning and improves diagnostic accuracy. To further assist clinical decision making, we conduct a biomarker detection analysis that provides region-level explanations of the model's decisions.
(This article belongs to the Special Issue Progress in Neurodevelopmental Disorders Research)
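The attention-based cross-modal fusion module could plausibly be built around standard multi-head attention, with one modality querying the other. A minimal sketch; the dimensions, residual design, and single-direction attention are assumptions, not the authors' exact module:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Sketch of attention-based fusion of sMRI and fMRI embeddings."""
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, smri: torch.Tensor, fmri: torch.Tensor) -> torch.Tensor:
        # sMRI tokens query the fMRI tokens, so each structural feature
        # attends to its functional counterparts; a symmetric branch could be added.
        fused, _ = self.attn(query=smri, key=fmri, value=fmri)
        return self.norm(smri + fused)  # residual connection

smri = torch.randn(8, 16, 128)  # (batch, tokens, dim), hypothetical shapes
fmri = torch.randn(8, 16, 128)
out = CrossModalFusion()(smri, fmri)
```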

19 pages, 17084 KB  
Article
SPADE: Superpixel Adjacency Driven Embedding for Three-Class Melanoma Segmentation
by Pablo Ordóñez, Ying Xie, Xinyue Zhang, Chloe Yixin Xie, Santiago Acosta and Issac Guitierrez
Algorithms 2025, 18(9), 551; https://doi.org/10.3390/a18090551 - 2 Sep 2025
Viewed by 1201
Abstract
The accurate segmentation of pigmented skin lesions is a critical prerequisite for reliable melanoma detection, yet approximately 30% of lesions exhibit fuzzy or poorly defined borders. This ambiguity makes the definition of a single contour unreliable and limits the effectiveness of computer-assisted diagnosis (CAD) systems. While clinical assessment based on the ABCDE criteria (asymmetry, border, color, diameter, and evolution), dermoscopic imaging, and scoring systems remains the standard, these methods are inherently subjective and vary with clinician experience. We address this challenge by reframing segmentation into three distinct regions: background, border, and lesion core. These regions are delineated using superpixels generated via the Simple Linear Iterative Clustering (SLIC) algorithm, which provides meaningful structural units for analysis. Our contributions are fourfold: (1) redefining lesion borders as regions rather than sharp lines; (2) generating superpixel-level embeddings with a transformer-based autoencoder; (3) incorporating these embeddings as features for superpixel classification; and (4) integrating neighborhood information to construct enhanced feature vectors. Unlike pixel-level algorithms that often overlook boundary context, our pipeline fuses global class information with local spatial relationships, significantly improving precision and recall in challenging border regions. An evaluation on the HAM10000 melanoma dataset demonstrates that our superpixel–RAG (region adjacency graph)–transformer pipeline achieves exceptional performance (100% F1 score, accuracy, and precision) in classifying background, border, and lesion core superpixels. By transforming raw dermoscopic images into region-based structured representations, the proposed method generates more informative inputs for downstream deep learning models. This strategy not only advances melanoma analysis but also provides a generalizable framework for other medical image segmentation and classification tasks.
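The SLIC + RAG stage of such a pipeline is available off the shelf in scikit-image. A minimal sketch (a stock image stands in for a HAM10000 dermoscopic image; scikit-image >= 0.20 is assumed, where the graph module lives at skimage.graph):

```python
from skimage import data, graph, segmentation

# Sketch of the superpixel + region-adjacency-graph stage.
img = data.astronaut()  # stand-in for a dermoscopic image
labels = segmentation.slic(img, n_segments=400, compactness=10, start_label=1)
rag = graph.rag_mean_color(img, labels)

# Each node is a superpixel; edges link spatial neighbors, giving the
# neighborhood structure used to build enhanced feature vectors.
print(rag.number_of_nodes(), rag.number_of_edges())
```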

25 pages, 5194 KB  
Article
A Graph-Based Superpixel Segmentation Approach Applied to Pansharpening
by Hind Hallabia
Sensors 2025, 25(16), 4992; https://doi.org/10.3390/s25164992 - 12 Aug 2025
Cited by 1 | Viewed by 1308
Abstract
In this paper, an image-driven regional pansharpening technique based on simplex optimization analysis with a graph-based superpixel segmentation strategy is proposed. This fusion approach optimally combines spatial information derived from a high-resolution panchromatic (PAN) image and spectral information captured from a low-resolution multispectral (MS) image to generate a single comprehensive high-resolution MS image. Since the performance of such a fusion method relies on the choice of fusion strategy, and in particular on how the injection gain coefficients are estimated, our proposal computes the injection gains over a graph-driven segmentation map. The graph-based segments are obtained by applying simple linear iterative clustering (SLIC) to the MS image, followed by a region adjacency graph (RAG) merging stage. This graphical representation of the segmentation map guides the spatial information to be injected during fusion processing. The high-resolution MS image is obtained by injecting the details locally according to the local simplex injection fusion rule. The quality improvements achievable by our proposal are evaluated and validated at reduced and full scales using two high-resolution datasets collected by the GeoEye-1 and WorldView-3 sensors.
(This article belongs to the Section Sensing and Imaging)
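The SLIC-then-RAG-merging step described above can be approximated with scikit-image's graph utilities. A minimal sketch, with random data standing in for an MS image and an illustrative merge threshold (scikit-image >= 0.20 assumed):

```python
import numpy as np
from skimage import graph, segmentation

# Hypothetical 4-band MS image; SLIC followed by a RAG merging stage.
ms = np.random.rand(256, 256, 4)
labels = segmentation.slic(ms, n_segments=500, compactness=0.1,
                           channel_axis=-1, start_label=1)
rag = graph.rag_mean_color(ms, labels)
merged = graph.cut_threshold(labels, rag, thresh=0.05)  # merge similar neighbors
print(len(np.unique(merged)), "regions after merging")
```

The merged segmentation map would then serve as the support over which the injection gains are estimated region by region.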

23 pages, 1945 KB  
Article
Spectro-Image Analysis with Vision Graph Neural Networks and Contrastive Learning for Parkinson’s Disease Detection
by Nuwan Madusanka, Hadi Sedigh Malekroodi, H. M. K. K. M. B. Herath, Chaminda Hewage, Myunggi Yi and Byeong-Il Lee
J. Imaging 2025, 11(7), 220; https://doi.org/10.3390/jimaging11070220 - 2 Jul 2025
Viewed by 1402
Abstract
This study presents a novel framework that integrates Vision Graph Neural Networks (ViGs) with supervised contrastive learning for enhanced spectro-temporal image analysis of speech signals in Parkinson's disease (PD) detection. The approach introduces a frequency band decomposition strategy that transforms raw audio into three complementary spectral representations, capturing distinct PD-specific characteristics across low-frequency (0–2 kHz), mid-frequency (2–6 kHz), and high-frequency (6 kHz+) bands. The framework processes mel multi-band spectro-temporal representations through a ViG architecture that models complex graph-based relationships between spectral and temporal components, trained using a supervised contrastive objective that learns discriminative representations distinguishing PD-affected from healthy speech patterns. Comprehensive experimental validation on multi-institutional datasets from Italy, Colombia, and Spain demonstrates that the proposed ViG-contrastive framework achieves superior classification performance, with the ViG-M-GELU architecture reaching 91.78% test accuracy. The integration of graph neural networks with contrastive learning enables effective learning from limited labeled data while capturing complex spectro-temporal relationships that traditional convolutional neural network (CNN) approaches miss, representing a promising direction for more accurate and clinically viable speech-based diagnostic tools for PD.
(This article belongs to the Section Medical Imaging)
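The three-band mel decomposition can be sketched with librosa by restricting fmin/fmax per band. A minimal sketch, assuming a 16 kHz speech recording (file name, n_mels, and sample rate are illustrative):

```python
import numpy as np
import librosa

# Low (0–2 kHz), mid (2–6 kHz), and high (6 kHz+) mel spectrograms
# from one speech recording (hypothetical path).
y, sr = librosa.load("speech_sample.wav", sr=16000)
bands = [(0, 2000), (2000, 6000), (6000, sr // 2)]
specs = [
    librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64, fmin=lo, fmax=hi)
    )
    for lo, hi in bands
]
x = np.stack(specs)  # (3, 64, frames): one channel per frequency band
```

Each band image would then be treated as a node grid by the ViG backbone, which is not reproduced here.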

22 pages, 2225 KB  
Article
Connectogram-COH: A Coherence-Based Time-Graph Representation for EEG-Based Alzheimer’s Disease Detection
by Ehssan Aljanabi and İlker Türker
Diagnostics 2025, 15(11), 1441; https://doi.org/10.3390/diagnostics15111441 - 5 Jun 2025
Cited by 1 | Viewed by 1721
Abstract
Background: Alzheimer's disease (AD) is a neurological disorder that affects the brain in the elderly, resulting in memory loss, mental deterioration, and loss of the ability to think and act; it is also a cause of death, with rates increasing dramatically. A popular method for detecting AD is electroencephalography (EEG) signal analysis, thanks to its ability to reflect neural activity and thereby help identify abnormalities associated with the disorder. Owing to their multivariate nature, EEG signals are generally handled as multidimensional time series, with the corresponding methodology. Methods: This study proposes a new transformation strategy that generates a graph representation with time resolution: EEG recordings are split into relatively small time windows, and each segment is converted into a similarity graph based on signal coherence between the available channels. The resulting adjacency matrices are flattened to form a one-pixel-wide image column representing the coherence activity of the available electrodes within the given time window. These pixel columns are concatenated horizontally for all sliding time windows (with 50% overlap), resulting in a grayscale image representation that can be input to well-known deep learning architectures specialized for images. We name this representation Connectogram-COH, a coherence-based version of the previously proposed time-graph representation, Connectogram. Results: The experimental results demonstrate that the proposed Connectogram-COH representation effectively captures the coherence dynamics of multichannel EEG data and achieves high accuracy in detecting Alzheimer's disease. The time-graph images serve as robust input for deep learning classifiers, outperforming traditional EEG representations in terms of classification performance. Conclusions: Connectogram-COH offers a powerful and interpretable approach for transforming EEG signals into image representations that are well suited for deep learning. The method not only improves the detection of AD but also shows promise for broader applications in EEG-based and general time series classification tasks.
(This article belongs to the Special Issue EEG Analysis in Diagnostics)
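The windowed coherence-to-image construction can be sketched with SciPy. A minimal sketch, assuming synthetic EEG, mean coherence across frequencies as the edge weight, and illustrative window sizes (none of these specifics are stated in the abstract):

```python
import numpy as np
from scipy.signal import coherence

def connectogram_column(window: np.ndarray, fs: float) -> np.ndarray:
    """Coherence adjacency for one EEG window (channels x samples),
    flattened to a single pixel column as described above."""
    n_ch = window.shape[0]
    adj = np.zeros((n_ch, n_ch))
    for i in range(n_ch):
        for j in range(i + 1, n_ch):
            f, cxy = coherence(window[i], window[j], fs=fs, nperseg=128)
            adj[i, j] = adj[j, i] = cxy.mean()  # mean coherence over frequencies
    return adj.flatten()

# Sliding windows with 50% overlap, concatenated horizontally into a grayscale image.
eeg = np.random.randn(19, 4096)  # hypothetical 19-channel recording
fs, win = 256.0, 512
cols = [connectogram_column(eeg[:, s:s + win], fs)
        for s in range(0, eeg.shape[1] - win + 1, win // 2)]
image = np.stack(cols, axis=1)   # (n_ch*n_ch, n_windows) time-graph image
```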

25 pages, 9742 KB  
Article
Autism Spectrum Disorder Detection Using Skeleton-Based Body Movement Analysis via Dual-Stream Deep Learning
by Jungpil Shin, Abu Saleh Musa Miah, Manato Kakizaki, Najmul Hassan and Yoichi Tomioka
Electronics 2025, 14(11), 2231; https://doi.org/10.3390/electronics14112231 - 30 May 2025
Cited by 6 | Viewed by 2895
Abstract
Autism Spectrum Disorder (ASD) poses significant challenges in diagnosis due to its diverse symptomatology and the complexity of early detection. Atypical gait and gesture patterns, prominent behavioural markers of ASD, hold immense potential for facilitating early intervention and optimising treatment outcomes. These patterns can be efficiently and non-intrusively captured using modern computational techniques, making them valuable for ASD recognition. Various types of research have been conducted to detect ASD through deep learning, including facial feature analysis, eye gaze analysis, and movement and gesture analysis. In this study, we optimise a dual-stream architecture that combines image classification and skeleton recognition models to analyse video data for body motion analysis. The first stream processes Skepxels (spatial representations derived from skeleton data) using ConvNeXt-Base, a robust image recognition model that efficiently captures aggregated spatial embeddings. The second stream encodes angular features, embedding relative joint angles into the skeleton sequence and extracting spatiotemporal dynamics using the Multi-Scale Graph 3D Convolutional Network (MSG3D), a combination of Graph Convolutional Networks (GCNs) and Temporal Convolutional Networks (TCNs). We replace the ViT model from the original architecture with ConvNeXt-Base to evaluate the efficacy of CNN-based models in capturing gesture-related features for ASD detection. Additionally, we experimented with a Stack Transformer in the second stream instead of MSG3D but found that it resulted in lower accuracy, highlighting the importance of GCN-based models for motion analysis. The integration of these two streams ensures comprehensive feature extraction, capturing both global and detailed motion patterns. A pairwise Euclidean distance loss is employed during training to enhance the consistency and robustness of feature representations. The results from our experiments demonstrate that the two-stream approach combining ConvNeXt-Base and MSG3D offers a promising method for effective autism detection. This approach not only enhances accuracy but also contributes valuable insights into optimising deep learning models for gesture-based recognition. By integrating image classification and skeleton recognition, we can better capture both global and detailed motion patterns, which is crucial for improving early ASD diagnosis and intervention strategies.
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, 4th Edition)
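One natural reading of the pairwise Euclidean distance loss is a consistency term that pulls the two streams' embeddings of the same clip together. A minimal sketch under that assumption (the embedding size and reduction are illustrative, not the authors' exact formulation):

```python
import torch
import torch.nn.functional as F

def stream_consistency_loss(feat_img: torch.Tensor, feat_skel: torch.Tensor) -> torch.Tensor:
    """Pairwise Euclidean distance between the ConvNeXt-stream and
    MSG3D-stream embeddings of the same samples, averaged over the batch."""
    return F.pairwise_distance(feat_img, feat_skel, p=2).mean()

# Hypothetical embeddings from the two streams.
feat_img = torch.randn(32, 256)
feat_skel = torch.randn(32, 256)
loss = stream_consistency_loss(feat_img, feat_skel)
```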

24 pages, 4213 KB  
Article
Multi-Scale Feature Fusion and Global Context Modeling for Fine-Grained Remote Sensing Image Segmentation
by Yifan Li and Gengshen Wu
Appl. Sci. 2025, 15(10), 5542; https://doi.org/10.3390/app15105542 - 15 May 2025
Cited by 2 | Viewed by 2804
Abstract
High-precision remote sensing image semantic segmentation plays a crucial role in Earth science analysis and urban management, especially in urban remote sensing scenarios with rich details and complex structures. In such cases, the collaborative modeling of global and local contexts is a key challenge for improving segmentation accuracy. Existing methods that rely on single feature extraction architectures, such as convolutional neural networks (CNNs) and vision transformers, are prone to semantic fragmentation due to their limited feature representation capabilities. To address this issue, we propose a hybrid architecture model called PLGTransformer, which is based on dual-encoder collaborative enhancement and integrates pyramid pooling and graph convolutional network (GCN) modules. Our model innovatively constructs a parallel encoding architecture combining a Swin transformer and a CNN: the CNN branch captures fine-grained features such as road and building edges through multi-scale heterogeneous convolutions, while the Swin transformer branch models global dependencies of large-scale land cover using hierarchical window attention. To further strengthen multi-granularity feature fusion, we design a dual-path pyramid pooling module that performs adaptive multi-scale context aggregation for both feature types and dynamically balances local and global contributions using learnable weights. In addition, we introduce GCNs to build a topological graph in the feature space, enabling geometric relationship reasoning for multi-scale feature nodes at high resolution. Experiments on the Potsdam and Vaihingen datasets show that our model outperforms contemporary advanced methods and significantly improves segmentation accuracy for small objects such as vehicles and individual buildings, validating the effectiveness of the multi-feature collaborative enhancement mechanism.
(This article belongs to the Special Issue Signal and Image Processing: From Theory to Applications: 2nd Edition)
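The learnable balancing of local and global contributions could be as simple as a sigmoid-gated blend of the two branches' feature maps. A minimal sketch under that assumption (a scalar gate and matching feature shapes are simplifications of whatever the paper actually uses):

```python
import torch
import torch.nn as nn

class LearnableFusion(nn.Module):
    """Sketch of a learnable local/global balance: a sigmoid-gated scalar
    blends CNN and Swin-transformer feature maps of matching shape."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(1))  # sigmoid(0) = 0.5: equal weighting at init

    def forward(self, f_cnn: torch.Tensor, f_swin: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.alpha)
        return w * f_cnn + (1 - w) * f_swin

f_cnn = torch.randn(2, 256, 64, 64)   # hypothetical branch outputs
f_swin = torch.randn(2, 256, 64, 64)
fused = LearnableFusion()(f_cnn, f_swin)
```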

19 pages, 6903 KB  
Article
GT-SRR: A Structured Method for Social Relation Recognition with GGNN-Based Transformer
by Dejiao Huang, Menglei Xia, Ruyi Chang, Xiaohan Kong and Shuai Guo
Sensors 2025, 25(10), 2992; https://doi.org/10.3390/s25102992 - 9 May 2025
Cited by 2 | Viewed by 1051
Abstract
Social relationship recognition (SRR) holds significant value in fields such as behavior analysis and intelligent social systems. However, existing methods primarily focus on modeling individual visual traits, interaction patterns, and scene-level contextual cues, often failing to capture the complex dependencies among these features and the hierarchical structure of social groups, which are crucial for effective reasoning. To overcome these restrictions, this paper proposes an image-based SRR model that integrates a Gated Graph Neural Network (GGNN) and a Transformer. Specifically, a novel and robust hybrid feature extraction module captures individual characteristics, relative positional information, and group-level cues, which are used to construct relation nodes and group nodes. A modified GGNN is then employed to model the logical dependencies between features. Nevertheless, GGNN alone lacks the capacity to dynamically adjust feature importance, which may result in ambiguous relationship representations. The Transformer's multi-head self-attention (MSA) mechanism is therefore integrated to improve feature interaction modeling, allowing the model to capture global context and higher-order dependencies effectively by fusing pairwise features, graph-structured features, and group-level information. Experimental results on public datasets such as PISC demonstrate that the proposed approach outperforms comparison models including Dual-Glance, GRM, GRRN, Graph-BERT, and SRT in terms of accuracy and mean average precision (mAP), validating its effectiveness in multi-feature representation learning and global reasoning.
(This article belongs to the Section Sensor Networks)
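The GGNN-then-MSA pipeline can be sketched with PyTorch Geometric's GatedGraphConv feeding a standard multi-head attention layer. A minimal sketch; the node count, edge layout, and dimensions are hypothetical:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GatedGraphConv

# Gated message passing over relation/group nodes, then self-attention
# to re-weight feature importance globally.
ggnn = GatedGraphConv(out_channels=64, num_layers=3)
msa = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

x = torch.randn(6, 64)                       # 6 relation/group nodes (hypothetical)
edge_index = torch.tensor([[0, 1, 2, 3, 4],  # hypothetical dependency edges
                           [1, 2, 3, 4, 5]])
h = ggnn(x, edge_index)                      # GRU-gated message passing
h_attn, _ = msa(h.unsqueeze(0), h.unsqueeze(0), h.unsqueeze(0))
```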

36 pages, 2542 KB  
Article
Multi-Modal Graph Neural Networks for Colposcopy Data Classification and Visualization
by Priyadarshini Chatterjee, Shadab Siddiqui, Razia Sulthana Abdul Kareem and Srikanth R. Rao
Cancers 2025, 17(9), 1521; https://doi.org/10.3390/cancers17091521 - 30 Apr 2025
Cited by 3 | Viewed by 2524
Abstract
Background: Cervical lesion classification is essential for early detection of cervical cancer. While deep learning methods have shown promise, most rely on single-modal data or require extensive manual annotations. This study proposes a novel Graph Neural Network (GNN)-based framework that integrates colposcopy images, segmentation masks, and graph representations for improved lesion classification. Methods: We developed a fully connected graph-based architecture using GCNConv layers with global mean pooling and optimized it via grid search. A five-fold cross-validation protocol was employed to evaluate performance before (1–100 epochs) and after fine-tuning (101–151 epochs). Performance metrics included macro-average F1-score and validation accuracy, and visualizations were used for model interpretability. Results: The model achieved a macro-average F1-score of 89.4% and a validation accuracy of 92.1% before fine-tuning, which improved to 94.56% and 98.98%, respectively, after fine-tuning. LIME-based visual explanations confirmed that the model focuses on discriminative lesion regions. Conclusions: This study highlights the potential of graph-based multi-modal learning for cervical lesion analysis. Developed in collaboration with the MNJ Institute of Oncology, the framework shows promise for clinical use.
(This article belongs to the Section Methods and Technologies Development)
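The GCNConv-plus-global-mean-pooling backbone named in the Methods maps directly onto PyTorch Geometric. A minimal sketch (layer sizes and the class count are assumptions; the abstract only names the layer types):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class LesionGCN(nn.Module):
    """Sketch of a GCNConv stack with global mean pooling, as described above."""
    def __init__(self, in_dim: int = 32, hidden: int = 64, classes: int = 3):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, classes)

    def forward(self, x, edge_index, batch):
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        return self.head(global_mean_pool(h, batch))  # one prediction per graph
```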

26 pages, 15436 KB  
Article
AGCD: An Attention-Guided Graph Convolution Network for Change Detection of Remote Sensing Images
by Heng Li, Xin Lyu, Xin Li, Yiwei Fang, Zhennan Xu, Xinyuan Wang, Chengming Zhang, Chun Xu, Shaochuan Chen and Chengxin Lu
Remote Sens. 2025, 17(8), 1367; https://doi.org/10.3390/rs17081367 - 11 Apr 2025
Viewed by 2016
Abstract
Change detection is a crucial field in remote sensing image analysis for tracking environmental dynamics. Although convolutional neural networks (CNNs) have made impressive strides in this field, their grid-based processing structures struggle to capture abundant semantics and the complex spatial-temporal correlations of bitemporal features, leading to high uncertainty in distinguishing true changes from pseudo changes. To overcome these limitations, we propose the Attention-guided Graph convolution network for Change Detection (AGCD), a novel framework that integrates a graph convolutional network (GCN) and an attention mechanism to enhance change-detection performance. AGCD introduces three novel modules: a Graph-level Feature Difference Module (GFDM) for enhanced feature interaction, a Multi-scale Feature Fusion Module (MFFM) for detailed semantic representation, and a Spatial-Temporal Attention Module (STAM) for refined spatial-temporal dependency modeling. These modules enable AGCD to reduce pseudo changes triggered by seasonal variations and varying imaging conditions, thereby improving the accuracy and reliability of change-detection results. Extensive experiments on three benchmark datasets demonstrate AGCD's superior performance: it achieves the best F1-score of 90.34% and IoU of 82.38% on the LEVIR-CD dataset, outperforming existing state-of-the-art methods by a notable margin.
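A bitemporal feature-difference step in the spirit of GFDM might take the absolute difference of the two epochs' feature maps and re-project it. A minimal sketch under that assumption (the 1x1 projection and channel count are illustrative, not the authors' exact module):

```python
import torch
import torch.nn as nn

class FeatureDifference(nn.Module):
    """Sketch of a bitemporal feature-difference block:
    absolute difference plus a learned 1x1 projection."""
    def __init__(self, channels: int = 128):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feat_t1: torch.Tensor, feat_t2: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.abs(feat_t1 - feat_t2))

t1 = torch.randn(2, 128, 64, 64)  # hypothetical bitemporal feature maps
t2 = torch.randn(2, 128, 64, 64)
diff = FeatureDifference()(t1, t2)
```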

13 pages, 5340 KB  
Article
Riemannian Manifolds for Biological Imaging Applications Based on Unsupervised Learning
by Ilya Larin and Alexander Karabelsky
J. Imaging 2025, 11(4), 103; https://doi.org/10.3390/jimaging11040103 - 29 Mar 2025
Cited by 2 | Viewed by 1443
Abstract
The development of neural networks has made the introduction of multimodal systems inevitable. Despite their importance, computer vision methods are still not widely used in biological research, and it is time to recognize the significance of advances in feature extraction and real-time analysis of information from cells. Unsupervised learning for image clustering, and in particular the clustering of single cells, is of great interest. This study evaluates the feasibility of using latent representations and clustering of single cells in various applications in medicine and biotechnology. Of particular interest are embeddings that relate to the morphological characterization of cells; studies of C2C12 cells using neural networks can reveal more about muscle differentiation. This work focuses on analyzing the applicability of the latent space for extracting morphological features. Like many researchers in this field, we note that obtaining high-quality latent representations for phase-contrast or bright-field images opens new frontiers for creating large vision-language models. Graph structures are among the main approaches to non-Euclidean manifolds. Graph-based segmentation has a long history (e.g., the normalized cuts algorithm treated segmentation as a graph partitioning problem), but only recently have such ideas merged with deep learning in an unsupervised manner. Recently, a number of works have shown the advantages of hyperbolic embeddings in vision tasks, including clustering and classification based on the Poincaré ball model. One area worth highlighting is unsupervised segmentation, which we believe is undervalued, particularly in the context of non-Euclidean spaces. With this approach, we aim to mark the beginning of our future work on integrating the visual information and biological aspects of individual cells into a multimodal space in comparative in vitro studies.
(This article belongs to the Section AI in Imaging)
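The Poincaré ball model mentioned above comes with a closed-form geodesic distance, d(u, v) = arccosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))) for ||u||, ||v|| < 1. A minimal sketch of that formula (the example points are arbitrary):

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Geodesic distance in the Poincaré ball model (requires ||u||, ||v|| < 1)."""
    delta = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return float(np.arccosh(1 + 2 * delta / denom))

u = np.array([0.1, 0.2])
v = np.array([0.4, -0.3])
print(poincare_distance(u, v))
```

Distances grow rapidly near the boundary of the ball, which is what lets hyperbolic embeddings represent hierarchical (tree-like) structure with low distortion.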

18 pages, 800 KB  
Article
Open-World Semi-Supervised Learning for fMRI Analysis to Diagnose Psychiatric Disease
by Chang Hu, Yihong Dong, Shoubo Peng and Yuehan Wu
Information 2025, 16(3), 171; https://doi.org/10.3390/info16030171 - 25 Feb 2025
Cited by 1 | Viewed by 1737
Abstract
Due to the incomplete nature of cognitive testing data and human subjective biases, accurately diagnosing mental disease from functional magnetic resonance imaging (fMRI) data is a challenging task. In the clinical diagnosis of mental disorders, labeled data are often limited owing to factors such as large data volumes and cumbersome labeling processes, leading to unlabeled data containing new classes, which can result in misdiagnosis. In the context of graph-based mental disorder classification, open-world semi-supervised learning for node classification aims to classify unlabeled nodes into known classes or potentially new classes, a practical yet underexplored issue within the graph community. To improve open-world semi-supervised representation learning and classification in fMRI under low-label settings, we propose a novel open-world semi-supervised learning approach tailored for fMRI analysis, termed Open-World Semi-Supervised Learning for fMRI Analysis (OpenfMA). Specifically, we employ spectral augmentation self-supervised learning and dynamic concept contrastive learning to achieve open-world graph learning guided by pseudo-labels, and we construct hard positive sample pairs to enhance the network's focus on potential positive pairs. Experiments conducted on public datasets validate the superior performance of this method in the domain of open-world psychiatric disease diagnosis.
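The contrastive core of such an approach can be sketched generically: two augmented views of the graph are embedded, and matching nodes are treated as positives. A minimal sketch in which random edge dropping stands in for the paper's spectral augmentation (which is not reproduced here), with an NT-Xent-style loss:

```python
import torch
import torch.nn.functional as F

def drop_edges(edge_index: torch.Tensor, p: float = 0.2) -> torch.Tensor:
    """Generic graph augmentation (random edge dropping), a stand-in
    for the paper's spectral augmentation."""
    keep = torch.rand(edge_index.shape[1]) > p
    return edge_index[:, keep]

def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """NT-Xent-style loss: the same node across the two views is the positive."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau              # pairwise cosine similarities
    labels = torch.arange(z1.shape[0])      # diagonal entries are positives
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(16, 32), torch.randn(16, 32)  # hypothetical view embeddings
loss = contrastive_loss(z1, z2)
```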