MDPI - Publisher of Open Access Journals

28 pages, 2379 KiB

Open Access Article

FADEL: Ensemble Learning Enhanced by Feature Augmentation and Discretization

by Chuan-Sheng Hung, Chun-Hung Richard Lin, Shi-Huang Chen, You-Cheng Zheng, Cheng-Han Yu, Cheng-Wei Hung, Ting-Hsin Huang and Jui-Hsiu Tsai

Bioengineering 2025, 12(8), 827; https://doi.org/10.3390/bioengineering12080827 - 30 Jul 2025

Abstract

In recent years, data augmentation techniques have become the predominant approach for addressing highly imbalanced classification problems in machine learning. Algorithms such as the Synthetic Minority Over-sampling Technique (SMOTE) and Conditional Tabular Generative Adversarial Network (CTGAN) have proven effective in synthesizing minority class samples. However, these methods often introduce distributional bias and noise, potentially leading to model overfitting, reduced predictive performance, increased computational costs, and elevated cybersecurity risks. To overcome these limitations, we propose a novel architecture, FADEL, which integrates feature-type awareness with a supervised discretization strategy. FADEL introduces a unique feature augmentation ensemble framework that preserves the original data distribution by concurrently processing continuous and discretized features. It dynamically routes these feature sets to their most compatible base models, thereby improving minority class recognition without the need for data-level balancing or augmentation techniques. Experimental results demonstrate that FADEL, solely leveraging feature augmentation without any data augmentation, achieves a recall of 90.8% and a G-mean of 94.5% on the internal test set from Kaohsiung Chang Gung Memorial Hospital in Taiwan. On the external validation set from Kaohsiung Medical University Chung-Ho Memorial Hospital, it maintains a recall of 91.9% and a G-mean of 86.7%. These results outperform conventional ensemble methods trained on CTGAN-balanced datasets, confirming the superior stability, computational efficiency, and cross-institutional generalizability of the FADEL architecture. Altogether, FADEL uses feature augmentation to offer a robust and practical solution to extreme class imbalance, outperforming mainstream data augmentation-based approaches.
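The recall and G-mean figures quoted above can be related to a confusion matrix with a short sketch. It assumes the common definition of G-mean as the geometric mean of sensitivity and specificity; the abstract does not state which definition the authors use, and the counts below are hypothetical.

```python
import math

def recall_and_gmean(tp, fn, tn, fp):
    """Return (recall, G-mean) from binary confusion-matrix counts.

    Assumes G-mean = sqrt(sensitivity * specificity), a common
    convention for imbalanced classification.
    """
    recall = tp / (tp + fn)        # sensitivity on the minority (positive) class
    specificity = tn / (tn + fp)   # true-negative rate on the majority class
    return recall, math.sqrt(recall * specificity)

# Hypothetical counts for illustration only (not taken from the paper).
recall, gmean = recall_and_gmean(tp=9, fn=1, tn=80, fp=20)
print(f"recall={recall:.3f}, G-mean={gmean:.3f}")
```

Because G-mean multiplies the two class-wise rates, a model that ignores the minority class scores zero regardless of its overall accuracy.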

(This article belongs to the Special Issue Artificial Intelligence for Better Healthcare and Precision Medicine, 2nd Edition)

20 pages, 9955 KiB

Open Access Article

Dual-Branch Occlusion-Aware Semantic Part-Features Extraction Network for Occluded Person Re-Identification

by Bo Sun, Yulong Zhang, Jianan Wang and Chunmao Jiang

Mathematics 2025, 13(15), 2432; https://doi.org/10.3390/math13152432 - 28 Jul 2025

Abstract

Occlusion remains a major challenge in person re-identification, as it often leads to incomplete or misleading visual cues. To address this issue, we propose a dual-branch occlusion-aware network (DOAN), which explicitly and implicitly enhances the model’s capability to perceive and handle occlusions. The proposed DOAN framework comprises two synergistic branches. In the first branch, we introduce an Occlusion-Aware Semantic Attention (OASA) module to extract semantic part features, incorporating a parallel channel and spatial attention (PCSA) block to precisely distinguish between pedestrian body regions and occlusion noise. We also generate occlusion-aware parsing labels by combining external human parsing annotations with occluder masks, providing structural supervision to guide the model in focusing on visible regions. In the second branch, we develop an occlusion-aware recovery (OAR) module that reconstructs occluded pedestrians to their original, unoccluded form, enabling the model to recover missing semantic information and enhance occlusion robustness. Extensive experiments on occluded, partial, and holistic benchmark datasets demonstrate that DOAN consistently outperforms existing state-of-the-art methods.

22 pages, 9071 KiB

Open Access Article

Integrating UAV-Based RGB Imagery with Semi-Supervised Learning for Tree Species Identification in Heterogeneous Forests

by Bingru Hou, Chenfeng Lin, Mengyuan Chen, Mostafa M. Gouda, Yunpeng Zhao, Yuefeng Chen, Fei Liu and Xuping Feng

Remote Sens. 2025, 17(15), 2541; https://doi.org/10.3390/rs17152541 - 22 Jul 2025

Abstract

The integration of unmanned aerial vehicle (UAV) remote sensing and deep learning has emerged as a highly effective strategy for inventorying forest resources. However, the spatiotemporal variability of forest environments and the scarcity of annotated data hinder the performance of conventional supervised deep-learning models. To overcome these challenges, this study developed Efficient Tree (ET), a semi-supervised tree detector designed for forest scenes. ET employed an enhanced YOLO model (YOLO-Tree) as a base detector and incorporated a teacher–student semi-supervised learning (SSL) framework based on pseudo-labeling, effectively leveraging abundant unlabeled data to bolster model robustness. The results revealed that SSL significantly improved outcomes in scenarios with sparse labeled data, specifically when the annotation proportion was below 50%. Additionally, employing overlapping cropping as a data augmentation strategy mitigated instability during semi-supervised training under conditions of limited sample size. Notably, introducing unlabeled data from external sites enhanced the accuracy and cross-site generalization of models trained on diverse datasets, achieving F1, mAP50, and mAP50-95 scores of 0.979, 0.992, and 0.871, respectively. In conclusion, this study highlights the potential of combining UAV-based RGB imagery with SSL to advance tree species identification in heterogeneous forests.
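The pseudo-labeling step of a teacher–student setup can be sketched as a confidence filter over the teacher's detections. This is a minimal illustration, not the ET pipeline: the prediction format, the function name `select_pseudo_labels`, and the 0.7 cutoff are all assumptions.

```python
def select_pseudo_labels(teacher_predictions, min_confidence=0.7):
    """Keep teacher detections confident enough to act as student training labels.

    Each prediction is assumed to be a dict with a "score" in [0, 1];
    the 0.7 default is a hypothetical cutoff, not a value from the paper.
    """
    return [p for p in teacher_predictions if p["score"] >= min_confidence]

# Hypothetical teacher output on one unlabeled image tile.
predictions = [
    {"box": (12, 30, 88, 120), "label": "tree", "score": 0.95},
    {"box": (140, 10, 190, 70), "label": "tree", "score": 0.40},
]
pseudo_labels = select_pseudo_labels(predictions)
print(len(pseudo_labels), "of", len(predictions), "detections kept")
```

A higher cutoff yields fewer but cleaner pseudo-labels, which matters most when labeled data are sparse.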

(This article belongs to the Special Issue Remote Sensing-Assisted Forest Inventory Planning)

15 pages, 3893 KiB

Open Access Article

Exploration of 3D Few-Shot Learning Techniques for Classification of Knee Joint Injuries on MR Images

by Vinh Hiep Dang, Minh Tri Nguyen, Ngoc Hoang Le, Thuan Phat Nguyen, Quoc-Viet Tran, Tan Ha Mai, Vu Pham Thao Vy, Truong Nguyen Khanh Hung, Ching-Yu Lee, Ching-Li Tseng, Nguyen Quoc Khanh Le and Phung-Anh Nguyen

Diagnostics 2025, 15(14), 1808; https://doi.org/10.3390/diagnostics15141808 - 18 Jul 2025

Abstract

Background/Objectives: Accurate diagnosis of knee joint injuries from magnetic resonance (MR) images is critical for patient care. While deep learning has advanced 3D MR image analysis, its reliance on extensive labeled datasets is a major hurdle for diverse knee pathologies. Few-shot learning (FSL) addresses this by enabling models to classify new conditions from minimal annotated examples, often leveraging knowledge from related tasks. However, creating robust 3D FSL frameworks for varied knee injuries remains challenging. Methods: We introduce MedNet-FS, a 3D FSL framework that effectively classifies knee injuries by utilizing domain-specific pre-trained weights and generalized end-to-end (GE2E) loss for discriminative embeddings. Results: MedNet-FS, with knee-MRI-specific pre-training, significantly outperformed models using generic or other medical pre-trained weights and approached supervised learning performance on internal datasets with limited samples (e.g., achieving an area under the curve (AUC) of 0.76 for ACL tear classification with k = 40 support samples on the MRNet dataset). External validation on the KneeMRI dataset revealed challenges in classifying partially torn ACL (AUC up to 0.58) but demonstrated promising performance for distinguishing intact versus fully ruptured ACLs (AUC 0.62 with k = 40). Conclusions: These findings demonstrate that tailored FSL strategies can substantially reduce data dependency in developing specialized medical imaging tools. This approach fosters rapid AI tool development for knee injuries and offers a scalable solution for data scarcity in other medical imaging domains, potentially democratizing AI-assisted diagnostics, particularly for rare conditions or in resource-limited settings.

(This article belongs to the Special Issue New Technologies and Tools Used for Risk Assessment of Diseases)

30 pages, 2018 KiB

Open Access Article

Comprehensive Performance Comparison of Signal Processing Features in Machine Learning Classification of Alcohol Intoxication on Small Gait Datasets

by Muxi Qi, Samuel Chibuoyim Uche and Emmanuel Agu

Appl. Sci. 2025, 15(13), 7250; https://doi.org/10.3390/app15137250 - 27 Jun 2025

Abstract

Detecting alcohol intoxication is crucial for preventing accidents and enhancing public safety. Traditional intoxication detection methods rely on direct blood alcohol concentration (BAC) measurement via breathalyzers and wearable sensors. These methods require the user to purchase and carry external hardware such as breathalyzers, which is expensive and cumbersome. Convenient, unobtrusive intoxication detection methods using equipment already owned by users are desirable. Recent research has explored machine learning-based approaches using smartphone accelerometers to classify intoxicated gait patterns. While neural network approaches have emerged, due to the significant challenges with collecting intoxicated gait data, gait datasets are often too small to utilize such approaches. To avoid overfitting on such small datasets, traditional machine learning (ML) classification is preferred. A comprehensive set of ML features has been proposed. However, until now, no work has systematically evaluated the performance of various categories of gait features for the alcohol intoxication detection task using traditional machine learning algorithms. This study evaluates 27 signal processing features handcrafted from accelerometer gait data across five domains: time, frequency, wavelet, statistical, and information-theoretic. The data were collected from 24 subjects who experienced alcohol stimulation using goggle busters. Correlation-based feature selection (CFS) was employed to rank the features most correlated with alcohol-induced gait changes, revealing that 22 features exhibited statistically significant correlations with BAC levels. These statistically significant features were utilized to train supervised classifiers and assess their impact on alcohol intoxication detection accuracy. Statistical features yielded the highest accuracy (83.89%), followed by time-domain (83.22%) and frequency-domain features (82.21%). Combining all 22 significant features across domains in a random forest model improved classification accuracy to 84.9%. These findings suggest that incorporating a broader set of signal processing features enhances the accuracy of smartphone-based alcohol intoxication detection.
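As a rough illustration of handcrafted gait features, the sketch below computes a few time-domain and statistical descriptors from one window of accelerometer samples. The particular features chosen (mean, standard deviation, RMS, zero crossings) are assumptions for illustration; the abstract does not list the 27 features the study evaluates.

```python
import math
from statistics import mean, pstdev

def gait_features(window):
    """Compute simple descriptors for one window of accelerometer samples.

    The feature set here is illustrative only and is not the
    study's feature list.
    """
    mu = mean(window)
    sigma = pstdev(window)                      # population standard deviation
    rms = math.sqrt(mean(x * x for x in window))
    centered = [x - mu for x in window]
    # Count sign changes of the mean-removed signal.
    zero_crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return {"mean": mu, "std": sigma, "rms": rms, "zero_crossings": zero_crossings}

# Hypothetical vertical-axis acceleration samples.
print(gait_features([0.1, 0.4, -0.2, -0.5, 0.3, 0.6, -0.1, -0.4]))
```

Feature vectors like this one, computed per window, are what a traditional classifier such as a random forest would consume.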

(This article belongs to the Special Issue AI-Based Biomedical Signal and Image Processing)

40 pages, 7147 KiB

Open Access Article

A Hybrid Ensemble Learning Framework for Predicting Lumbar Disc Herniation Recurrence: Integrating Supervised Models, Anomaly Detection, and Threshold Optimization

by Mădălina Duceac (Covrig), Călin Gheorghe Buzea, Alina Pleșea-Condratovici, Lucian Eva, Letiția Doina Duceac, Marius Gabriel Dabija, Bogdan Costăchescu, Eva Maria Elkan, Cristian Guțu and Doina Carina Voinescu

Diagnostics 2025, 15(13), 1628; https://doi.org/10.3390/diagnostics15131628 - 26 Jun 2025

Abstract

Background: Lumbar disc herniation (LDH) recurrence remains a pressing clinical challenge, with limited predictive tools available to support early identification and personalized intervention. Predicting recurrence is clinically important but algorithmically difficult due to extreme class imbalance and a low signal-to-noise ratio. Objective: This study proposes a hybrid machine learning framework that integrates supervised classifiers, unsupervised anomaly detection, and decision threshold tuning to predict LDH recurrence using routine clinical data. Methods: A dataset of 977 patients from a Romanian neurosurgical center was used. We trained a deep neural network, a random forest, and an autoencoder (trained only on non-recurrence cases) to model baseline and anomalous patterns. Their outputs were stacked into a meta-classifier and optimized via sensitivity-focused threshold tuning. Evaluation was performed via stratified cross-validation and external holdout testing. Results: Baseline models achieved high accuracy but failed to recall recurrence cases (0% sensitivity). The proposed ensemble reached 100% recall internally with a threshold of 0.05. Key predictors included hospital stay duration, L4–L5 herniation, obesity, and hypertension. However, external holdout performance dropped to 0% recall, revealing poor generalization. Conclusions: The ensemble approach enhances detection of rare recurrence cases under internal validation but exhibits poor external performance, emphasizing the challenge of rare-event modeling in clinical datasets. Future work should prioritize external validation, longitudinal modeling, and interpretability to ensure clinical adoption.
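Sensitivity-focused threshold tuning can be sketched as a search for the largest decision threshold that still reaches a target recall. This is a generic illustration under assumed inputs, not the authors' procedure; the scores and labels are hypothetical.

```python
def recall_at(threshold, scores, labels):
    """Recall of the positive class when predicting 1 for scores >= threshold."""
    positives = sum(labels)
    true_positives = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return true_positives / positives if positives else 0.0

def highest_threshold_for_recall(scores, labels, target_recall=1.0):
    """Return the largest observed score usable as a threshold that meets the target recall."""
    for t in sorted(set(scores), reverse=True):
        if recall_at(t, scores, labels) >= target_recall:
            return t
    return None  # no threshold reaches the target (e.g. no positive cases)

# Hypothetical meta-classifier scores; 1 marks a recurrence case.
scores = [0.80, 0.30, 0.05, 0.02, 0.10]
labels = [1, 0, 1, 0, 0]
print(highest_threshold_for_recall(scores, labels))
```

In this toy example full recall is reached only at a very low threshold, which also admits two false positives. A threshold tuned this way on internal data carries no guarantee for external data, consistent with the drop to 0% recall reported above.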

(This article belongs to the Section Clinical Diagnosis and Prognosis)

21 pages, 3139 KiB

Open Access Article

Resilient Anomaly Detection in Fiber-Optic Networks: A Machine Learning Framework for Multi-Threat Identification Using State-of-Polarization Monitoring

by Gulmina Malik, Imran Chowdhury Dipto, Muhammad Umar Masood, Mashboob Cheruvakkadu Mohamed, Stefano Straullu, Sai Kishore Bhyri, Gabriele Maria Galimberti, Antonio Napoli, João Pedro, Walid Wakim and Vittorio Curri

AI 2025, 6(7), 131; https://doi.org/10.3390/ai6070131 - 20 Jun 2025

Abstract

We present a thorough machine-learning framework based on real-time state-of-polarization (SOP) monitoring for robust anomaly identification in optical fiber networks. We exploit SOP data under three different threat scenarios: (i) malicious or critical vibration events, (ii) overlapping mechanical disturbances, and (iii) malicious fiber tapping (eavesdropping). We used several supervised machine learning techniques, including k-Nearest Neighbor (k-NN), random forest, extreme gradient boosting (XGBoost), and decision trees, to classify different vibration events. We also assessed the framework’s resilience to background interference by superimposing sinusoidal noise at different frequencies and examining its effects on the polarization signatures. This analysis provides insight into how subsurface installations, subject to ambient vibrations, affect detection fidelity, and it highlights how sensitive polarization fingerprints are to external interference. Crucially, it demonstrates the system’s capacity to discern and alert on malicious vibration events even in the presence of environmental noise. We nevertheless emphasize the need for noise-mitigation techniques in real-world implementations, while the framework provides a potent, real-time mechanism for multi-threat recognition in fiber networks.
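The robustness test described above, superimposing sinusoidal noise on a monitored signal, can be sketched as follows. The function name, sampling rate, noise frequency, and amplitude are assumptions for illustration; the abstract does not give the values used.

```python
import math

def add_sinusoidal_noise(signal, sample_rate_hz, noise_freq_hz, amplitude):
    """Return a copy of `signal` with a sine wave of the given frequency added.

    All parameter values are chosen by the caller; none come from the paper.
    """
    return [
        x + amplitude * math.sin(2 * math.pi * noise_freq_hz * n / sample_rate_hz)
        for n, x in enumerate(signal)
    ]

# Hypothetical flat SOP trace sampled at 1 kHz, disturbed by 50 Hz interference.
clean = [0.0] * 20
noisy = add_sinusoidal_noise(clean, sample_rate_hz=1000, noise_freq_hz=50, amplitude=0.1)
print([round(v, 3) for v in noisy[:6]])
```

Sweeping `noise_freq_hz` and re-running a trained classifier on the perturbed traces is one way to probe how detection accuracy degrades with ambient vibration.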

(This article belongs to the Special Issue Artificial Intelligence in Optical Communication Networks)

19 pages, 682 KiB

Open Access Article

Analysis of the Behavior of Insider Traders Who Disclose Information to External Traders

by Xingxing Cao, Jing Wang and Zhi Yang

Int. J. Financial Stud. 2025, 13(2), 112; https://doi.org/10.3390/ijfs13020112 - 17 Jun 2025

Abstract

This paper establishes an insider trading model under market supervision, which includes four types of trading entities: an insider trader, n external traders, noise traders, and market makers. The insider trader voluntarily discloses information to the external traders during the trading process. The research findings are as follows: (1) strengthening market supervision can significantly reduce the insider’s expected profit and increase the external traders’ expected profits; (2) the optimal market supervision strategy is closely related to the number of external traders; (3) the insider trader tends to disclose low-precision information to maximize their profits; (4) the precision of information disclosed by the insider trader and the intensity of market supervision affect price efficiency and the amount of residual information. The research results provide a basis for understanding how the insider trader discloses information to external traders under market supervision and offer a reference for regulatory authorities to formulate differentiated supervision strategies.

20 pages, 1949 KiB

Open Access Review

Sustainable Management of Energy Storage in Electric Vehicles Involved in a Smart Urban Environment

by Adel Razek

Energy Storage Appl. 2025, 2(2), 7; https://doi.org/10.3390/esa2020007 - 17 Jun 2025

Abstract

Electric vehicles are increasingly being used for green transportation in smart urban mobility, thus protecting environmental biodiversity and the ecosystem. Energy storage by electric vehicle batteries is a critical point of this ecologically responsible transportation. This storage is strongly linked to the different external managements related to its capacity state. The latter concerns the interconnection of storage to energy resources, charging strategies, and their complexity. In an ideal urban context, charging strategies would use wireless devices. However, these may involve complex frames and unwanted electromagnetic field interferences. The sustainable management of wireless devices and battery state conditions allows for optimized operation and minimized adverse effects. Such management includes the sustainable design of devices and monitoring of complex connected procedures. The present study aims to analyze this management and to highlight the mathematical routines enabling the design and control tasks involved. The investigations involved are closely related to responsible attitude, “One Health”, and twin supervision approaches. The different sections of the article examine the following: electric vehicle in smart mobility, sustainable design and control, electromagnetic exposures, governance of physical and mathematical representation, charging routines, protection against adverse effects, and supervision of complex connected vehicles. The research presented in this article is supported by examples from the literature.

21 pages, 5840 KiB

Open Access Article

Ecological Resilience Assessment and Scenario Simulation Considering Habitat Suitability, Landscape Connectivity, and Landscape Diversity

by Fei Liu, Hong Huang, Fangsen Lei, Ning Liang and Longxi Cao

Sustainability 2025, 17(12), 5436; https://doi.org/10.3390/su17125436 - 12 Jun 2025

Abstract

Quantitative assessment of ecological resilience is crucial for understanding regional ecological security and provides a scientific basis for ecosystem protection and management decisions. Previous studies on ecological resilience evaluation predominantly focused on ecosystem resistance and recovery capacity under external threats. To address this gap, we propose an innovative assessment framework integrating landscape internal structure indicators—habitat suitability (HS), landscape connectivity (LCI), and landscape diversity (SHDI)—into the resilience paradigm. This approach enables the adjustment of landscape patterns, optimization of energy/material flows, and direct enhancement of ecosystem functions to improve regional ecological resilience. Using the ecological barrier area in northern Qinghai as a case study, we employed geographic grid technology to evaluate ecological resilience levels from 2000 to 2020. Combined with geological disaster risk assessment, ecological regionalization was established. The FLUS model was then applied to simulate land use changes under inertia development (ID) and ecological protection (EP) scenarios, projecting future ecological resilience dynamics. Key findings specific to the study area include: (1) In northern Qinghai, grassland degradation was prominent (2000–2020), primarily converting to barren land. (2) Landscape connectivity and diversity declined, leading to a 6% reduction in ecological resilience over twenty years. (3) Based on ecological resilience and geological disaster risk, three ecological management zones were delineated: prevention and protection areas (40.94%), key supervision areas (38.77%), and key ecological restoration areas (20.09%). (4) Compared with 2020, ecological resilience in 2030 decreased by 23.38% under the ID scenario and 14.28% under the EP scenario. The EP scenario effectively mitigated the decline of resilience. This study offers a novel perspective for ecological resilience assessment and supports spatial optimization of land resources to enhance ecosystem sustainability in ecologically vulnerable regions.
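If SHDI denotes Shannon's diversity index, as the acronym conventionally does in landscape ecology, a landscape diversity score for one grid cell can be sketched as below. The abstract does not give the formula the authors use, so this is the textbook definition under that assumption, with hypothetical land-cover areas.

```python
import math

def shannon_diversity(class_areas):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over land-cover classes.

    `class_areas` holds the area of each class in one grid cell;
    classes with zero area contribute nothing to the sum.
    """
    total = sum(class_areas)
    proportions = [a / total for a in class_areas if a > 0]
    return -sum(p * math.log(p) for p in proportions)

# Hypothetical grid cell: grassland, barren land, water (areas in hectares).
print(round(shannon_diversity([60.0, 30.0, 10.0]), 3))
```

The index is zero for a cell covered by a single class and rises as cover is spread more evenly across more classes.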

17 pages, 439 KiB

Open Access Editor’s Choice Article

MultiAVSR: Robust Speech Recognition via Supervised Multi-Task Audio–Visual Learning

by Shad Torrie, Kimi Wright and Dah-Jye Lee

Electronics 2025, 14(12), 2310; https://doi.org/10.3390/electronics14122310 - 6 Jun 2025

Abstract

Speech recognition approaches typically fall into three categories: audio, visual, and audio–visual. Visual speech recognition, or lip reading, is the most difficult because visual cues are ambiguous and data is scarce. To address these challenges, we present a new multi-task audio–visual speech recognition (MultiAVSR) framework for training a model on all three types of speech recognition simultaneously, primarily to improve visual speech recognition. Unlike prior works, which use separate models or complex semi-supervision, our framework employs a supervised multi-task hybrid Connectionist Temporal Classification/Attention loss, cutting training exaFLOPs to just 18% of that required by semi-supervised multitask models. MultiAVSR achieves a state-of-the-art visual speech recognition word error rate of 21.0% on the LRS3-TED dataset. Furthermore, it exhibits robust generalization capabilities, achieving a 44.7% word error rate on the WildVSR dataset. Our framework also demonstrates reduced dependency on external language models, which is critical for real-time visual speech recognition. For the audio and audio–visual tasks, our framework improves the robustness under various noisy environments with average relative word error rate improvements of 16% and 31%, respectively. These improvements across the three tasks illustrate the robust results our supervised multi-task speech recognition framework enables.
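The word error rate and relative improvement figures above follow standard definitions, which can be sketched as below. This uses the usual word-level edit distance; the transcripts and baseline numbers are hypothetical and not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes a non-empty reference transcript.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))   # edit distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

def relative_improvement(baseline_wer, new_wer):
    """Fractional WER reduction relative to the baseline."""
    return (baseline_wer - new_wer) / baseline_wer

print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
print(relative_improvement(0.25, 0.20))
```

A relative improvement compares two error rates as a ratio, so a 16% relative gain is a much smaller change in absolute WER when the baseline is already low.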

(This article belongs to the Special Issue Advances in Information, Intelligence, Systems and Applications)

25 pages, 7043 KiB

Open Access Article

Impacts of Consumers’ Heterogeneity on Decision-Making in Electric Vehicle Adoption: An Integrated Model

by Wen Xu, Irina Harris, Jin Li, Peter Wells and Gordon Foxall

Sustainability 2025, 17(11), 4981; https://doi.org/10.3390/su17114981 - 29 May 2025

Abstract

Understanding consumer heterogeneity is crucial for analysing attitude formation and its role in innovation diffusion. Traditional top-down models struggle to reflect the nuanced characteristics and activities of the consumer population, while bottom-up approaches like agent-based modelling (ABM) offer the ability to simulate individual decision-making in social networks. However, current ABM applications often lack a strong theoretical foundation. This study introduces a novel, theory-driven ABM framework to examine the heterogeneity of consumer attitude formation, focusing on electric vehicle (EV) adoption across consumer segments. The model incorporates non-linear decision-making rules grounded in established consumer theories, including Rogers’s Diffusion of Innovations, Social Influence Theory, and the Theory of Planned Behaviour. The consumer agents are characterised using UK empirical data and are segmented into early adopters, early majority, late majority, and laggards. Social interactions and attitude formation are simulated, micro-validated, and optimised using supervised machine learning (SML) approaches. The results reveal that early adopters and the early majority are highly responsive to social influences, environmental beliefs, and external events such as the pandemic and war when forming pro-EV attitudes. In contrast, the late majority and laggards show more stable or delayed responses. These findings provide actionable insights for targeting segments to enhance EV adoption strategies.

► Show Figures

Figure 1

11 pages, 817 KiB

Open Access Article

Machine Learning Based Assessment of Inguinal Lymph Node Metastasis in Patients with Squamous Cell Carcinoma of the Vulva

by Gilbert Georg Klamminger, Meletios P. Nigdelis, Annick Bitterlich, Bashar Haj Hamoud, Erich-Franz Solomayer, Annette Hasenburg and Mathias Wagner

J. Clin. Med. 2025, 14(10), 3510; https://doi.org/10.3390/jcm14103510 - 17 May 2025

Abstract

Background/Objectives: Despite great efforts from both clinical and pathological sides to address the extent of metastatic inguinal lymph node involvement in patients with vulvar cancer, current research attempts are still mostly aimed at identifying new imaging parameters or superior tissue diagnostic workflows rather than alternative ways of statistical data analysis. In the present study, we therefore establish a supervised machine learning algorithm to predict groin metastasis in patients with squamous cell carcinoma of the vulva (VSCC) based on classical histomorphological features. Methods: In total, 157 patients with VSCC were included in this retrospective study. After initial exploration of valuable clinicopathological predictor variables by means of Spearman correlation, a decision tree was trained and internally validated (5-fold cross-validation) using a training data set (n = 126) and afterwards externally validated employing a holdout validation data set (n = 31) using standard metrics such as sensitivity, positive predictive value, and area under the receiver operating characteristic curve (AUROC). Results: Our established classifier can predict inguinal lymph node status with an internal accuracy of 79.4% (AUROC value = 0.64). Reaching similar performance, with an overall accuracy of 83.9% on unseen data (external validation set), our classifier demonstrates robustness. Conclusions: The presented results suggest that machine learning can predict groin lymph node status in VSCC based on histological findings of the primary tumor. Such research attempts may be useful in the future for an additional assessment of inguinal lymph nodes, aiming to maximize oncological safety when targeting the most accurate diagnosis of lymph node involvement.
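The 5-fold cross-validation on the 126-patient training set can be sketched as an index generator. This is a generic illustration: the folds here are contiguous and unshuffled, whereas the abstract does not say whether the authors shuffled or stratified their folds.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k contiguous folds.

    Fold sizes differ by at most one sample; no shuffling or
    stratification is applied (an assumption of this sketch).
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

# Training set size taken from the abstract (n = 126).
for train, validation in k_fold_indices(126, k=5):
    print(len(train), len(validation))
```

Each sample serves as validation data exactly once, and the 31-patient holdout set stays outside this loop entirely.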

(This article belongs to the Special Issue Emerging Diagnostic and Treatment Approaches for Gynecological Cancers)

20 pages, 1902 KiB

Open Access Article

Distantly Supervised Relation Extraction Method Based on Multi-Level Hierarchical Attention

by Zhaoxin Xuan, Hejing Zhao, Xin Li and Ziqi Chen

Information 2025, 16(5), 364; https://doi.org/10.3390/info16050364 - 29 Apr 2025

Viewed by 436

Abstract

Distantly Supervised Relation Extraction (DSRE) aims to automatically identify semantic relationships within large text corpora by aligning them with external knowledge bases. Although current methods succeed in automating data annotation, they introduce two main challenges: label noise and a long-tail data distribution. Label noise results in inaccurate annotations, which can undermine the quality of relation extraction. The long-tail problem, on the other hand, leads to an imbalanced model that struggles to extract less frequent, long-tail relations. In this paper, we introduce a novel relation extraction framework based on multi-level hierarchical attention. This approach utilizes Graph Attention Networks (GATs) to model the hierarchical structure of the relations, capturing the semantic dependencies between relation types and generating relation embeddings that reflect the overall hierarchical framework. To improve the classification process, we incorporate a multi-level classification structure guided by hierarchical attention, which enhances the accuracy of both head and tail relation extraction. A local probability constraint is introduced to ensure coherence across the classification levels, fostering knowledge transfer from frequent to less frequent relations. Experimental evaluations on the New York Times (NYT) dataset demonstrate that our method outperforms existing baselines, particularly in the context of long-tail relation extraction, offering a comprehensive solution to the challenges of DSRE.

(This article belongs to the Collection Natural Language Processing and Applications: Challenges and Perspectives)

13 pages, 810 KiB

Open AccessArticle

In Silico Methods for Assessing Cancer Immunogenicity—A Comparison Between Peptide and Protein Models

by Stanislav Sotirov and Ivan Dimitrov

Appl. Sci. 2025, 15(8), 4123; https://doi.org/10.3390/app15084123 - 9 Apr 2025

Viewed by 558

Abstract

Identifying and characterizing putative tumor antigens is essential to cancer vaccine development. Given the impracticality of isolating and evaluating each potential antigen individually, in silico prediction algorithms, especially those employing machine learning (ML) techniques, are indispensable. These algorithms substantially decrease the experimental workload required for discovering viable vaccine candidates, thereby accelerating the development process and enhancing the efficiency of identifying promising immunogenic targets. In this study, we employed six supervised ML methods on a dataset containing 546 experimentally validated immunogenic human tumor proteins and 548 non-immunogenic human proteins to develop models for immunogenicity prediction. These models included k-nearest neighbor (kNN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). After validation through internal cross-validation and an external test set, the best-performing models (QDA, RF, and XGBoost) were selected for further evaluation. A comparison between the chosen protein models and our previously developed peptide models for tumor immunogenicity prediction revealed that the peptide models slightly outperformed the protein models. However, since both proteins and peptides can be subject to tumor immunogenicity assessment, evaluating each with the respective models is prudent. The three selected protein models are set to be integrated into the new version of the VaxiJen server.

(This article belongs to the Special Issue Advances in Machine Learning and Data Mining: Emerging Trends and Applications)

Search Results (254)
