MDPI - Publisher of Open Access Journals

17 pages, 722 KB

Open Access Article

Development of a Machine Learning Model for Predicting Treatment-Related Amenorrhea in Young Women with Breast Cancer

by Long Song, Zobaida Edib, Uwe Aickelin, Hadi Akbarzadeh Khorshidi, Anne-Sophie Hamy, Yasmin Jayasinghe, Martha Hickey, Richard A. Anderson, Matteo Lambertini, Margherita Condorelli, Isabelle Demeestere, Michail Ignatiadis, Barbara Pistilli, H. Irene Su, Shanton Chang, Patrick Cheong-Iao Pang, Fabien Reyal, Scott M. Nelson, Paniti Sukumvanich, Alessandro Minisini, Fabio Puglisi, Kathryn J. Ruddy, Fergus J. Couch, Janet E. Olson, Kate Stern, Franca Agresta, Lesley Stafford, Laura Chin-Lenn, Wanda Cui, Antoinette Anazodo, Alexandra Gorelik, Tuong L. Nguyen, Ann Partridge, Christobel Saunders, Elizabeth Sullivan, Mary Macheras-Magias and Michelle Peate

Bioengineering 2025, 12(11), 1171; https://doi.org/10.3390/bioengineering12111171 - 28 Oct 2025

Viewed by 277

Abstract

Treatment-induced ovarian function loss is a significant concern for many young patients with breast cancer. Accurately predicting this risk is crucial for counselling young patients and informing their fertility-related decision-making. However, current risk prediction models for treatment-related ovarian function loss have limitations. To provide a broader representation of patient cohorts and improve feature selection, we combined retrospective data from six datasets within the FoRECAsT (Infertility after Cancer Predictor) databank, including 2679 pre-menopausal women diagnosed with breast cancer. Because this combined dataset had notable missingness, we employed cross imputation using the k-nearest neighbours (KNN) machine learning (ML) algorithm. Using Lasso regression, we developed an ML model to forecast the risk of treatment-related amenorrhea, a surrogate marker of ovarian function loss, at 12 months after starting chemotherapy. Our model identified 20 variables significantly associated with the risk of developing amenorrhea. Internal validation resulted in an area under the receiver operating characteristic curve (AUC) of 0.820 (95% CI: 0.817–0.823), while external validation with another dataset demonstrated an AUC of 0.743 (95% CI: 0.666–0.818). A cutoff of 0.20 was chosen to achieve higher sensitivity in validation, because false negatives (patients incorrectly classified as likely to regain menses) could miss timely opportunities for fertility preservation if desired. At this threshold, internal validation yielded sensitivity and precision of 91.3% and 61.7%, respectively, while external validation showed 92.9% and 60.0%. Using ML methods, we not only devised a model for personalised risk prediction of amenorrhea that substantially improves on existing models, but also demonstrated a robust framework for making maximal use of available data sources.
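The sensitivity/precision trade-off behind the 0.20 cutoff can be illustrated with a minimal sketch. The function name `sensitivity_precision` and the toy labels and probabilities are hypothetical, not the authors' code or data; the function simply counts true positives, false positives, and false negatives at a chosen cutoff.

```python
from typing import Sequence, Tuple


def sensitivity_precision(y_true: Sequence[int], y_prob: Sequence[float],
                          threshold: float = 0.20) -> Tuple[float, float]:
    """Return (sensitivity, precision) when predicting class 1 (amenorrhea) for prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1  # correctly flagged as at risk
        elif predicted == 1 and label == 0:
            fp += 1  # flagged, but menses actually resumed
        elif predicted == 0 and label == 1:
            fn += 1  # missed case: the error a low cutoff is meant to limit
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Hypothetical toy labels (1 = amenorrhea at 12 months) and predicted risks.
    y_true = [1, 1, 1, 0, 0, 0]
    y_prob = [0.9, 0.3, 0.15, 0.25, 0.1, 0.05]
    for cutoff in (0.5, 0.20):
        print(cutoff, sensitivity_precision(y_true, y_prob, cutoff))
```

On this toy data, lowering the cutoff from 0.5 to 0.20 raises sensitivity from 1/3 to 2/3 while precision falls from 1.0 to 2/3, the same direction of trade-off the authors accepted to limit false negatives.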

(This article belongs to the Special Issue Applications of Artificial Intelligence for Medical Diagnosis)

22 pages, 4825 KB

Open Access Article

Multidimensional Visualization and AI-Driven Prediction Using Clinical and Biochemical Biomarkers in Premature Cardiovascular Aging

by Kuat Abzaliyev, Madina Suleimenova, Symbat Abzaliyeva, Madina Mansurova, Adai Shomanov, Akbota Bugibayeva, Arai Tolemisova, Almagul Kurmanova and Nargiz Nassyrova

Biomedicines 2025, 13(10), 2482; https://doi.org/10.3390/biomedicines13102482 - 12 Oct 2025

Viewed by 368

Abstract

Background: Cardiovascular diseases (CVDs) remain the primary cause of global mortality, with arterial hypertension, ischemic heart disease (IHD), and cerebrovascular accident (CVA) forming a progressive continuum from early risk factors to severe outcomes. While numerous studies focus on isolated biomarkers, few integrate multidimensional visualization with artificial intelligence to reveal hidden, clinically relevant patterns. Methods: We conducted a comprehensive analysis of 106 patients using an integrated framework that combined clinical, biochemical, and lifestyle data. Parameters included renal function (glomerular filtration rate, cystatin C), inflammatory markers, lipid profile, enzymatic activity, and behavioral factors. After normalization and imputation, we applied correlation analysis, parallel coordinates visualization, t-distributed stochastic neighbor embedding (t-SNE) with k-means clustering, principal component analysis (PCA), and Random Forest modeling with SHAP (SHapley Additive exPlanations) interpretation. Bootstrap resampling was used to estimate 95% confidence intervals for mean absolute SHAP values, assessing feature stability. Results: Consistent patterns across outcomes revealed impaired renal function, reduced physical activity, and high hypertension prevalence in IHD and CVA. t-SNE clustering achieved complete separation of a high-risk group (100% CVD-positive) from a predominantly low-risk group (7.8% CVD rate), an unsupervised demonstration of the biomarkers' discriminative power. PCA confirmed a multidimensional structure, while Random Forest identified renal function, hypertension status, and physical activity as dominant predictors, achieving robust performance (accuracy 0.818; AUC-ROC 0.854). SHAP analysis identified arterial hypertension, BMI, and physical inactivity as dominant predictors, complemented by renal biomarkers (GFR, cystatin C) and NT-proBNP.
Conclusions: This study pioneers the integration of multidimensional visualization and AI-driven analysis for CVD risk profiling, enabling interpretable, data-driven identification of high- and low-risk clusters. Despite the limited single-center cohort (n = 106) and cross-sectional design, the findings highlight the potential of interpretable models for precision prevention and transparent decision support in cardiovascular aging research.
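The abstract uses bootstrap resampling to attach 95% confidence intervals to mean absolute SHAP values. A minimal percentile-bootstrap sketch for one feature follows; it assumes the per-patient SHAP values are already computed, and the function name and toy values are hypothetical rather than the authors' pipeline.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def bootstrap_mean_abs_ci(values: Sequence[float], n_boot: int = 1000,
                          alpha: float = 0.05, seed: int = 0) -> Tuple[float, float, float]:
    """Percentile bootstrap CI for the mean |SHAP| of one feature.

    Returns (point estimate, lower bound, upper bound)."""
    rng = random.Random(seed)
    abs_vals = [abs(v) for v in values]  # global importance uses magnitudes only
    point = statistics.fmean(abs_vals)
    n = len(abs_vals)
    boot_means: List[float] = []
    for _ in range(n_boot):
        # Resample patients with replacement and recompute the mean |SHAP|.
        resample = [abs_vals[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(resample))
    boot_means.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return point, boot_means[lo_idx], boot_means[hi_idx]


if __name__ == "__main__":
    # Hypothetical per-patient SHAP values for a single feature.
    print(bootstrap_mean_abs_ci([0.1, -0.2, 0.3, -0.4, 0.5], seed=1))
```

A narrow interval relative to the point estimate suggests the feature's importance ranking is stable across resamples, which is the sense of "feature stability" in the abstract.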

(This article belongs to the Section Molecular and Translational Medicine)

24 pages, 1582 KB

Open Access Article

Future Internet Applications in Healthcare: Big Data-Driven Fraud Detection with Machine Learning

by Konstantinos P. Fourkiotis and Athanasios Tsadiras

Future Internet 2025, 17(10), 460; https://doi.org/10.3390/fi17100460 - 8 Oct 2025

Viewed by 506

Abstract

Hospital fraud detection has often relied on periodic audits that miss evolving, internet-mediated patterns in electronic claims. We develop an artificial intelligence and machine learning pipeline for large healthcare datasets that is leakage-safe, imbalance-aware, and aligned with operational capacity. The preprocessing stack integrates four tables, engineers 13 features, and applies imputation, categorical encoding, power transformation, Boruta feature selection, and denoising autoencoder representations, with class balancing via SMOTE-ENN evaluated inside cross-validation folds. Eight algorithms are compared under a fraud-oriented composite productivity index that weighs recall, precision, MCC, F1, ROC-AUC, and G-Mean, with per-fold threshold calibration and explicit reporting of Type I and Type II errors. The multilayer perceptron attains the highest composite index, while CatBoost offers the strongest control of false positives with high accuracy. SMOTE-ENN provides limited gains once the learned representations regularize class geometry. The calibrated scores support prepayment triage, postpayment audit, and provider-level profiling, linking alert volume to expected recovery and protecting investigator workload. Situated in the Future Internet context, this work targets internet-mediated claim flows and web-accessible provider registries. Governance procedures for drift monitoring, fairness assessment, and change control complete an internet-ready deployment path. The results indicate that disciplined preprocessing and evaluation, more than classifier choice alone, translate AI improvements into measurable economic value and sustainable fraud prevention in digital health ecosystems.
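The point that SMOTE-ENN is evaluated inside cross-validation folds can be made concrete with a minimal, stdlib-only sketch of the leakage-safe pattern: synthetic minority samples are generated from training-fold rows only, and each test fold is left untouched. The names `smote` and `leakage_safe_folds` are hypothetical; the ENN cleaning step is omitted, and the folds are plain shuffled rather than stratified. In practice, imbalanced-learn's `SMOTEENN` inside a pipeline would be the usual choice.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 3, seed: int = 0) -> List[Point]:
    """Minimal SMOTE: interpolate between a random minority sample and one of its
    k nearest minority neighbours (Euclidean). Needs at least two minority rows."""
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        base = minority[rng.randrange(len(minority))]
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def leakage_safe_folds(X: Sequence[Sequence[float]], y: Sequence[int], n_folds: int = 3,
                       seed: int = 0) -> Iterator[Tuple[List[Point], List[int], List[Point], List[int]]]:
    """Yield (train_X, train_y, test_X, test_y) per fold. Oversampling sees the
    training fold only, so nothing from the test fold leaks into resampling."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    for f in range(n_folds):
        test_idx = order[f::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        train_X = [list(X[i]) for i in train_idx]
        train_y = [y[i] for i in train_idx]
        minority = [row for row, label in zip(train_X, train_y) if label == 1]
        n_majority = len(train_y) - len(minority)
        if len(minority) >= 2 and n_majority > len(minority):
            new_rows = smote(minority, n_majority - len(minority), seed=seed + f)
            train_X += new_rows
            train_y += [1] * len(new_rows)
        yield train_X, train_y, [list(X[i]) for i in test_idx], [y[i] for i in test_idx]
```

Resampling before the split would let synthetic points interpolated from test rows enter training, inflating recall and precision; keeping it inside the loop avoids that.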

(This article belongs to the Special Issue Information and Future Internet Security, Trust and Privacy—4th Edition)

15 pages, 470 KB

Open Access Article

Factors Associated with Being on Track for Early Childhood Development in Kinshasa: A Community-Based Cross-Sectional Study

by Berthold M. Bondo, Francis K. Kabasubabo, Nicaise M. Muyulu, Din-Ar B. Batuli, Gloria B. Bukasa, Paulin B. Mutombo and Pierre Z. Akilimali

Children 2025, 12(10), 1329; https://doi.org/10.3390/children12101329 - 3 Oct 2025

Viewed by 421

Abstract

Background/Objectives: This study examines the associations between household socioeconomic status (SES), child nutrition, and developmental status among children aged 24–59 months in the Mont Ngafula health zone in Kinshasa. The primary research question is how SES and stunting affect developmental outcomes in early childhood. Methods: A cross-sectional analysis of 348 children was conducted, assessing developmental outcomes with the Early Childhood Development Index (ECDI2030). Results: Overall, 70.4% of children were classified as developmentally on track, and on-track prevalence increased across SES tertiles. Children who attended preschool had higher odds of being on track. Children in the rich tertile had higher odds of being on track than those in the poor tertile, while the middle tertile showed a weaker association. Older age categories and stunting were inversely associated with being developmentally on track. The results were consistent in multiple-imputation sensitivity analyses. Conclusions: Preschool attendance and a higher household socioeconomic position are strongly associated with better early developmental outcomes, while an age of 48–59 months and stunting are associated with a markedly lower likelihood of being developmentally on track. Integrated policies that reduce household poverty, promote early education, and prevent or treat early growth faltering could improve early childhood developmental trajectories.
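The abstract does not say how estimates from the multiply imputed datasets were combined. Rubin's rules are the standard approach, so the sketch below assumes them: each imputed dataset contributes a coefficient (for example, a log-odds for preschool attendance) and its squared standard error. The function name and toy numbers are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total standard error)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w_bar = statistics.fmean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (n - 1 denominator)
    total = w_bar + (1 + 1 / m) * b       # total variance
    return q_bar, math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical log-odds coefficients from three imputed datasets, each with SE 0.2.
    beta, se = pool_rubin([0.5, 0.7, 0.6], [0.04, 0.04, 0.04])
    print(beta, se, math.exp(beta))  # exp() converts the pooled log-odds to an odds ratio
```

Pooling on the log-odds scale and exponentiating afterwards keeps the odds ratios and their confidence intervals consistent across imputations.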

(This article belongs to the Section Global Pediatric Health)

23 pages, 3914 KB

Open Access Article

Machine Learning-Driven Early Productivity Forecasting for Post-Fracturing Multilayered Wells

by Ruibin Zhu, Ning Li, Guohua Liu, Fengjiao Qu, Changjun Long, Xin Wang, Shuzhi Xiu, Fei Ling, Qinzhuo Liao and Gensheng Li

Water 2025, 17(19), 2804; https://doi.org/10.3390/w17192804 - 24 Sep 2025

Viewed by 434

Abstract

Hydraulic fracturing technology significantly enhances reservoir conductivity by creating artificial fractures and is a crucial means for the economically viable development of low-permeability reservoirs. Accurate prediction of post-fracturing productivity is essential for optimizing fracturing parameter design and establishing scientific production strategies. However, limited understanding of post-fracturing production dynamics and the lack of efficient prediction methods severely constrain the evaluation of fracturing effectiveness and the adjustment of development plans. This study proposes a machine learning-based method for predicting post-fracturing productivity in multi-layer commingled production wells and validates its effectiveness on a key block from the PetroChina North China Huabei Oilfield Company. During data preprocessing, the three-sigma rule, median absolute deviation (MAD), and density-based spatial clustering of applications with noise (DBSCAN) were used to detect outliers, while missing values were imputed with the K-nearest neighbors method. Feature selection was performed using Pearson correlation coefficients and the variance inflation factor, yielding twelve key parameters as input features. The coefficient of determination served as the evaluation metric, and model hyperparameters were optimized by grid search combined with cross-validation. To address the multi-layer commingled production challenge, seven distinct datasets incorporating production parameters were constructed based on four geological parameter partitioning methods: thickness ratio, porosity–thickness product ratio, permeability–thickness product ratio, and porosity–permeability–thickness product ratio. Twelve machine learning models were then trained. Through comparative analysis, the most suitable productivity prediction model for the block was selected, and the block’s productivity patterns were revealed.
The results show that training with block-partitioned data improved the accuracy of all models, and further stratigraphic subdivision within the block partitions brought the models to peak performance. However, data volume is a critical limiting factor: for blocks with insufficient data, stratigraphic subdivision instead degrades prediction performance.
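Two of the outlier screens named in the abstract, the three-sigma rule and the median absolute deviation, can be sketched in a few lines. This is a minimal illustration with hypothetical function names and toy values; the 3.5 cutoff on the modified z-score (0.6745 × |x − median| / MAD) is a common convention assumed here, not a parameter reported by the authors, and DBSCAN is not shown.

```python
import statistics
from typing import List, Sequence


def three_sigma_outliers(values: Sequence[float]) -> List[int]:
    """Indices lying more than 3 population standard deviations from the mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if sd > 0 and abs(v - mu) > 3 * sd]


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Indices whose modified z-score 0.6745 * |x - median| / MAD exceeds `threshold`."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # more than half the values are identical; the score is undefined
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


if __name__ == "__main__":
    # Hypothetical daily production readings with one suspicious spike.
    readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
    print(three_sigma_outliers(readings), mad_outliers(readings))
```

On this toy series the three-sigma rule misses the value 50, because the spike itself inflates the standard deviation, while the MAD rule flags it; this masking effect is one reason to combine several detectors.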

(This article belongs to the Special Issue Revolutionizing Hydraulic Fracturing with Machine Learning and Data-Driven Approaches)

20 pages, 948 KB

Open Access Article

High-Accuracy Classification of Parkinson’s Disease Using Ensemble Machine Learning and Stabilometric Biomarkers

by Ana Carolina Brisola Brizzi, Osmar Pinto Neto, Rodrigo Cunha de Mello Pedreiro and Lívia Helena Moreira

Neurol. Int. 2025, 17(9), 133; https://doi.org/10.3390/neurolint17090133 - 26 Aug 2025

Viewed by 1187

Abstract

Background: Accurate differentiation of Parkinson’s disease (PD) from healthy aging is crucial for timely intervention and effective management. Postural sway abnormalities are prominent motor features of PD. Quantitative stabilometry and machine learning (ML) offer a promising avenue for developing objective markers to support the diagnostic process. This study aimed to develop and validate high-performance ML models to classify individuals with PD and age-matched healthy older adults (HOAs) using a comprehensive set of stabilometric parameters. Methods: Thirty-seven HOAs (mean age 70 ± 6.8 years) and 26 individuals with idiopathic PD (Hoehn and Yahr stages 2–3, on medication; mean age 66 ± 2.9 years), all aged 60–80 years, participated. Stabilometric data were collected using a force platform during quiet stance under eyes-open (EO) and eyes-closed (EC) conditions, from which 34 parameters reflecting the time- and frequency-domain characteristics of center-of-pressure (COP) sway were extracted. After data preprocessing, including mean imputation for missing values and feature scaling, three ML classifiers (Random Forest, Gradient Boosting, and Support Vector Machine) were hyperparameter-tuned using GridSearchCV with three-fold cross-validation. An ensemble voting classifier (soft voting) was constructed from these tuned models. Model performance was rigorously evaluated using 15 iterations of stratified train–test splits (70% train and 30% test) and an additional bootstrap procedure of 1000 iterations to derive reliable 95% confidence intervals (CIs). Results: Our optimized ensemble voting classifier achieved excellent discriminative power, distinguishing PD from HOAs with a mean accuracy of 0.91 (95% CI: 0.81–1.00) and a mean area under the ROC curve (AUC-ROC) of 0.97 (95% CI: 0.92–1.00).
Importantly, feature analysis revealed that anteroposterior sway velocity with eyes open (V-AP) and total sway path with eyes closed (TOD_EC, calculated using COP displacement vectors from its mean position) are the most robust non-invasive biomarkers for differentiating the groups. Conclusions: An ensemble ML approach leveraging stabilometric features provides a highly accurate, non-invasive method to distinguish PD from healthy aging and may augment clinical assessment and monitoring.
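Soft voting, as used for the ensemble above, averages each model's class-probability vector and predicts the class with the highest mean probability (the same idea as scikit-learn's `VotingClassifier(voting="soft")`). The stdlib sketch below uses hypothetical probabilities for three models; it is an illustration, not the authors' tuned ensemble.

```python
from typing import List, Optional, Sequence


def soft_vote(model_probs: Sequence[Sequence[Sequence[float]]],
              weights: Optional[Sequence[float]] = None) -> List[int]:
    """Average class probabilities across models; return the argmax class per sample.

    model_probs[m][i][c] is model m's probability that sample i belongs to class c."""
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models  # equal weights unless specified
    total_w = sum(weights)
    predictions: List[int] = []
    for i in range(len(model_probs[0])):
        n_classes = len(model_probs[0][i])
        avg = [sum(w * model_probs[m][i][c] for m, w in enumerate(weights)) / total_w
               for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


if __name__ == "__main__":
    # Hypothetical [P(HOA), P(PD)] outputs for two participants.
    rf = [[0.40, 0.60], [0.20, 0.80]]
    gb = [[0.45, 0.55], [0.30, 0.70]]
    svm = [[0.90, 0.10], [0.40, 0.60]]
    print(soft_vote([rf, gb, svm]))
```

For the first participant, two of three models lean towards PD, so a hard (majority) vote would predict PD, but the SVM's confident 0.90 pulls the averaged probability to HOA. Soft voting thus lets well-calibrated, confident models carry more influence.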

(This article belongs to the Section Movement Disorders and Neurodegenerative Diseases)

16 pages, 1109 KB

Open Access Article

Development and Validation of a Machine Learning Model for Early Prediction of Acute Kidney Injury in Neurocritical Care: A Comparative Analysis of XGBoost, GBM, and Random Forest Algorithms

by Keun Soo Kim, Tae Jin Yoon, Joonghyun Ahn and Jeong-Am Ryu

Diagnostics 2025, 15(16), 2061; https://doi.org/10.3390/diagnostics15162061 - 17 Aug 2025

Viewed by 804

Abstract

Background: Acute Kidney Injury (AKI) is a pivotal concern in neurocritical care, affecting patient survival and quality of life. This study harnesses machine learning (ML) techniques to predict the occurrence of AKI in patients receiving hyperosmolar therapy, aiming to optimize patient outcomes in neurocritical settings. Methods: We conducted a retrospective cohort study of 4886 patients who underwent hyperosmolar therapy in the neurosurgical intensive care unit (ICU). Comparative predictive analyses were carried out using advanced ML algorithms (eXtreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), and Random Forest (RF)) against standard multivariate logistic regression. Predictive performance was assessed using an 8:2 training–testing data split, with model fine-tuning through cross-validation. Results: The RF model with KNN imputation performed slightly better than the other approaches in predicting AKI. On an independent test set, it achieved a sensitivity of 79% (95% CI: 70–87%) and a specificity of 85% (95% CI: 82–88%), with an overall accuracy of 84% (95% CI: 81–87%) and an AUROC of 0.86 (95% CI: 0.82–0.91). The multivariate logistic regression analysis, while informative, showed less predictive strength than the ML models. Delta chloride levels and serum osmolality proved to be the most influential predictors, with additional significant variables including pH, age, bicarbonate, and the osmolar gap. Conclusions: The prominence of delta chloride and serum osmolality among the predictive variables underscores their potential as biomarkers for AKI risk in this patient population.
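KNN imputation, which paired best with RF here, fills a missing laboratory value with the average of that variable among the most similar patients. The stdlib sketch below is a simplified version in the spirit of scikit-learn's `KNNImputer` (distances use only features observed in both rows, scaled up for the missing ones); the function name and toy values are hypothetical, and it is not the authors' preprocessing code.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def knn_impute(rows: Sequence[Row], k: int = 2) -> List[List[float]]:
    """Replace each None with the mean of that column over the k nearest rows
    that have the column observed. Unfillable cells become NaN."""
    n_cols = len(rows[0])
    filled: List[List[float]] = []
    for i, row in enumerate(rows):
        out: List[float] = []
        for j, value in enumerate(row):
            if value is not None:
                out.append(value)
                continue
            candidates = []
            for r_idx, other in enumerate(rows):
                if r_idx == i or other[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                sq = sum((row[c] - other[c]) ** 2 for c in shared)
                # Scale the partial distance up to the full number of columns.
                candidates.append((math.sqrt(sq * n_cols / len(shared)), other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [v for _, v in candidates[:k]]
            out.append(sum(nearest) / len(nearest) if nearest else float("nan"))
        filled.append(out)
    return filled


if __name__ == "__main__":
    # Hypothetical rows: [serum sodium z-score, chloride z-score, osmolality z-score].
    data = [[1.0, 2.0, 3.0], [1.1, 2.1, None], [5.0, 6.0, 7.0], [1.2, 1.9, 3.4]]
    print(knn_impute(data, k=2))
```

Because the neighbours are chosen from patients with similar observed values, the imputed value (3.2 here, from rows 0 and 3) reflects the local data structure rather than a global mean.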

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

27 pages, 839 KB

Open Access Article

AI-Powered Forecasting of Environmental Impacts and Construction Costs to Enhance Project Management in Highway Projects

by Joon-Soo Kim

Buildings 2025, 15(14), 2546; https://doi.org/10.3390/buildings15142546 - 19 Jul 2025

Cited by 1 | Viewed by 982

Abstract

The accurate early-stage estimation of environmental load (EL) and construction cost (CC) in road infrastructure projects remains a significant challenge, constrained by limited data and the complexity of construction activities. To address this, our study proposes a machine learning-based predictive framework using artificial neural networks (ANNs) and deep neural networks (DNNs), enhanced by autoencoder-driven feature selection. A structured dataset of 150 completed national road projects in South Korea was compiled, covering both planning and design phases. The database focused on 19 high-impact sub-work types to reduce noise and improve prediction precision. A hybrid imputation approach, combining mean substitution with random forest regression, was applied to handle 4.47% missing data in the design-phase inputs, reducing variance by up to 5% and improving data stability. Dimensionality reduction via an autoencoder retained 16 core variables, preserving 97% of explanatory power while minimizing redundancy. ANN models benefited from cross-validation and hyperparameter tuning, achieving consistent performance across training and validation sets without overfitting (MSE = 0.06, RMSE = 0.24). The optimal ANN yielded average error rates of 29.8% for EL and 21.0% for CC at the design stage. DNN models, with their deeper architectures and dropout regularization, further improved performance, achieving average error rates of 27.1% (EL) and 17.0% (CC) at the planning stage and 24.0% (EL) and 14.6% (CC) at the design stage. These results met all predefined accuracy thresholds, underscoring the DNN’s advantage in handling complex, high-variance data, while the ANN excelled in structured cost prediction.
Overall, the synergy between deep learning and autoencoder-based feature selection offers a scalable, data-informed approach for enhancing early-stage environmental and economic assessments in road infrastructure planning, supporting more sustainable and efficient project management.
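The reported MSE and RMSE for the ANN are mutually consistent (the square root of 0.06 is about 0.245). The abstract does not define its "average error rate"; assuming it is a mean absolute percentage error (an assumption, not stated in the source), the three metrics can be computed as in this minimal sketch with hypothetical function names and toy cost figures.

```python
import math
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target."""
    return math.sqrt(mse(actual, predicted))


def mean_abs_pct_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute error relative to the true value, in percent.

    Assumed reading of 'average error rate'; undefined when an actual value is 0."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical construction costs (arbitrary units) and model predictions.
    actual = [100.0, 200.0, 300.0]
    predicted = [110.0, 180.0, 330.0]
    print(mse(actual, predicted), rmse(actual, predicted), mean_abs_pct_error(actual, predicted))
```

MSE and RMSE depend on the target's scale (the 0.06 above is presumably on normalized targets), whereas a percentage error is scale-free, which is why the two kinds of figures are not directly comparable.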

(This article belongs to the Special Issue Practice and Application of Artificial Intelligence in Urban Decision-Making)

33 pages, 15612 KB

Open Access Article

A Personalized Multimodal Federated Learning Framework for Skin Cancer Diagnosis

by Shuhuan Fan, Awais Ahmed, Xiaoyang Zeng, Rui Xi and Mengshu Hou

Electronics 2025, 14(14), 2880; https://doi.org/10.3390/electronics14142880 - 18 Jul 2025

Cited by 1 | Viewed by 1676

Abstract

Skin cancer is one of the most prevalent forms of cancer worldwide, and early, accurate diagnosis critically affects patient outcomes. Given the sensitive nature of medical data and its fragmented distribution across institutions (data silos), privacy-preserving collaborative learning is essential to enable knowledge-sharing without compromising patient confidentiality. While federated learning (FL) offers a promising solution, existing methods struggle with heterogeneous and missing modalities across institutions, which reduces diagnostic accuracy. To address these challenges, we propose an effective and flexible Personalized Multimodal Federated Learning framework (PMM-FL), which enables efficient cross-client knowledge transfer while maintaining personalized performance under heterogeneous and incomplete modality conditions. Our study makes three key contributions: (1) A hierarchical aggregation strategy that decouples multi-module aggregation from local deployment via global modular-separated aggregation and local client fine-tuning. Unlike conventional FL, which synchronizes all parameters in each round, our method adopts a frequency-adaptive synchronization mechanism that updates parameters based on their stability and functional roles. (2) A multimodal fusion approach based on multitask learning, integrating learnable modality imputation and attention-based feature fusion to handle missing modalities. (3) A custom dataset combining multi-year International Skin Imaging Collaboration (ISIC) challenge data (2018–2024) to ensure comprehensive coverage of diverse skin cancer types. We evaluate PMM-FL in diverse experimental settings, demonstrating its effectiveness under heterogeneous and incomplete modalities: it achieves 92.32% diagnostic accuracy, with only a 2% drop in accuracy under 30% modality missingness and a 32.9% reduction in communication overhead compared with baseline FL methods.
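PMM-FL's hierarchical aggregation is only summarized in the abstract. As a point of reference, the sketch below shows plain federated averaging (sample-size-weighted, FedAvg-style) applied module by module, with a fixed per-module synchronization interval standing in, as a simplifying assumption, for the paper's stability-based, frequency-adaptive schedule. The function name, module names, and the `sync_every` mapping are hypothetical.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # module name -> flat parameter vector


def fedavg_modules(client_params: Sequence[Params], client_sizes: Sequence[int],
                   global_params: Params, round_idx: int,
                   sync_every: Dict[str, int]) -> Params:
    """Sample-size-weighted average per module. A module is aggregated only on rounds
    divisible by its sync interval; otherwise the previous global copy is kept, which
    saves the communication for that module in that round."""
    total = sum(client_sizes)
    new_global: Params = {}
    for name, old in global_params.items():
        if round_idx % sync_every.get(name, 1) != 0:
            new_global[name] = list(old)  # skip synchronization this round
            continue
        new_global[name] = [
            sum(size * params[name][k] for params, size in zip(client_params, client_sizes)) / total
            for k in range(len(old))
        ]
    return new_global


if __name__ == "__main__":
    global_p = {"image_encoder": [0.0, 0.0], "fusion_head": [0.0]}
    clients = [{"image_encoder": [1.0, 2.0], "fusion_head": [1.0]},
               {"image_encoder": [3.0, 4.0], "fusion_head": [3.0]}]
    schedule = {"image_encoder": 1, "fusion_head": 2}  # hypothetical intervals
    for rnd in (1, 2):
        print(rnd, fedavg_modules(clients, [1, 3], global_p, rnd, schedule))
```

Syncing slowly changing modules less often is one simple way to cut communication; how PMM-FL decides which modules are stable is not specified in the abstract.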

(This article belongs to the Special Issue Multimodal Learning and Transfer Learning)

18 pages, 1141 KB

Open Access Article

Machine Learning Approaches for Early Detection of Ossification of Posterior Longitudinal Ligament in Health Screening Settings

by Ryo Mizukoshi, Ryosuke Maruiwa, Keitaro Ito, Norihiro Isogai, Haruki Funao, Retsu Fujita and Mitsuru Yagi

Bioengineering 2025, 12(7), 749; https://doi.org/10.3390/bioengineering12070749 - 9 Jul 2025

Viewed by 964

Abstract

Early detection of ossification of the posterior longitudinal ligament (OPLL) is hampered by the late onset of neurological symptoms, so we built and validated an interpretable machine learning model to identify OPLL during routine health examinations. We retrospectively analyzed 1442 Japanese adults screened between 2020 and 2023, including 432 imaging-confirmed cases. Preprocessing comprised median imputation, one-hot encoding, Random Forest feature selection that reduced 235 variables to 20, and class-balance correction with SMOTE. Logistic regression, Random Forest, Gradient Boosting, and XGBoost models were tuned using a 5-fold cross-validated grid search, and a re-estimated logistic regression yielded odds ratios for clinical interpretation. The logistic model achieved 65% accuracy and an AUROC of 0.69 (95% CI 0.66–0.76), matching the tree-based models but with fewer false negatives. Advanced age (OR 1.60, 95% CI 1.27–2.00) and elevated CA19-9 (OR 1.24, 95% CI 1.00–1.35) independently increased the odds of OPLL. This concise, explainable tool could facilitate early recognition of OPLL, reduce unnecessary follow-up, and enable timely preventive interventions in high-volume screening programs.
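Two steps in this abstract are simple enough to sketch directly: median imputation of a missing screening value, and turning a logistic-regression coefficient into an odds ratio with a Wald 95% CI (OR = exp(β), CI = exp(β ± 1.96·SE)). The function names and numbers below are hypothetical illustrations, not the study's fitted model.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple


def median_impute(column: Sequence[Optional[float]]) -> List[float]:
    """Replace None entries with the median of the observed values."""
    med = statistics.median([v for v in column if v is not None])
    return [med if v is None else v for v in column]


def odds_ratio_ci(beta: float, se: float, z: float = 1.959964) -> Tuple[float, float, float]:
    """Convert a logistic coefficient (log-odds) and its SE to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    print(median_impute([1.0, None, 3.0, 10.0]))  # hypothetical lab values
    # Hypothetical coefficient chosen so that exp(beta) = 1.6, with SE 0.1.
    print(odds_ratio_ci(math.log(1.6), 0.1))
```

Because the Wald interval is symmetric on the log-odds scale, it becomes asymmetric around the OR after exponentiation, so a reported OR is not expected to sit at the midpoint of its CI.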

(This article belongs to the Section Biomedical Engineering and Biomaterials)

23 pages, 943 KB

Open Access Review

Establishing Best Practices for Clinical GWAS: Tackling Imputation and Data Quality Challenges

by Giorgio Casaburi, Ron McCullough and Valeria D’Argenio

Int. J. Mol. Sci. 2025, 26(13), 6397; https://doi.org/10.3390/ijms26136397 - 3 Jul 2025

Viewed by 1731

Abstract

Genome-wide association studies (GWASs) play a central role in precision medicine, powering a range of clinical applications from pharmacogenomics to disease risk prediction. A critical component of GWASs is genotype imputation, a computational method used to infer untyped genetic variants. Imputation increases variant coverage by estimating genotypes at untyped loci, and this expanded coverage can, in some cases, improve the ability to detect genetic associations. However, imputation also introduces biases, particularly for rare variants and underrepresented populations, which may compromise clinical accuracy. This review examines the challenges and clinical implications of genotype imputation errors, including their impact on therapeutic decisions and on predictive models such as polygenic risk scores (PRSs). In particular, we explore the sources of imputation errors in depth, emphasizing disparities in performance across ancestral populations and their downstream effects on healthcare equity, and we address ethical considerations surrounding access to equitable genomic resources. On this basis, we propose evidence-based best practices for clinical GWAS implementation, including the direct genotyping of clinically actionable variants, the cross-population validation of imputation models, the transparent reporting of imputation quality metrics, and the use of ancestry-matched reference panels. As genomic data become increasingly adopted in healthcare systems worldwide, ensuring the accuracy and inclusivity of GWAS-derived insights is paramount. Here, we suggest a framework for the responsible clinical integration of imputed genetic data, paving the way for more reliable and equitable personalized medicine.
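One way to report imputation quality transparently is to mask variants that were directly genotyped, impute them, and compare the imputed dosages with the true genotypes. The sketch below computes two common empirical metrics for one variant: dosage r² (the squared Pearson correlation between true allele counts and dosages) and best-guess concordance. Imputation software reports an estimated r² or INFO score without access to truth; this empirical version is a hypothetical illustration with toy data, and stratifying it by ancestry is one way to expose the disparities the review describes.

```python
from typing import Sequence


def dosage_r2(true_genotypes: Sequence[int], imputed_dosages: Sequence[float]) -> float:
    """Squared Pearson correlation between true allele counts (0/1/2) and imputed dosages."""
    n = len(true_genotypes)
    mx = sum(true_genotypes) / n
    my = sum(imputed_dosages) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, imputed_dosages))
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in imputed_dosages)
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic in this sample: r² is undefined
    return sxy * sxy / (sxx * syy)


def concordance(true_genotypes: Sequence[int], imputed_dosages: Sequence[float]) -> float:
    """Fraction of samples whose best-guess genotype (dosage rounded half-up) is correct."""
    hits = sum(1 for g, d in zip(true_genotypes, imputed_dosages) if int(d + 0.5) == g)
    return hits / len(true_genotypes)


if __name__ == "__main__":
    truth = [0, 1, 2, 0, 1, 2]                 # hypothetical masked genotypes
    dosages = [0.1, 0.9, 1.8, 0.2, 1.2, 1.9]   # hypothetical imputed dosages
    print(dosage_r2(truth, dosages), concordance(truth, dosages))
```

For rare variants, concordance can look high simply because most samples are homozygous for the common allele, whereas dosage r² is more sensitive to errors at the few carriers.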

(This article belongs to the Section Molecular Genetics and Genomics)

18 pages, 426 KB

Open Access Article

Reshaping Urban Innovation Landscapes for Green Growth: The Role of Smart City Policies in Digital Transformation

by Dayu Zhu and Shengyong Zhang

Reg. Sci. Environ. Econ. 2025, 2(3), 16; https://doi.org/10.3390/rsee2030016 - 27 Jun 2025

Viewed by 576

Abstract

Under the impetus of the global urbanization, the synergistic relationship between smart city policies and green innovation capabilities has emerged as a critical agenda for achieving sustainable development goals. While existing studies have explored the techno-economic effects of smart cities, systematic evidence remains scarce regarding their pathways and heterogeneous impacts on green growth. This study investigates the influence of smart city pilot policies on urban green growth trajectories and their heterogeneous characteristics. Leveraging panel data from 293 Chinese prefecture-level cities, we employ a multi-period difference-in-differences (DID) model with two-way fixed effects to control for unobserved city-specific and time-specific factors, complemented by robustness checks including parallel trend tests, placebo tests, and alternative dependent variable specifications. Data sources encompass the China City Statistical Yearbook, CNRDS, and CSMAR databases, covering core metrics such as green patent applications and grants, industrial upgrading indices, and environmental regulation intensity, with missing values being addressed via mean imputation. The findings demonstrate that smart city pilot policies significantly enhance green innovation levels in treated cities, with effects exhibiting pronounced spatial and resource-based heterogeneity; there are notably stronger impacts in non-resource-dependent cities and eastern regions. Mechanism analysis shows that policies are driven by a dual effect of industrial upgrading and environmental regulation. The former is manifested by the high substitution elasticity of the digital economy for traditional manufacturing, while the latter is reflected in the rising compliance costs of polluting enterprises. 
This research advances a cross-nationally comparable theoretical framework for understanding green transition mechanisms in smart city development while providing empirical benchmarks for policy design in emerging economies.
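To make the estimation strategy concrete, the sketch below fits a two-way fixed-effects DID on simulated data. It is a minimal illustration under stated assumptions, not the authors' specification: the panel is balanced and synthetic (293 units to mirror the city count, 12 periods), the staggered adoption periods and the true effect of 0.5 are invented, and covariates, clustered standard errors, and the mean-imputation step are omitted. On a balanced panel, subtracting unit means and period means (and adding back the grand mean) removes both sets of fixed effects, so the policy coefficient reduces to a one-variable regression on the demeaned data.

```python
import numpy as np


def twfe_did(y, d):
    """Two-way fixed-effects DID coefficient for a balanced panel (illustrative helper).

    y, d: arrays of shape (n_units, n_periods); d is the 0/1 policy indicator.
    Assumed model: y_it = a_i + g_t + beta * d_it + e_it.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)

    def within(x):
        # Remove unit means and period means, add back the grand mean (exact for balanced panels).
        return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()

    y_w, d_w = within(y), within(d)
    # OLS slope of demeaned outcome on demeaned treatment (no intercept needed after demeaning).
    return float((d_w * y_w).sum() / (d_w ** 2).sum())


# Simulated staggered-adoption panel (all values invented for illustration).
rng = np.random.default_rng(0)
n_units, n_periods, true_beta = 293, 12, 0.5
# Units adopt the policy in period 4, 6 or 8; adoption == n_periods means never treated.
adoption = rng.choice([4, 6, 8, n_periods], size=n_units)
d = (np.arange(n_periods)[None, :] >= adoption[:, None]).astype(float)
unit_fe = rng.normal(size=(n_units, 1))
period_fe = rng.normal(size=(1, n_periods))
y = unit_fe + period_fe + true_beta * d + rng.normal(scale=0.1, size=(n_units, n_periods))

beta_hat = twfe_did(y, d)
print(f"estimated policy effect: {beta_hat:.3f}")
```

With a constant treatment effect, as simulated here, the estimate recovers the true value. When effects vary across adoption cohorts or over time, staggered-adoption TWFE estimates can be biased, which is one reason parallel-trend and event-study checks accompany such designs.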

27 pages, 2815 KB

Open Access Article

Machine Learning-Augmented Triage for Sepsis: Real-Time ICU Mortality Prediction Using SHAP-Explained Meta-Ensemble Models

by Hülya Yilmaz Başer, Turan Evran and Mehmet Akif Cifci

Biomedicines 2025, 13(6), 1449; https://doi.org/10.3390/biomedicines13061449 - 12 Jun 2025

Cited by 1 | Viewed by 2193

Abstract

Background/Objectives: Optimization algorithms are recognized as critical in many fields and dynamical systems because they help identify the best possible solutions to complex problems while improving efficiency, reducing costs, and boosting performance. Metaheuristic optimization algorithms, which are inspired by natural phenomena, offer practical solutions to complex optimization problems. Because such problems arise across many disciplines, these bio-inspired methods have been applied successfully to classification and feature selection tasks, including the diagnosis of certain health conditions. Sepsis continues to pose a significant threat to patient survival, particularly among individuals admitted to intensive care units from emergency departments. Traditional scoring systems, including qSOFA, SIRS, and NEWS, often lack the precision needed for timely and effective clinical decision-making. Methods: In this study, we introduce a novel, interpretable machine learning framework designed to predict in-hospital mortality in sepsis patients upon intensive care unit admission. Using a retrospective dataset from a tertiary university hospital encompassing patient records from January 2019 to June 2024, we extracted comprehensive clinical and laboratory features. To address class imbalance and missing data, we employed the Synthetic Minority Oversampling Technique (SMOTE) and systematic imputation methods, respectively. Our hybrid modeling approach integrates ensemble-based machine learning (ML) algorithms with deep learning architectures, optimized through the Red Piranha Optimization algorithm for feature selection and hyperparameter tuning. The proposed model was validated through internal cross-validation and external testing on the MIMIC-III dataset.
Results: The proposed model demonstrated superior predictive performance over conventional scoring systems, achieving an area under the receiver operating characteristic curve of 0.96, a Brier score of 0.118, and a recall of 81%. Conclusions: These results underscore the potential of AI-driven tools to enhance clinical decision-making in sepsis management, enabling early interventions and potentially reducing mortality rates.
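The reported metrics can be illustrated with a short sketch. The functions below are hypothetical helpers written with NumPy only; they are not the authors' code. The Brier score is the mean squared difference between predicted probabilities and observed 0/1 outcomes (lower is better), recall is the share of actual deaths flagged at a chosen threshold (0.5 here, an assumption), and the AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The six-patient example is invented.

```python
import numpy as np


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.mean((p_pred - y_true) ** 2))


def recall(y_true, p_pred, threshold=0.5):
    """True positives / (true positives + false negatives) at a probability threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_hat = np.asarray(p_pred, dtype=float) >= threshold
    tp = np.sum(y_hat & y_true)
    fn = np.sum(~y_hat & y_true)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive cases")
    return float(tp / (tp + fn))


def roc_auc(y_true, p_pred):
    """AUC as P(score of a random positive > score of a random negative), ties count 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    p = np.asarray(p_pred, dtype=float)
    pos, neg = p[y_true], p[~y_true]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Invented example: 1 = in-hospital death, probs = model-predicted mortality risk.
outcomes = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
print(brier_score(outcomes, probs), recall(outcomes, probs), roc_auc(outcomes, probs))
```

The pairwise AUC is quadratic in the number of cases, which is fine for a toy example; library implementations use a sort-based computation instead.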

(This article belongs to the Section Molecular and Translational Medicine)

14 pages, 1107 KB

Open Access Article

PlantDeepMeth: A Deep Learning Model for Predicting DNA Methylation States in Plants

by Zhongwei Guo, Wenyuan Fan, Chengcheng Cai, Kang Zhang, Xilin Hou, Ying Li and Feng Cheng

Plants 2025, 14(11), 1724; https://doi.org/10.3390/plants14111724 - 5 Jun 2025

Viewed by 1189

Abstract

Cytosine DNA methylation (5mC) is an important epigenetic modification in genomic research. However, the methylation states of some cytosine sites are unavailable owing to the limitations of individual studies, and few tools have been developed to address this problem, especially in plants, which have more methylation types than animals. Here, we report PlantDeepMeth, a deep learning model for predicting DNA methylation states in plants. Evaluation on known cytosine sites in both the Brassica rapa and Arabidopsis thaliana genomes shows that PlantDeepMeth performs well in predicting methylation states, indicating that it effectively learns patterns useful for methylation imputation. Motif analysis of the model's predictions identified specific motifs associated with hypo- or hypermethylation states in B. rapa and A. thaliana, revealing key regulatory patterns captured by the model. Moreover, cross-species validation between B. rapa and A. thaliana demonstrated the generalizability of PlantDeepMeth, with the model maintaining high performance across the two species. These results highlight the effectiveness of PlantDeepMeth and demonstrate the potential of deep learning to advance plant genomics research.
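The abstract does not describe PlantDeepMeth's input encoding, so the following is only a sketch of a common preprocessing step for sequence-based methylation models, with hypothetical helper names. It classifies a cytosine by its plant methylation context (CG, CHG, or CHH, where H is A, C, or T), the sequence contexts that distinguish plant methylation from the predominantly CG methylation of mammals, and one-hot encodes a fixed window of flanking sequence centred on that cytosine. Only the forward strand is considered, and the window size is arbitrary.

```python
import numpy as np

BASES = "ACGT"  # column order of the one-hot encoding


def cytosine_context(seq, i):
    """Classify the cytosine at seq[i] as 'CG', 'CHG' or 'CHH' (forward strand only)."""
    if seq[i] != "C" or i + 2 >= len(seq):
        raise ValueError("expected a cytosine followed by at least two bases")
    if seq[i + 1] == "G":
        return "CG"
    if seq[i + 2] == "G":
        return "CHG"  # the base at i + 1 is not G, i.e. H
    return "CHH"


def one_hot_window(seq, i, flank=5):
    """One-hot encode seq[i - flank : i + flank + 1] as a (2 * flank + 1, 4) array.

    Positions beyond the sequence ends, or non-ACGT letters such as N, stay all-zero.
    """
    window = np.zeros((2 * flank + 1, 4), dtype=np.float32)
    for row, pos in enumerate(range(i - flank, i + flank + 1)):
        if 0 <= pos < len(seq) and seq[pos] in BASES:
            window[row, BASES.index(seq[pos])] = 1.0
    return window


# Invented toy sequence with one cytosine in each context.
seq = "ATCGGACAGTTCATTA"
contexts = {i: cytosine_context(seq, i) for i, base in enumerate(seq) if base == "C"}
x = one_hot_window(seq, 2, flank=3)  # window centred on the CG cytosine at index 2
print(contexts, x.shape)
```

A model would then take windows like `x` as input and predict a methylated/unmethylated label per cytosine, possibly separately per context.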

(This article belongs to the Special Issue Bioinformatics and Functional Genomics in Modern Plant Science)

40 pages, 5595 KB

Open Access Article

Neural Network-Based Composite Risk Scoring for Stratification of Fecal Immunochemical Test-Positive Patients in Colorectal Cancer Screening: Findings from South-West Oltenia

by Alexandra-Georgiana Bocioagă, Carmen-Nicoleta Oancea, Dumitru Rădulescu, Bogdan Silviu Ungureanu, Vlad Florin Iovănescu, Dan Nicolae Florescu, Irina-Paula Doica, Victor-Mihai Sacerdoțianu, Liliana Streba, Tudorel Ciurea and Dan-Ionuț Gheonea

Cancers 2025, 17(11), 1868; https://doi.org/10.3390/cancers17111868 - 2 Jun 2025

Viewed by 1067

Abstract

Background: Colorectal cancer (CRC) remains a major cause of cancer-related mortality worldwide, underscoring the need for more efficient and resource-conscious screening strategies. Methods: We screened 51,437 individuals aged 50–74 years in South-West Oltenia, Romania, using the fecal immunochemical test (FIT) with a positivity threshold of ≥20 µg Hb/g. Of the 2825 FIT-positive individuals, 1550 completed colonoscopy, and we recorded their age, sex, residence, education, comorbidities, medications, and FIT values. After imputing missing data (<8%) via multiple imputation, we reduced dimensionality with an autoencoder (ReLU activations, dropout 0.5, L2 regularization, 100 epochs, batch size 32) and applied K-Means clustering (k = 5). Examples of actionable clusters include Cluster 0 (“High-FIT malignant”): FIT > 200 µg/g, age > 65, diabetes; Cluster 2 (“Low-risk mixed”): FIT 100–199 µg/g, age < 60, no comorbidities; and Cluster 3 (“Intermediate-risk older”): FIT 150–200 µg/g, ≥3 comorbidities, rural residence. Cluster labels were then predicted by a feed-forward neural network (hidden layers of 64 and 32 neurons, dropout 0.6) and validated via 5-fold cross-validation plus a temporal hold-out set. Results: Five distinct patient clusters were identified, enabling the development of a composite risk score. Notably, Cluster 0, characterized by elevated FIT levels, exhibited a malignancy rate of 50.91%, whereas the overall CRC detection rate among patients who underwent colonoscopy was approximately 13.87%. This stratification model enhances diagnostic yield by prioritizing high-risk patients for urgent colonoscopy and sparing low-risk individuals unnecessary invasive procedures. Conclusions: The AI-driven composite risk score offers a refined framework for CRC risk stratification and optimized resource allocation. Its implementation can lead to earlier detection of advanced lesions, thereby improving patient outcomes.
Further external validation on independent cohorts and regions is essential to confirm its broad utility, with potential future integration of additional biomarkers (e.g., genetic or omics-based indicators) to further enhance predictive accuracy.
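The clustering step can be sketched with plain K-Means (Lloyd's algorithm). This is a minimal illustration under stated assumptions: toy 2-D points stand in for the autoencoder embeddings, k = 3 is used instead of the study's k = 5 so the result is easy to check, and centres are initialised with a deterministic farthest-point rule rather than the k-means++ seeding that scikit-learn's KMeans uses by default. The function names and data are hypothetical.

```python
import numpy as np


def sq_dists(X, C):
    """Squared Euclidean distances between rows of X (n, d) and rows of C (k, d)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)


def kmeans(X, k, n_iter=100):
    """Lloyd's K-Means with farthest-point initialisation; returns (labels, centres)."""
    X = np.asarray(X, dtype=float)
    # Farthest-point seeding: start at the first row, then repeatedly add the row
    # farthest from all centres chosen so far.
    centers = X[:1].copy()
    for _ in range(1, k):
        farthest = sq_dists(X, centers).min(axis=1).argmax()
        centers = np.vstack([centers, X[farthest]])
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)  # assignment step
        # Update step: each centre moves to the mean of its members; an empty cluster keeps its centre.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so labels always match the returned centres.
    return sq_dists(X, centers).argmin(axis=1), centers


# Toy stand-in for embeddings: three well-separated groups of 30 points each.
rng = np.random.default_rng(1)
true_means = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([m + 0.5 * rng.standard_normal((30, 2)) for m in true_means])
labels, centers = kmeans(X, k=3)
print(np.bincount(labels))
```

In the study's pipeline, the resulting cluster labels then serve as targets for a feed-forward classifier, which is not shown here.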

(This article belongs to the Section Clinical Research of Cancer)
