Search Results (39)

Search Parameters:
Keywords = consensus ensemble model

30 pages, 2016 KB  
Article
A Novel Knowledge Fusion Ensemble for Diagnostic Differentiation of Pediatric Pneumonia and Acute Bronchitis
by Elif Dabakoğlu, Öyküm Esra Yiğit and Yaşar Topal
Diagnostics 2025, 15(17), 2258; https://doi.org/10.3390/diagnostics15172258 (registering DOI) - 6 Sep 2025
Abstract
Background: Differentiating pediatric pneumonia from acute bronchitis remains a persistent clinical challenge due to overlapping symptoms, often leading to diagnostic uncertainty and inappropriate antibiotic use. Methods: This study introduces DAPLEX, a structured ensemble learning framework designed to enhance diagnostic accuracy and reliability. A retrospective cohort of 868 pediatric patients was analyzed. DAPLEX was developed in three phases: (i) deployment of diverse base learners from multiple learning paradigms; (ii) multi-criteria evaluation and pruning based on generalization stability to retain a subset of well-generalized and stable learners; and (iii) complementarity-driven knowledge fusion. In the final phase, out-of-fold predicted probabilities from the retained base learners were combined with a consensus-based feature importance profile to construct a hybrid meta-input for a Multilayer Perceptron (MLP) meta-learner. Results: DAPLEX achieved a balanced accuracy of 95.3%, an F1-score of ~0.96, and a ROC-AUC of ~0.99 on an independent holdout test set. Compared to the range of performance from the weakest to the strongest base learner, DAPLEX improved balanced accuracy by 3.5–5.2%, enhanced the F1-score by 4.4–5.6%, and increased sensitivity by a substantial 8.2–13.6%. Crucially, DAPLEX’s performance remained robust and consistent across all evaluated demographic subgroups, confirming its fairness and potential for broad clinical applicability. Conclusions: The DAPLEX framework offers a robust and transparent pipeline for diagnostic decision support. By systematically integrating diverse predictive models and synthesizing both outcome predictions and key feature insights, DAPLEX substantially reduces diagnostic uncertainty in differentiating pediatric pneumonia and acute bronchitis and demonstrates strong potential for clinical application. Full article
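The phase (iii) fusion step — combining out-of-fold probabilities with a consensus feature-importance profile into a hybrid meta-input — can be sketched in plain Python. This is a minimal illustration under assumed inputs (feature names such as `age` are hypothetical, not from the study):

```python
def consensus_importance(importance_by_learner):
    """Consensus profile: average each feature's normalized importance
    across the retained base learners."""
    feats = importance_by_learner[0].keys()
    k = len(importance_by_learner)
    return {f: sum(imp[f] for imp in importance_by_learner) / k for f in feats}

def hybrid_meta_input(oof_probs, consensus):
    """Per sample, concatenate the retained learners' out-of-fold
    probabilities with the shared consensus-importance profile."""
    profile = [consensus[f] for f in sorted(consensus)]
    return [list(per_sample) + profile for per_sample in zip(*oof_probs)]

# Two retained learners, two samples (values illustrative)
oof = [[0.9, 0.2], [0.8, 0.3]]
cons = consensus_importance([{"age": 0.6, "crp": 0.4}, {"age": 0.4, "crp": 0.6}])
meta_X = hybrid_meta_input(oof, cons)
```

In the actual framework these rows would train the MLP meta-learner; the sketch only shows how the hybrid meta-input is assembled.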
38 pages, 4944 KB  
Article
Integrated Survey Classification and Trend Analysis via LLMs: An Ensemble Approach for Robust Literature Synthesis
by Eleonora Bernasconi, Domenico Redavid and Stefano Ferilli
Electronics 2025, 14(17), 3404; https://doi.org/10.3390/electronics14173404 - 27 Aug 2025
Viewed by 377
Abstract
This study proposes a novel, scalable framework for the automated classification and synthesis of survey literature by integrating state-of-the-art Large Language Models (LLMs) with robust ensemble voting techniques. The framework consolidates predictions from three independent models—GPT-4, LLaMA 3.3, and Claude 3—to generate consensus-based classifications, thereby enhancing reliability and mitigating individual model biases. We demonstrate the generalizability of our approach through comprehensive evaluation on two distinct domains: Question Answering (QA) systems and Computer Vision (CV) survey literature, using a dataset of 1154 real papers extracted from arXiv. Comprehensive visual evaluation tools, including distribution charts, heatmaps, confusion matrices, and statistical validation metrics, are employed to rigorously assess model performance and inter-model agreement. The framework incorporates advanced statistical measures, including k-fold cross-validation, Fleiss’ kappa for inter-rater reliability, and chi-square tests for independence to validate classification robustness. Extensive experimental evaluations demonstrate that this ensemble approach achieves superior performance compared to individual models, with accuracy improvements of 10.0% over the best single model on QA literature and 10.9% on CV literature. Furthermore, comprehensive cost–benefit analysis reveals that our automated approach reduces manual literature synthesis time by 95% while maintaining high classification accuracy (F1-score: 0.89 for QA, 0.87 for CV), making it a practical solution for large-scale literature analysis. The methodology effectively uncovers emerging research trends and persistent challenges across domains, providing researchers with powerful tools for continuous literature monitoring and informed decision-making in rapidly evolving scientific fields. Full article
(This article belongs to the Special Issue Knowledge Engineering and Data Mining, 3rd Edition)
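Two of the aggregation pieces named above — consensus classification by vote and Fleiss’ kappa for inter-rater reliability — can be sketched in plain Python (labels illustrative; this is not the authors’ pipeline):

```python
from collections import Counter

def majority_vote(labels):
    """Consensus label across the three model outputs for one paper."""
    return Counter(labels).most_common(1)[0][0]

def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one label list per subject,
    with the same number of raters for every subject."""
    n = len(ratings[0])                        # raters per subject
    cats = sorted({c for row in ratings for c in row})
    counts = [[row.count(c) for c in cats] for row in ratings]
    N = len(ratings)
    P_i = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N                       # mean observed agreement
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(len(cats))]
    P_e = sum(p * p for p in p_j)              # chance agreement
    return (P_bar - P_e) / (1 - P_e)
```

With three raters per paper (e.g. the GPT-4, LLaMA 3.3, and Claude 3 outputs), `majority_vote` yields the consensus class and `fleiss_kappa` quantifies how often the models agree beyond chance.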

24 pages, 4754 KB  
Article
Machine Learning Prediction of Short Cervix in Mid-Pregnancy Based on Multimodal Data from the First-Trimester Screening Period: An Observational Study in a High-Risk Population
by Shengyu Wu, Jiaqi Dong, Jifan Shi, Xiaoxian Qu, Yirong Bao, Xiaoyuan Mao, Mu Lv, Xuan Chen and Hao Ying
Biomedicines 2025, 13(9), 2057; https://doi.org/10.3390/biomedicines13092057 - 23 Aug 2025
Viewed by 500
Abstract
Background: A short cervix in the second trimester significantly increases preterm birth risk, yet no reliable first-trimester prediction method exists. Current guidelines lack consensus on which women should undergo transvaginal ultrasound (TVUS) screening for cost-effective prevention. Therefore, it is vital to establish a highly accurate and economical method for use in the early stages of pregnancy to predict short cervix in mid-pregnancy. Methods: A total of 1480 pregnant women with singleton pregnancies and at least one risk factor for spontaneous preterm birth (<37 weeks) were recruited from January 2020 to December 2020 at the Shanghai First Maternity and Infant Hospital, Tongji University School of Medicine. Cervical length was assessed at 20–24 weeks of gestation, with a short cervix defined as <25 mm. Feature selection employed tree models, regularization, and recursive feature elimination (RFE). Seven machine learning models (logistic regression, linear discriminant analysis, k-nearest neighbors, support vector machine, decision tree, random forest, XGBoost) were trained to predict mid-trimester short cervix. The XGBoost model—an ensemble method leveraging sequential decision trees—was analyzed using Shapley Additive Explanation (SHAP) values to assess feature importance, revealing consistent associations between clinical predictors and outcomes that align with known clinical patterns. Results: Among 1480 participants, 376 (25.4%) developed mid-trimester short cervix. 
The XGBoost-based prediction model demonstrated high predictive performance in the training set (Recall = 0.838, F1 score = 0.848), test set (Recall = 0.850, F1 score = 0.910), and an independent dataset collected in January 2025 (Recall = 0.708, F1 score = 0.791), with SHAP analysis revealing pre-pregnancy BMI as the strongest predictor, followed by second-trimester pregnancy loss history, peripheral blood leukocyte count (WBC), and positive vaginal microbiological culture results (≥10⁵ CFU/mL, measured between 11+0 and 13+6 weeks). Conclusions: The XGBoost model accurately predicts mid-trimester short cervix using first-trimester clinical data, providing a 6-week window for targeted interventions before the 20–24-week gestational assessment. This early prediction could help guide timely preventive measures, potentially reducing the risk of spontaneous preterm birth (sPTB). Full article

21 pages, 979 KB  
Article
AI-Enhanced Coastal Flood Risk Assessment: A Real-Time Web Platform with Multi-Source Integration and Chesapeake Bay Case Study
by Paul Magoulick
Water 2025, 17(15), 2231; https://doi.org/10.3390/w17152231 - 26 Jul 2025
Viewed by 591
Abstract
A critical gap exists between coastal communities’ need for accessible flood risk assessment tools and the availability of sophisticated modeling, which remains limited by technical barriers and computational demands. This study introduces three key innovations through Coastal Defense Pro: (1) the first operational web-based AI ensemble for coastal flood risk assessment integrating real-time multi-agency data, (2) an automated regional calibration system that corrects systematic model biases through machine learning, and (3) browser-accessible implementation of research-grade modeling previously requiring specialized computational resources. The system combines Bayesian neural networks with optional LSTM and attention-based models, implementing automatic regional calibration and multi-source elevation consensus through a modular Python architecture. Real-time API integration achieves >99% system uptime with sub-3-second response times via intelligent caching. Validation against Hurricane Isabel (2003) demonstrates correction from 197% overprediction (6.92 m predicted vs. 2.33 m observed) to accurate prediction through automated identification of a Chesapeake Bay-specific reduction factor of 0.337. Comprehensive validation against 15 major storms (1992–2024) shows substantial improvement over standard methods (RMSE = 0.436 m vs. 2.267 m; R² = 0.934 vs. −0.786). Economic assessment using NACCS fragility curves demonstrates 12.7-year payback periods for flood protection investments. The open-source Streamlit implementation democratizes access to research-grade risk assessment, transforming months-long specialist analyses into immediate browser-based tools without compromising scientific rigor. Full article
(This article belongs to the Special Issue Coastal Flood Hazard Risk Assessment and Mitigation Strategies)
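The reported Chesapeake Bay reduction factor is consistent with a simple ratio calibration; a minimal sketch under that assumption (the platform’s actual machine-learning calibration is more involved):

```python
def regional_factor(predicted_m, observed_m):
    """Multiplicative factor that rescales a systematic overprediction."""
    return observed_m / predicted_m

def rmse(pred, obs):
    """Root-mean-square error over paired predictions and observations."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred)) ** 0.5

# Hurricane Isabel (2003): 6.92 m predicted vs. 2.33 m observed
factor = regional_factor(6.92, 2.33)   # ≈ 0.337, the reported factor
calibrated = 6.92 * factor             # recovers the observed surge
```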

13 pages, 337 KB  
Article
Synthesizing Explainability Across Multiple ML Models for Structured Data
by Emir Veledar, Lili Zhou, Omar Veledar, Hannah Gardener, Carolina M. Gutierrez, Jose G. Romano and Tatjana Rundek
Algorithms 2025, 18(6), 368; https://doi.org/10.3390/a18060368 - 18 Jun 2025
Viewed by 435
Abstract
Explainable Machine Learning (XML) in high-stakes domains demands reproducible methods to aggregate feature importance across multiple models applied to the same structured dataset. We propose the Weighted Importance Score and Frequency Count (WISFC) framework, which combines importance magnitude and consistency by aggregating ranked outputs from diverse explainers. WISFC assigns a weighted score to each feature based on its rank and frequency across model-explainer pairs, providing a robust ensemble feature-importance ranking. Unlike simple consensus voting or ranking heuristics that are insufficient for capturing complex relationships among different explainer outputs, WISFC offers a more principled approach to reconciling and aggregating this information. By aggregating many “weak signals” from brute-force modeling runs, WISFC can surface a stronger consensus on which variables matter most. The framework is designed to be reproducible and generalizable, capable of taking importance outputs from any set of machine-learning models and producing an aggregated ranking highlighting consistently important features. This approach acknowledges that any single model is a simplification of complex, multidimensional phenomena; using multiple diverse models, each optimized from a different perspective, WISFC systematically captures different facets of the problem space to create a more structured and comprehensive view. As a consequence, this study offers a useful strategy for researchers and practitioners who seek innovative ways of exploring complex systems, not by discovering entirely new variables but by introducing a novel mindset for systematically combining multiple modeling perspectives. Full article
(This article belongs to the Section Databases and Data Structures)
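One way to realize a rank-plus-frequency weighting in the spirit of WISFC is sketched below — an illustrative scheme, not the published formula:

```python
def wisfc_rank(rankings, top_k=None):
    """Aggregate ranked feature lists from model-explainer pairs.
    Each feature earns a rank weight (larger for better positions)
    plus a frequency count; features are ordered by both."""
    score, freq = {}, {}
    for ranked in rankings:
        k = top_k or len(ranked)
        for pos, feat in enumerate(ranked[:k]):
            score[feat] = score.get(feat, 0) + (k - pos)  # rank weight
            freq[feat] = freq.get(feat, 0) + 1            # frequency count
    return sorted(score, key=lambda f: (score[f], freq[f]), reverse=True)

# Three model-explainer pairs, each ranking features most-to-least important
ranking = wisfc_rank([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]])
```

Features that are both highly ranked and frequently surfaced rise to the top, mirroring the "many weak signals" aggregation described above.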

21 pages, 1134 KB  
Article
Dynamic Ensemble Selection for EEG Signal Classification in Distributed Data Environments
by Małgorzata Przybyła-Kasperek and Jakub Sacewicz
Appl. Sci. 2025, 15(11), 6043; https://doi.org/10.3390/app15116043 - 27 May 2025
Viewed by 588
Abstract
This study presents a novel approach to EEG signal classification in distributed environments using dynamic ensemble selection. In scenarios where data dispersion arises due to privacy constraints or decentralized data collection, traditional global modelling is impractical. We propose a framework where classifiers are trained locally on independent subsets of EEG data without requiring centralized access. A dynamic coalition-based ensemble strategy is employed to integrate the outputs of these local models, enabling adaptive and instance-specific decision-making. Coalitions are formed based on conflict analysis between model predictions, allowing either consensus (unified) or diversity (diverse) to guide the ensemble structure. Experiments were conducted on two benchmark datasets: an epilepsy EEG dataset comprising 150 segmented EEG time series from ten patients, and the BCI Competition IV Dataset 1, with continuous recordings from seven subjects performing motor imagery tasks, for which a total of 1400 segments were extracted. In the study, we also evaluated the non-distributed (centralized) approach to provide a comprehensive performance baseline. Additionally, we tested a convolutional neural network specifically designed for EEG data, ensuring our results are compared against advanced deep learning methods. Gradient Boosting combined with measurement-level fusion and unified coalitions consistently achieved the highest performance, with an F1-score, accuracy, and balanced accuracy of 0.987 (for nine local tables). The results demonstrate the effectiveness and scalability of dynamic coalition-based ensembles for EEG diagnosis in distributed settings, highlighting their potential in privacy-sensitive clinical and telemedicine applications. Full article
(This article belongs to the Special Issue EEG Signal Processing in Medical Diagnosis Applications)
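A much-simplified sketch of coalition formation by conflict analysis with measurement-level fusion, assuming each local model emits a single class probability for a binary task (threshold and coalition rule are illustrative, not the paper’s exact method):

```python
def conflict(p, q, thr=0.5):
    """Two local models conflict if they back different classes."""
    return (p >= thr) != (q >= thr)

def unified_coalition(probs, thr=0.5):
    """Keep the non-conflicting local models on the majority side."""
    side = sum(p >= thr for p in probs) >= len(probs) / 2
    return [p for p in probs if (p >= thr) == side]

def fuse(probs, thr=0.5):
    """Measurement-level fusion: average the coalition's probabilities."""
    coalition = unified_coalition(probs, thr)
    return sum(coalition) / len(coalition)

# Three local tables' outputs for one EEG segment (values illustrative)
decision = fuse([0.9, 0.8, 0.2])   # dissenting 0.2 is excluded
```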

22 pages, 1716 KB  
Article
Benchmarking Multiple Large Language Models for Automated Clinical Trial Data Extraction in Aging Research
by Richard J. Young, Alice M. Matthews and Brach Poston
Algorithms 2025, 18(5), 296; https://doi.org/10.3390/a18050296 - 20 May 2025
Viewed by 1168
Abstract
Large language models (LLMs) show promise for automating evidence synthesis, yet head-to-head evaluations remain scarce. We benchmarked five state-of-the-art LLMs—openai/o1-mini, x-ai/grok-2-1212, meta-llama/Llama-3.3-70B-Instruct, google/Gemini-Flash-1.5-8B, and deepseek/DeepSeek-R1-70B-Distill—on extracting protocol details from transcranial direct-current stimulation (tDCS) trials enrolling older adults. A multi-LLM ensemble pipeline ingested ClinicalTrials.gov records, applied a structured JSON schema, and generated comparable outputs from unstructured text. The pipeline retrieved 83 aging-related tDCS trials—roughly double the yield of a conventional keyword search. Across models, agreement was almost perfect for the binary field “brain stimulation used” (Fleiss κ ≈ 0.92) and substantial for the categorical “primary target” field (κ ≈ 0.71). Numeric parameters such as stimulation intensity and session duration showed excellent consistency when explicitly reported (ICC 0.95–0.96); secondary targets and free-text duration phrases remained challenging (κ ≈ 0.61; ICC ≈ 0.35). An ensemble consensus (majority vote or averaging) resolved most disagreements and delivered near-perfect reliability on core stimulation attributes (κ = 0.94). These results demonstrate that multi-LLM ensembles can markedly expand trial coverage and reach expert-level accuracy on well-defined fields while still requiring human oversight for nuanced or sparsely reported details. The benchmark and open-source workflow set a solid baseline for future advances in prompt engineering, model specialization, and ensemble strategies aimed at fully automated evidence synthesis in neurostimulation research involving aging populations. Overall, the five-model multi-LLM ensemble doubled the number of eligible aging-related tDCS trials retrieved versus keyword searching and achieved near-perfect agreement on core stimulation parameters (κ ≈ 0.94), demonstrating expert-level extraction accuracy. Full article
(This article belongs to the Special Issue Machine Learning in Medical Signal and Image Processing (3rd Edition))

21 pages, 2174 KB  
Article
Deep Learning Ensemble Approach for Predicting Expected and Confidence Levels of Signal Phase and Timing Information at Actuated Traffic Signals
by Seifeldeen Eteifa, Amr Shafik, Hoda Eldardiry and Hesham A. Rakha
Sensors 2025, 25(6), 1664; https://doi.org/10.3390/s25061664 - 7 Mar 2025
Viewed by 2280
Abstract
Predicting Signal Phase and Timing (SPaT) information and confidence levels is needed to enhance Green Light Optimal Speed Advisory (GLOSA) and/or Eco-Cooperative Adaptive Cruise Control (Eco-CACC) systems. This study proposes an architecture based on transformer encoders to improve prediction performance. This architecture is combined with different deep learning methods, including Multilayer Perceptrons (MLP), Long Short-Term Memory neural networks (LSTM), and Convolutional Long Short-Term Memory neural networks (CNN-LSTM), to form an ensemble of predictors. The ensemble is used to make data-driven predictions of SPaT information obtained from traffic signal controllers for six different intersections along the Gallows Road corridor in Virginia. The study outlines three primary tasks. Task one is predicting whether a phase will change within 20 s. Task two is predicting the exact change time within 20 s. Task three is assigning a confidence level to that prediction. The experiments show that the proposed transformer-based architecture outperforms all the previously used deep learning methods for the first two prediction tasks. Specifically, for the first task, the transformer encoder model provides an average accuracy of 96%. For task two, the transformer encoder models provided an average mean absolute error (MAE) of 1.49 s, compared to 1.63 s for other models. Consensus between models is shown to be a good leading indicator of confidence in ensemble predictions. The ensemble predictions with the highest level of consensus are within one second of the true value 90.2% of the time, as opposed to those with the lowest confidence level, which are within one second only 68.4% of the time. Full article
(This article belongs to the Special Issue AI and Smart Sensors for Intelligent Transportation Systems)
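Task three’s consensus-based confidence can be sketched as the spread among the ensemble members’ predicted change times — the thresholds below are illustrative, not the paper’s:

```python
def ensemble_confidence(pred_seconds, tol=1.0):
    """Fuse member predictions of time-to-phase-change and assign a
    confidence level: tighter agreement between models means higher
    confidence in the ensemble prediction."""
    spread = max(pred_seconds) - min(pred_seconds)
    mean = sum(pred_seconds) / len(pred_seconds)
    if spread <= tol:
        level = "high"
    elif spread <= 2 * tol:
        level = "medium"
    else:
        level = "low"
    return mean, level

# MLP, LSTM, CNN-LSTM, and transformer predictions for one phase (illustrative)
estimate, confidence = ensemble_confidence([10.0, 10.5, 10.2, 10.4])
```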

14 pages, 2166 KB  
Article
Development of a Predictive Model for N-Dealkylation of Amine Contaminants Based on Machine Learning Methods
by Shiyang Cheng, Qihang Zhang, Hao Min, Wenhui Jiang, Jueting Liu, Chunsheng Liu and Zehua Wang
Toxics 2024, 12(12), 931; https://doi.org/10.3390/toxics12120931 - 22 Dec 2024
Cited by 1 | Viewed by 1033
Abstract
Amines are widespread environmental pollutants that may pose health risks. Specifically, the N-dealkylation of amines mediated by cytochrome P450 enzymes (P450) could influence the safety of their metabolic transformation. However, conventional experimental and computational chemistry methods make it difficult to conduct high-throughput screening of N-dealkylation of emerging amine contaminants. Machine learning has been widely used to identify sources of environmental pollutants and predict their toxicity. However, its application in screening critical biotransformation pathways for organic pollutants has rarely been reported. In this study, we first constructed a large dataset comprising 286 emerging amine pollutants through a thorough search of databases and literature. Then, we applied four machine learning methods—random forest, gradient boosting decision tree, extreme gradient boosting, and multi-layer perceptron—to develop binary classification models for N-dealkylation. These models were based on seven carefully selected molecular descriptors that represent reactivity-fit and structural-fit. Among the predictive models, extreme gradient boosting shows the highest prediction accuracy of 81.0%. The SlogP_VSA2 descriptor is the primary factor influencing predictions of N-dealkylation metabolism. An ensemble model was then generated that uses a consensus strategy to integrate three different algorithms; its performance is generally better than that of any single algorithm, with an accuracy of 86.2%. Therefore, the classification model developed in this work can provide methodological support for the high-throughput screening of N-dealkylation of amine pollutants. Full article

17 pages, 1974 KB  
Article
Assessing Alterations of Rainfall Variability Under Climate Change in Zengwen Reservoir Watershed, Southern Taiwan
by Jenq-Tzong Shiau, Cheng-Che Li, Hung-Wei Tseng and Shien-Tsung Chen
Water 2024, 16(22), 3165; https://doi.org/10.3390/w16223165 - 5 Nov 2024
Cited by 2 | Viewed by 1251
Abstract
This study aims to detect changes in rainfall variability caused by climate change for various scenarios in the CMIP6 (Coupled Model Intercomparison Project Phase 6) multi-model ensemble. Projected changes in rainfall unevenness in terms of different timescale indices using three categories, namely WD50 (number of wettest days for half annual rainfall), SI (seasonality index), and DWR (ratio of dry-season to wet-season rainfall) are analyzed in Zengwen Reservoir watershed, southern Taiwan over near future (2021–2040) and midterm future (2041–2060) relative to the baseline period (1995–2014) under SSP2-4.5 and SSP5-8.5 scenarios. The projected rainfall for both baseline and future periods is derived from 25 GCMs (global climate models). The results indicate that noticeably deteriorated rainfall unevenness is projected in the Zengwen Reservoir watershed over future periods, which include decreased WD50, increased SI, and decreased DWR. Though there were noticeable differences in the rainfall projections by the different GCMs, the overall consensus reveals that uncertainties in future rainfall should not be ignored. In addition, WD50 has the greatest deviated relative change in mean, which implies that the short-timescale rainfall unevenness index is easily affected by climate change in the study area. Distributional changes in rainfall unevenness determined by simultaneously considering alterations in relative changes in mean and standard deviation indicated that there was no single dominant category. However, the top two categories, with summed frequencies exceeding 0.5, characterize different properties of rainfall unevenness indices. The top two categories of WD50 and SI commonly have decreased mean and increased mean, respectively, but nearly equal frequencies of the top two categories in DWR exhibit opposite variations. 
The proposed rainfall unevenness change detection approach provides a better understanding of the impacts of climate change on rainfall unevenness, which is useful for preparing adaptive mitigation measures for coping with disasters induced by climate change. Full article
(This article belongs to the Section Hydrology)
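WD50 and DWR are defined directly in the abstract and can be computed from daily rainfall in a few lines (rainfall values in mm; data hypothetical):

```python
def wd50(daily_mm):
    """WD50: smallest number of wettest days whose combined rainfall
    reaches half of the annual total."""
    target = sum(daily_mm) / 2
    running = 0.0
    for n, r in enumerate(sorted(daily_mm, reverse=True), start=1):
        running += r
        if running >= target:
            return n

def dwr(dry_season_mm, wet_season_mm):
    """DWR: ratio of dry-season to wet-season rainfall totals."""
    return sum(dry_season_mm) / sum(wet_season_mm)
```

A lower WD50 means the annual rainfall is concentrated in fewer days, which is the deterioration in unevenness the projections describe.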

24 pages, 2035 KB  
Article
Cheminformatic Identification of Tyrosyl-DNA Phosphodiesterase 1 (Tdp1) Inhibitors: A Comparative Study of SMILES-Based Supervised Machine Learning Models
by Conan Hong-Lun Lai, Alex Pak Ki Kwok and Kwong-Cheong Wong
J. Pers. Med. 2024, 14(9), 981; https://doi.org/10.3390/jpm14090981 - 15 Sep 2024
Viewed by 2153
Abstract
Background: Tyrosyl-DNA phosphodiesterase 1 (Tdp1) repairs damage in DNA induced by abortive topoisomerase 1 activity; however, maintenance of genetic integrity may sustain cellular division of neoplastic cells. It follows that Tdp1-targeting chemical inhibitors could synergize well with existing chemotherapy drugs to deny cancer growth; therefore, identification of Tdp1 inhibitors may advance precision medicine in oncology. Objective: Current computational research efforts focus primarily on molecular docking simulations, though datasets involving three-dimensional molecular structures are often hard to curate and computationally expensive to store and process. We propose the use of simplified molecular input line entry system (SMILES) chemical representations to train supervised machine learning (ML) models, aiming to predict potential Tdp1 inhibitors. Methods: An open-sourced consensus dataset containing the inhibitory activity of numerous chemicals against Tdp1 was obtained from Kaggle. Various ML algorithms were trained, ranging from simple algorithms to ensemble methods and deep neural networks. For algorithms requiring numerical data, SMILES were converted to chemical descriptors using RDKit, an open-sourced Python cheminformatics library. Results: Out of 13 optimized ML models with rigorously tuned hyperparameters, the random forest model gave the best results, yielding a receiver operating characteristic area under the curve (ROC-AUC) of 0.7421, testing accuracy of 0.6815, sensitivity of 0.6444, specificity of 0.7156, precision of 0.6753, and F1 score of 0.6595. Conclusions: Ensemble methods, especially the bootstrap aggregation mechanism adopted by random forest, outperformed other ML algorithms in classifying Tdp1 inhibitors from non-inhibitors using SMILES. The discovery of Tdp1 inhibitors could unlock more treatment regimens for cancer patients, allowing for therapies tailored to the patient’s condition. Full article
(This article belongs to the Special Issue Artificial Intelligence Applications in Precision Oncology)

13 pages, 2719 KB  
Article
Structure-Based Identification of Novel Histone Deacetylase 4 (HDAC4) Inhibitors
by Rupesh Agarwal, Pawat Pattarawat, Michael R. Duff, Hwa-Chain Robert Wang, Jerome Baudry and Jeremy C. Smith
Pharmaceuticals 2024, 17(7), 867; https://doi.org/10.3390/ph17070867 - 2 Jul 2024
Cited by 1 | Viewed by 2316
Abstract
Histone deacetylases (HDACs) are important cancer drug targets. Existing FDA-approved drugs target the catalytic pocket of HDACs, which is conserved across subfamilies (classes) of HDAC. However, engineering specificity is an important goal. Herein, we use molecular modeling approaches to identify and target potential novel pockets specific to the Class IIa HDAC HDAC4 at the interface between HDAC4 and the transcriptional corepressor component protein NCoR. These pockets were screened using an ensemble docking approach combined with consensus scoring to identify compounds with a different binding mechanism than the currently known HDAC modulators. Binding was compared in experimental assays between HDAC4 and HDAC3, which belongs to a different class of HDACs. Compound 88402 significantly inhibited HDAC4 but not HDAC3. Two other compounds (67436 and 134199) had IC50 values in the low micromolar range for both HDACs, comparable to the known inhibitor of HDAC4, SAHA (Vorinostat). However, both of these compounds were significantly weaker inhibitors of HDAC3 than SAHA and thus more selective, albeit to a limited extent. Five compounds exhibited activity on human breast carcinoma and/or urothelial carcinoma cell lines. The present results suggest potential mechanistic and chemical approaches for developing selective HDAC4 modulators. Full article
(This article belongs to the Special Issue Small Molecule Drug Discovery: Driven by In-Silico Techniques)
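The "ensemble docking combined with consensus scoring" workflow named in this abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' pipeline: the compound labels, snapshot labels, and docking scores are invented, and consensus is taken here as an average of per-conformation ranks (one common consensus-scoring variant).

```python
def consensus_rank(scores_by_conf):
    """Average each compound's rank across receptor conformations.
    scores_by_conf: {conformation: {compound: docking score}}, where a
    lower (more negative) score means a better predicted pose.
    Returns {compound: mean rank}; lower consensus rank = better hit."""
    rank_sums, counts = {}, {}
    for conf, scores in scores_by_conf.items():
        ordered = sorted(scores, key=scores.get)  # best score first
        for rank, cpd in enumerate(ordered, start=1):
            rank_sums[cpd] = rank_sums.get(cpd, 0) + rank
            counts[cpd] = counts.get(cpd, 0) + 1
    return {cpd: rank_sums[cpd] / counts[cpd] for cpd in rank_sums}

# Toy example: three compounds docked into two snapshots of one pocket.
scores = {
    "snap1": {"A": -9.1, "B": -7.4, "C": -8.2},
    "snap2": {"A": -8.8, "B": -8.9, "C": -7.0},
}
ranks = consensus_rank(scores)
best = min(ranks, key=ranks.get)
```

Rank averaging rewards compounds that score well across many receptor conformations, which is the point of docking against an ensemble rather than a single structure.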
21 pages, 4347 KB  
Article
Hydrological Drought and Flood Projection in the Upper Heihe River Basin Based on a Multi-GCM Ensemble and the Optimal GCM
by Zhanling Li, Yingtao Ye, Xiaoyu Lv, Miao Bai and Zhanjie Li
Atmosphere 2024, 15(4), 439; https://doi.org/10.3390/atmos15040439 - 1 Apr 2024
Cited by 4 | Viewed by 1798
Abstract
To ensure water use and water resource security along “the Belt and Road”, this study projected runoff and hydrological droughts and floods under future climate change conditions in the upper Heihe River Basin. The projections were based on observed meteorological and runoff data from 1987 to 2014 and on data from 10 GCMs covering 1987–2014 and 2026–2100, using the SWAT model, the Standardized Runoff Index, run length theory, and the entropy-weighted TOPSIS method. Both the multi-GCM ensemble (MME) and the optimal model were used to assess future hydrological drought and flood responses to climate change. The results showed that (1) the future multi-year average runoff from the MME was projected to be close to that of the historical period under the SSP245 scenario and to increase by 2.3% under the SSP585 scenario, whereas that from the optimal model, CMCC-ESM2, was projected to decrease under both scenarios; (2) both the MME and the optimal model projected that drought duration and flood intensity would decrease, while drought intensity, drought peak, flood duration, and flood peak would increase under both scenarios at their multi-year average levels; (3) according to the MME, drought duration was projected to decrease most after 2080, and drought intensity, flood duration, and flood peak were projected to increase most after 2080. The MME and the optimal model reached a consensus on the sign of hydrological extreme responses to climate change but differed in the magnitude of the trends. Full article
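The entropy-weighted TOPSIS step used to rank GCMs and pick the optimal model can be sketched as below. This is a generic textbook formulation, not the authors' code; the toy skill matrix and the choice of criteria (a correlation-like benefit criterion and an RMSE-like cost criterion) are assumptions for illustration.

```python
import math

def entropy_topsis(matrix, benefit):
    """Rank alternatives (e.g., GCMs) by entropy-weighted TOPSIS.
    matrix: rows = alternatives, cols = criteria (all values > 0);
    benefit[j] is True if larger values of criterion j are better.
    Returns one closeness coefficient per alternative (higher = better)."""
    m, n = len(matrix), len(matrix[0])
    # Entropy weights: criteria with more dispersion carry more weight.
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    p = [[matrix[i][j] / col_sums[j] for j in range(n)] for i in range(m)]
    k = 1.0 / math.log(m)
    ent = [-k * sum(p[i][j] * math.log(p[i][j])
                    for i in range(m) if p[i][j] > 0) for j in range(n)]
    div = [1.0 - e for e in ent]
    w = [d / sum(div) for d in div]
    # Weighted, vector-normalized decision matrix.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[w[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    ideal = [(max if benefit[j] else min)(v[i][j] for i in range(m)) for j in range(n)]
    anti = [(min if benefit[j] else max)(v[i][j] for i in range(m)) for j in range(n)]
    close = []
    for i in range(m):
        d_pos = math.sqrt(sum((v[i][j] - ideal[j]) ** 2 for j in range(n)))
        d_neg = math.sqrt(sum((v[i][j] - anti[j]) ** 2 for j in range(n)))
        close.append(d_neg / (d_pos + d_neg))
    return close

# Toy skill matrix for three GCMs: [correlation (benefit), RMSE (cost)].
skill = [[0.9, 1.2], [0.8, 1.0], [0.7, 1.5]]
scores = entropy_topsis(skill, benefit=[True, False])
```

The alternative closest to the ideal point and farthest from the anti-ideal point gets the highest closeness coefficient and would be selected as the optimal GCM.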
15 pages, 3348 KB  
Article
Modeling of Human Rabies Cases in Brazil in Different Future Global Warming Scenarios
by Jessica Milena Moura Neves, Vinicius Silva Belo, Cristina Maria Souza Catita, Beatriz Fátima Alves de Oliveira and Marco Aurelio Pereira Horta
Int. J. Environ. Res. Public Health 2024, 21(2), 212; https://doi.org/10.3390/ijerph21020212 - 11 Feb 2024
Cited by 2 | Viewed by 4951
Abstract
Bat species have been observed to have the potential to expand their distribution in response to climate change, thereby influencing shifts in the spatial distribution and population dynamics of human rabies cases. In this study, we applied an ensemble niche modeling approach to project climatic suitability for human rabies cases in Brazil under different future global warming scenarios and assessed the impact on the probability of emergence of new cases. We obtained notification records of human rabies cases in all Brazilian cities from January 2001 to August 2023, as reported by the State and Municipal Health Departments. The current and future climate data were sourced from the WorldClim digital repository. The future bioclimatic variables were downscaled climate projections from CMIP6 (a global model ensemble), extracted from the regionalized climate model HadGEM3-GC31-LL for three future socioeconomic scenarios over four periods (2021–2100). Seven statistical algorithms (MAXENT, MARS, RF, FDA, CTA, GAM, and GLM) were selected for modeling human rabies. Temperature seasonality was the bioclimatic variable with the highest relative contribution to both the current and future consensus models. Future scenario modeling for human rabies indicated a trend of changes in the areas of occurrence, assuming that the current pace of global warming, population growth, socioeconomic instability, and loss of natural areas is maintained. In Brazil, there are areas in which climatic factors are more likely to contribute to the emergence of cases. Under future scenarios, a change in local climatic suitability is observed that may lead to a reduction or increase in cases, depending on the region. Full article
(This article belongs to the Special Issue Global Climate Change and Public Health)
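The "consensus model" step in ensemble niche modeling, where the seven algorithms' suitability maps are combined, can be sketched as below. This is a hypothetical illustration, not the authors' implementation: the grid values, the skill scores used as weights (e.g., TSS), and the 0.5 presence threshold are all invented for the example.

```python
def consensus_suitability(predictions, skill, threshold=0.5):
    """Skill-weighted consensus of habitat-suitability predictions.
    predictions: {algorithm: [suitability in 0..1 per grid cell]};
    skill: {algorithm: evaluation score used as its weight}.
    Returns (consensus suitability per cell, presence mask per cell)."""
    algos = list(predictions)
    n_cells = len(predictions[algos[0]])
    total_w = sum(skill[a] for a in algos)
    consensus = [
        sum(skill[a] * predictions[a][c] for a in algos) / total_w
        for c in range(n_cells)
    ]
    presence = [s >= threshold for s in consensus]
    return consensus, presence

# Toy maps over three grid cells from three of the seven algorithms.
preds = {"RF": [0.9, 0.2, 0.6], "GLM": [0.7, 0.4, 0.5], "MAXENT": [0.8, 0.1, 0.7]}
tss = {"RF": 0.8, "GLM": 0.5, "MAXENT": 0.7}
cons, pres = consensus_suitability(preds, tss)
```

Weighting by evaluation skill lets better-performing algorithms dominate the consensus while still damping the idiosyncrasies of any single model.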
17 pages, 3208 KB  
Article
Present and Future of Heavy Rain Events in the Sahel and West Africa
by Inoussa Abdou Saley and Seyni Salack
Atmosphere 2023, 14(6), 965; https://doi.org/10.3390/atmos14060965 - 31 May 2023
Cited by 10 | Viewed by 3359
Abstract
Gridded precipitation datasets for climate information services in the semi-arid regions of West Africa offer some advantages, given the limited spatial coverage of rain gauges, the limited accessibility of in situ gauge data, and the important progress in earth observation and climate modelling systems. Can accurate information on the occurrence of heavy precipitation in this area be provided using gridded datasets? Furthermore, what about the future of heavy rain events (HRE) under the shared socioeconomic pathways (SSP) of the Inter-Sectoral Impact Model Intercomparison Project (i.e., SSP1-2.6 and SSP3-7.0)? To address these questions, daily precipitation records from 17 datasets, including satellite estimates, interpolated rain gauge data, reanalyses, merged products, a regional climate model, and global circulation models, were examined and compared to quality-controlled in situ data from 69 rain gauges evenly distributed across West Africa’s semi-arid region. The results show that observational and gridded data agree on an increase in the occurrence of HRE. All datasets showed three categories of HRE every season, but these categories had lower intensities and an overstated frequency of occurrence in gridded datasets compared to in situ rain gauge data. Eight out of 17 datasets (~47%) showed significant positive trends and only one showed a significant negative trend, indicating an increase in HRE of all categories in this region. The future evolution of HRE under the shared socioeconomic pathways SSP1-2.6 and SSP3-7.0 showed a trend toward the intensification of these events. In fact, the ensemble mean of the models showed significant shifts toward higher values in the probability distribution function of future HRE in West Africa, which may trigger more floods and landslides in the region. The use of gridded datasets can thus provide accurate information on the occurrence of heavy precipitation in the West African Sahel. However, it is important to consider the representation of heavy rain events in each dataset when monitoring extreme precipitation, and in situ gauge records remain preferred for defining extreme rainfall locally. Full article
(This article belongs to the Special Issue Precipitation in Africa)
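Per-dataset trend significance of the kind reported above (eight of 17 datasets with significant positive HRE trends) is commonly assessed with the non-parametric Mann-Kendall test. A minimal sketch follows; the abstract does not state which test the authors used, and this version omits the tie-variance correction, so the toy seasonal HRE counts and the test choice are assumptions.

```python
import math

def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test (no tie-variance correction).
    Returns (S statistic, z score, p value, significant at alpha)."""
    n = len(series)
    # S counts concordant minus discordant pairs over all i < j.
    s = sum((xj > xi) - (xj < xi)
            for i, xi in enumerate(series) for xj in series[i + 1:])
    var = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var)
    elif s < 0:
        z = (s + 1) / math.sqrt(var)
    else:
        z = 0.0
    # Two-sided p value from the standard normal distribution.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return s, z, p, p < alpha

# Toy series: seasonal HRE counts that rise steadily over 12 seasons.
counts = [3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11]
s, z, p, significant = mann_kendall(counts)
```

Because it depends only on the sign of pairwise differences, the test makes no distributional assumption about the counts, which suits heavy-tailed precipitation statistics.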