MDPI - Publisher of Open Access Journals

19 pages, 4515 KB

Open Access Article

An Explainable 2D-QSAR Machine Learning Approach for Predicting COX-2 Inhibitory Activity Using Molecular Fingerprints

by Mebarka Ouassaf and Bader Y. Alhatlani

Pharmaceuticals 2026, 19(5), 698; https://doi.org/10.3390/ph19050698 - 29 Apr 2026

Background/Objectives: Cyclooxygenase-2 (COX-2) is a well-established target in the development of anti-inflammatory drugs due to its central role in mediating inflammation. The identification of novel COX-2 inhibitors remains a key focus in pharmaceutical research. This study aimed to develop a robust and interpretable machine learning framework to predict COX-2 inhibitory activity and support virtual screening efforts. Methods: A curated dataset of 2052 compounds was obtained from the ChEMBL database. Molecular structures were encoded using Morgan fingerprints derived from SMILES representations. Several machine learning algorithms were trained and evaluated, including ensemble-based methods. Model performance was assessed using internal validation and external test sets. Robustness was further evaluated through Y-randomization tests. Model interpretability was investigated using SHAP (SHapley Additive exPlanations) analysis to identify key structural features contributing to activity. Results: Among the evaluated models, ensemble methods demonstrated superior predictive performance, with the Random Forest algorithm providing the most consistent and reliable results across validation and external datasets. Y-randomization confirmed that the model predictions were not due to chance correlations. SHAP analysis revealed that the most influential features corresponded to chemically meaningful substructures aligned with known COX-2 pharmacophore characteristics. The final optimized model was successfully deployed as a publicly accessible web application for real-time prediction using SMILES input. Conclusions: This study demonstrates the effectiveness of explainable machine learning approaches in predicting COX-2 inhibitory activity. The developed framework provides a reliable and interpretable tool for accelerating COX-2 inhibitor discovery and facilitating virtual screening in drug development. Full article

(This article belongs to the Special Issue Application of 2D and 3D-QSAR Models in Drug Design: 2nd Edition)
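The Y-randomization check mentioned in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: a one-feature least-squares fit stands in for the Random Forest model, and the names `r_squared` and `y_randomization` are hypothetical rather than taken from the article. The idea is to refit on shuffled targets and confirm that the scores collapse compared with the score on the real targets.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-feature least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def y_randomization(x, y, n_rounds=100, seed=0):
    """Return the score on the true targets and the scores on shuffled targets."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    true_score = r_squared(x, y)
    shuffled_scores = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the link between features and targets
        shuffled_scores.append(r_squared(x, y_perm))
    return true_score, shuffled_scores


# Hypothetical toy data: a nearly linear relationship.
x = [float(i) for i in range(20)]
y = [2.0 * v + 0.1 * ((int(v) * 7) % 5 - 2) for v in x]
true_score, shuffled_scores = y_randomization(x, y)
```

A model that has learned a real structure-activity relationship should score far better on the true targets than on any shuffled copy; if the shuffled scores come close, chance correlation is a likely explanation.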

26 pages, 4830 KB

Open Access Article

A Physically Aware Residual Learning Framework for Outdoor Localization in LoRaWAN Networks

by Askhat Bolatbek, Ömer Faruk Beyca, Batyrbek Zholamanov, Madiyar Nurgaliyev, Gulbakhar Dosymbetova, Dinara Almen, Ahmet Saymbetov, Botakoz Yertaikyzy, Sayat Orynbassar and Ainur Kapparova

Future Internet 2026, 18(4), 216; https://doi.org/10.3390/fi18040216 - 18 Apr 2026

Viewed by 256

Abstract

The rapid growth of large-scale Internet of Things (IoT) deployments in urban environments requires accurate and energy-efficient localization methods for low-power wireless devices. In long-range wide-area networks (LoRaWAN), traditional GPS-based positioning is often impractical due to energy consumption constraints and signal propagation challenges in urban areas. This study proposes a hybrid localization system that integrates weighted centroid localization (WCL) with a machine learning (ML) regression model to improve outdoor positioning accuracy. The proposed approach first estimates approximate transmitter coordinates using a physically grounded WCL method based on received signal strength indicator (RSSI) measurements. These initial estimates are subsequently refined by ML models trained to learn nonlinear residual corrections. In addition to random partitioning, a spatial data splitting strategy is proposed and evaluated using a publicly available LoRaWAN dataset. The experimental results demonstrate that the hybrid WCL framework combined with a multilayer perceptron (MLP) significantly outperforms other ML models. The proposed method achieves a mean localization error of 160.47 m and a median error of 73.78 m. Compared to the baseline model, the integration of WCL reduces the mean localization error by approximately 29%, highlighting the effectiveness of incorporating physically interpretable priors into localization models. Full article

(This article belongs to the Section Internet of Things)
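The two-stage idea in the abstract above (a coarse weighted centroid estimate followed by a learned residual correction) might look roughly like the following sketch. The weighting scheme is an assumption: RSSI in dBm is converted to linear power and raised to an exponent `g`, which is one common choice and not necessarily the one used in the article. Coordinates are assumed to be in metres on a local plane, and `apply_residual` is a hypothetical helper that simply adds a correction produced by some trained regressor.

```python
from typing import Sequence, Tuple


def weighted_centroid(
    gateways: Sequence[Tuple[float, float]],
    rssi_dbm: Sequence[float],
    g: float = 1.0,
) -> Tuple[float, float]:
    """Estimate a transmitter position as the RSSI-weighted mean of gateway positions."""
    if len(gateways) != len(rssi_dbm) or not gateways:
        raise ValueError("need one RSSI value per gateway")
    # Assumed weighting: dBm converted to linear power (mW), raised to exponent g.
    weights = [(10.0 ** (r / 10.0)) ** g for r in rssi_dbm]
    total = sum(weights)
    x = sum(w * gx for w, (gx, _) in zip(weights, gateways)) / total
    y = sum(w * gy for w, (_, gy) in zip(weights, gateways)) / total
    return x, y


def apply_residual(estimate, residual):
    """Add a learned correction (dx, dy) to the coarse WCL estimate."""
    return estimate[0] + residual[0], estimate[1] + residual[1]


# Four gateways at the corners of a square, all hearing the device equally well.
coarse = weighted_centroid([(0, 0), (2, 0), (0, 2), (2, 2)], [-90.0] * 4)
# A hypothetical residual predicted by an ML model for this sample.
refined = apply_residual(coarse, (0.5, -0.25))
```

Training the regressor on residuals rather than on raw coordinates means the model only has to learn the error of the physical estimate, which is the motivation the abstract gives for the hybrid design.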

21 pages, 1864 KB

Open Access Article

Rapid Electrochemical Profiling of Fecal Short-Chain Fatty Acids Using Esterification/Dissociation Fingerprints and Artificial Neural Networks

by Bing-Chen Gu, Guan-Ying Jiang, Ching-Hung Tseng, Yi-Ju Chen, Chun-Ying Wu, Zhi-Xuan Lin, Zhung-Wen Yeh and Chia-Che Wu

Biosensors 2026, 16(4), 223; https://doi.org/10.3390/bios16040223 - 17 Apr 2026

Viewed by 338

Abstract

Short-chain fatty acids (SCFAs) are key biomarkers of gut microbiota activity; however, routine quantification in fecal samples relies largely on chromatography techniques, which are instrument-intensive and throughput-limited. Herein, we present a rapid machine-learning-assisted electroanalysis platform for SCFAs profiling that integrates a disposable three-electrode planar gold chip with voltammetric fingerprinting and artificial neural network (ANN)-based signal decoupling. To generate orthogonal chemical information and improve the discrimination of structurally similar species, a dual pretreatment strategy combining acid-catalyzed esterification and alkaline dissociation was employed prior to electrochemical analyses. Differential pulse voltammetry (DPV) and cyclic voltammetry (CV) were employed to acquire high-dimensional fingerprints, from which current-, potential-, and area-based descriptors were extracted using a cross-information feature strategy. A hierarchical modeling framework improved total SCFAs prediction by incorporating ANN-predicted propionate and butyrate concentrations as auxiliary inputs. While linear calibration was achievable in standard mixtures, direct linear models performed poorly in real fecal matrices due to strong sample-dependent matrix interference. In contrast, the ANN captured nonlinear relationships among multifeature inputs and suppressed matrix effects. Validation against gas chromatography–mass spectrometry in an independent fecal test cohort (n = 30) demonstrated excellent agreement and low prediction errors, with mean absolute error/root mean square error values of 0.063/0.072 mM (propionic acid), 0.029/0.034 mM (butyric acid), and 0.135/0.202 mM (total SCFAs). The DPV/CV acquisition requires only minutes per sample, whereas pretreatment takes 1–3 h depending on the target route but can be performed in parallel for batch processing; thus, overall throughput is determined mainly by batch pretreatment rather than per-sample instrument time. This electrochemical–ANN workflow provides a portable, high-throughput alternative to chromatography for fecal SCFAs profiling in clinical screening and microbiome research. Full article

(This article belongs to the Special Issue Electrochemical (Bio-)Sensors in Biological Applications—3rd Edition)
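The abstract above reports mean absolute error (MAE) and root mean square error (RMSE) against a reference method. For readers comparing such figures, a minimal sketch of both metrics follows; the concentration values are made up for illustration and are not data from the article.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical reference (e.g. GC-MS) and predicted concentrations in mM.
reference = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
errors = (mae(reference, predicted), rmse(reference, predicted))
```

RMSE is never smaller than MAE on the same data, so a gap between the two (as in the reported 0.135/0.202 mM for total SCFAs) suggests that a few samples carry larger errors than the rest.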

14 pages, 2210 KB

Open Access Article

XGBPred-ACSM: A Hybrid Descriptor-Driven XGBoost Framework for Anticancer Small Molecule Prediction

by Priya Dharshini Balaji, Subathra Selvam, Anuradha Thiagarajan, Honglae Sohn and Thirumurthy Madhavan

Pharmaceuticals 2026, 19(4), 635; https://doi.org/10.3390/ph19040635 - 17 Apr 2026

Viewed by 300

Abstract

Background/Objectives: Cancer remains one of the leading global health burdens, mainly because of the lack of specificity and off-target toxicity associated with conventional therapeutic approaches. To move toward more efficient anticancer drug discovery, we have developed an advanced machine-learning-based architecture that allows for predictive modeling of anticancer small molecules. Methods: A total of 3600 compounds with experimentally validated IC₅₀ values were systematically processed to derive a comprehensive suite of molecular representations comprising 2D physicochemical descriptors, structural fingerprints, and hybrid descriptor sets generated via the Mordred and PaDEL frameworks. Six machine learning algorithms, namely Random Forest (RF), Extreme Gradient Boosting (XGB), Gradient Boosting (GB), Extra-Trees classifier (ET), Adaptive Boosting (AdaBoost), and Light Gradient Boosting Machine (LightGBM), were trained and benchmarked via a rigorous model evaluation protocol incorporating 10-fold cross-validation along with multiple performance metrics. Ensemble voting strategies were also examined to assess their potential performance. Results: Of all configurations, the XGB-Hybrid architecture emerged as the most robust and generalizable classifier, with an AUC of 0.88 and an accuracy of 79.11% on the independent test set. To ensure interpretability and mechanistic insight, SHAP-based feature analysis was conducted, which quantified feature contributions and revealed the molecular determinants most influential for anticancer activity discrimination. Altogether, the current study establishes the XGB-Hybrid framework as a technically rigorous, interpretable, and high-performance predictive modeling approach able to accelerate early-stage anticancer small molecule identification. Conclusions: The study has brought into focus the transformational effect of machine learning in modern computational oncology and rational drug design pipelines. Full article

(This article belongs to the Special Issue Artificial Intelligence-Assisted Drug Discovery)

17 pages, 5824 KB

Open Access Article

Neurotoxicity Prediction of Compounds: Integrating Knowledge-Guided Graph Representations with Machine Learning Approaches

by Yongxin Jiang, Yilin Gao, Yi He, Shu Xing and Weiwei Han

Int. J. Mol. Sci. 2026, 27(8), 3543; https://doi.org/10.3390/ijms27083543 - 16 Apr 2026

Viewed by 389

Abstract

Neurotoxicity from drugs and environmental pollutants poses serious risks to brain function, yet existing computational models mainly target general neurotoxicity and lack specialized tools for brain-specific assessment. This study aimed to develop and validate a high-performance, brain-focused neurotoxicity prediction framework to improve drug safety evaluation and toxicity screening. We systematically analyzed molecular features, clustering patterns, and target predictions of brain-toxic compounds. Multiple feature representations were compared, including traditional molecular fingerprints, knowledge-guided pre-trained graph Transformer (KPGT) embeddings, and transformer-based MolFormer embeddings, combined with machine learning classifiers. Model performance was evaluated using multiple metrics, and SHAP analysis was conducted to identify influential molecular substructures. Toxic molecules showed physicochemical properties favoring central nervous system (CNS) penetration, including lower molecular weight, lower LogP, fewer hydrogen bond donors/acceptors, fewer rotatable bonds, and lower polar surface area (PSA). The KPGT-MLP model achieved the best balanced performance, with an accuracy (ACC) of 0.8928 and an ROC-AUC of 0.9459, clearly outperforming traditional fingerprint-based models, MolFormer-based models, and general prediction tools such as DI-NeuroT and ADMETlab 3.0. Overall, this study establishes a robust framework for brain-specific neurotoxicity prediction, with the KPGT-MLP model demonstrating strong accuracy and robustness. The proposed approach provides an effective strategy for early neurotoxicity screening and risk assessment, offering valuable insights for safer drug design and advancing computational toxicology and drug discovery. Full article

(This article belongs to the Special Issue Machine Learning Applications in Bioinformatics and Biomedicine: 4th Edition)

15 pages, 6210 KB

Open Access Article

AHR/NRF2 Dual Agonist Prediction and Natural Compound Screening Based on Machine Learning: A New Strategy for the Treatment of Atopic Dermatitis

by Yu Zhen, Qi Li, Xiaoxu Hu, Xiaorui Liu, Zhijie Shao, Heidi Qunhui Xie, Bin Zhao and Li Xu

Int. J. Mol. Sci. 2026, 27(8), 3530; https://doi.org/10.3390/ijms27083530 - 15 Apr 2026

Viewed by 387

Abstract

In the treatment of atopic dermatitis (AD), synergistic activation of the aryl hydrocarbon receptor (AHR)/nuclear factor erythroid 2-related factor 2 (NRF2) pathways represents a promising strategy. However, known dual agonists are limited, and traditional screening methods are inefficient. Therefore, this study developed machine learning models to predict AHR/NRF2 dual agonists using molecular descriptors and fingerprints. All models achieved area under the receiver operating characteristic curve (AUC) values above 0.86, indicating good classification performance. The optimal AHR model showed an accuracy (ACC) of 0.811 and an AUC of 0.878, while the best NRF2 model yielded an ACC of 0.839 and an AUC of 0.907. Based on this model, compounds with a low fraction of sp³-hybridized carbons, moderate hydrophobicity, limited alkyl chains, and highly conjugated structures tend to act as AHR/NRF2 dual agonists. Finally, this study screened 1011 potential natural AHR/NRF2 dual agonists suitable for drug development. Among these, 2-arylbenzofurans, alkaloids, phenanthrenes, flavones, and furocoumarins demonstrated particular advantages. For validation, Indirubin, imperatorin and 3′-O-Methylbutastatin III were first discovered as AHR/NRF2 dual agonists in HaCaT cells. This work provides a robust predictive tool, clarifies key molecular features of dual agonists, and may support the discovery of anti-AD agents. Full article

(This article belongs to the Section Molecular Biology)

15 pages, 2633 KB

Open Access Article

A Sensitive Multichannel Fluorescent Polymer Sensor Array for the Detection of Protein Fluctuations in Serum

by Junwhee Yang, Colby Alves, Kanwal Nazir, Mingdi Jiang, Nicolas Araujo and Vincent M. Rotello

Sensors 2026, 26(8), 2308; https://doi.org/10.3390/s26082308 - 9 Apr 2026

Viewed by 637

Abstract

Serum contains diverse proteins whose concentrations vary with pathological conditions such as cancer, liver disease, neurological disorders, and infections. Conventional methods like serum protein electrophoresis (SPEP) and enzyme-linked immunosorbent assay (ELISA) are gold standards for protein identification; however, they are time-consuming and can miss abnormal serum protein levels. Inspired by chemical nose sensing based on selective sensor–analyte interactions, we synthesized five pyrene-conjugated fluorescent polymers (PFPs) with distinct side-chain head groups to construct a multichannel fluorescence sensor array. These polymers were screened for sensitivity to changes in serum protein levels using linear discriminant analysis (LDA), a machine learning method. This process led to the successful discovery of two PFPs that effectively detect protein level fluctuations. These PFPs provided a sensitive sensor array capable of generating a high-content response pattern (fingerprint) with six fluorescence channels. This sensor array successfully discriminated protein level fluctuations in serum with 98% jackknife classification accuracy and 95% unknown identification accuracy. This polymer sensor array holds strong potential as a diagnostic tool for serum-based samples and can be extended to other applications related to protein identification. Full article

(This article belongs to the Special Issue Design and Application of Nanosensor Arrays)

33 pages, 2336 KB

Open Access Article

Machine Learning-Assisted FTIR Spectroscopy Analysis of Kidney Preservation Fluids for Delayed Graft Function Risk Stratification

by Luis Ramalhete, Rúben Araújo, Miguel Bigotte Vieira, Emanuel Vigia, Ana Pena, Sofia Carrelha, Cristiana Teixeira, Anibal Ferreira and Cecilia R. C. Calado

J. Clin. Med. 2026, 15(7), 2762; https://doi.org/10.3390/jcm15072762 - 6 Apr 2026

Cited by 1 | Viewed by 435

Abstract

Background/Objectives: Delayed graft function (DGF) remains a common early complication after deceased donor kidney transplantation and is challenging to anticipate using routine pre-implant clinical variables alone. We investigated whether high-throughput Fourier transform infrared (FTIR) spectroscopy of static cold storage preservation fluid (not machine perfusion perfusate) captures biochemical information associated with DGF and warrants further evaluation alongside routine pre-implant clinical predictors. Methods: In this single-center retrospective cohort, we analyzed preservation fluid samples from 56 kidney transplants originating from 49 deceased donors (7 donors contributed two kidneys); DGF occurred in 14/56 (25.0%). Dried-film FTIR spectra were acquired using a plate-based high-throughput accessory, and analyses focused on the fingerprint region (900–1800 cm⁻¹) with prespecified preprocessing and quality control. We developed and compared clinical-only, FTIR-only, and combined predictive models and estimated performance using donor-blinded 5-fold StratifiedGroupKFold cross-validation (grouped by donor code) to prevent leakage across paired kidneys. Results: Donor-blinded discrimination (pooled out-of-fold ROC-AUC) was 0.775 for the clinical-only model, 0.814 for the FTIR-only model, and 0.796 for the combined model; probabilistic accuracy (Brier score; lower is better) was 0.162, 0.194, and 0.177, respectively. Calibration intercepts were negative and slopes were <1, indicating overly extreme risk estimates under strict donor-blinded validation and supporting recalibration prior to deployment. Decision curve analysis suggested a positive net benefit for clinically plausible thresholds. Conclusions: These findings support the feasibility of rapid, low-cost FTIR profiling of routinely available preservation fluid as a proof-of-concept approach for exploratory DGF risk stratification, rather than as a clinically deployable prediction tool. Given the small sample size and the instability of subgroup estimates, the main next steps are external validation in larger multicenter cohorts, prospective workflow studies, and model updating/recalibration. Full article

(This article belongs to the Section Nephrology & Urology)
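Two ideas from the abstract above lend themselves to a short sketch: keeping both kidneys from one donor in the same fold, and scoring probabilistic predictions with the Brier score. The code below is a simplified stand-in under stated assumptions. `group_folds` is a hypothetical helper that groups by donor but, unlike scikit-learn's `StratifiedGroupKFold` named in the abstract, makes no attempt to balance the outcome across folds. The donor codes and probabilities are invented.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def group_folds(groups, n_splits=5):
    """Yield (train_idx, test_idx) so that no group is split across train and test."""
    unique = sorted(set(groups))
    # Round-robin assignment of whole groups (donors) to folds.
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}
    for k in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == k]
        train = [i for i, g in enumerate(groups) if fold_of[g] != k]
        yield train, test


# Hypothetical donor codes: D1 and D3 each contributed two kidneys.
donors = ["D1", "D1", "D2", "D3", "D3", "D4", "D5", "D6"]
folds = list(group_folds(donors, n_splits=5))

# Hypothetical outcomes (1 = DGF) and predicted risks.
score = brier_score([1, 0], [0.8, 0.3])
```

Grouping matters here because paired kidneys share donor characteristics; letting one kidney into training and its pair into testing would leak information and inflate the apparent performance.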

23 pages, 3226 KB

Open Access Article

A Detection and Recognition Method for Interference Signals Based on Radio Frequency Fingerprint Characteristics

by Yang Guo and Yuan Gao

Electronics 2026, 15(7), 1393; https://doi.org/10.3390/electronics15071393 - 27 Mar 2026

Viewed by 417

Abstract

With the advancement of 5G and the Internet of Things (IoT), traditional upper-layer authentication mechanisms are vulnerable to attacks, while quantum computing threatens cryptographic security. Radio frequency fingerprint identification (RFFI) offers a physical-layer solution by exploiting inherent hardware imperfections. However, in complex electromagnetic environments, narrowband and especially agile interference (characterized by low power and narrow bandwidth) can severely distort fingerprint features, rendering conventional detection algorithms ineffective. To address this challenge, this paper proposes a novel interference detection framework tailored for Orthogonal Frequency Division Multiplexing (OFDM) systems. First, a signal transmission model incorporating non-ideal hardware characteristics (e.g., DC offset, I/Q imbalance) is established. Based on this model, we design an agile interference detection algorithm comprising two key components: (1) a time-series anomaly detection method that fuses multi-domain expert features (fractal, complexity, and high-order statistics) with machine learning, demonstrating superior performance over the traditional CME algorithm under narrowband interference, and (2) a progressive search segmental detection algorithm that, combined with reconstruction error features extracted by an autoencoder, effectively identifies low-power agile interference by trading off computation time for detection sensitivity. Finally, an OFDM simulation platform is developed to validate the proposed methods. The results show that the segmental detection algorithm achieves reliable detection at a jammer-to-signal ratio (JSR) as low as −10 dB, significantly outperforming existing approaches and enhancing the robustness of RFFI in challenging interference environments. Full article

(This article belongs to the Special Issue IoT Based Intelligent Communications: Modelling, Practice and Applications)
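To put the −10 dB jammer-to-signal ratio from the abstract above in linear terms, the standard decibel power ratio can be computed as in the sketch below. The function names are illustrative, and the power values are arbitrary units chosen for the example.

```python
import math


def jsr_db(jammer_power, signal_power):
    """Jammer-to-signal ratio in dB from two powers in the same linear unit."""
    return 10.0 * math.log10(jammer_power / signal_power)


def db_to_ratio(db):
    """Convert a decibel power ratio back to a linear ratio."""
    return 10.0 ** (db / 10.0)


# A jammer with one tenth of the signal power.
example_jsr = jsr_db(0.1, 1.0)
linear = db_to_ratio(-10.0)
```

A JSR of −10 dB therefore means the interference carries about one tenth of the signal's power, which is why the abstract describes this regime as low-power interference.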

14 pages, 1035 KB

Open Access Article

Indoor Localization Based on IoT Crowdsensing Task Allocation

by Bahareh Lashkari, Javad Rezazadeh and Reza Farahbakhsh

J. Sens. Actuator Netw. 2026, 15(2), 27; https://doi.org/10.3390/jsan15020027 - 17 Mar 2026

Viewed by 545

Abstract

Crowdsensing has recently been investigated as a form of human–machine intelligence in which user contribution is crucial. Indoor localization is one of the most significant of the diverse applications introduced in this area. Because GPS signals penetrate poorly into indoor environments, crowdsensing-based indoor localization schemes have been used to provide precise localization services. The precision of these schemes, and the elimination of erroneous data collection, depends strongly on the underlying task allocation mechanism. In this work, we treat localization precision as a consequence of the task allocation mechanism of crowd-powered indoor localization schemes. We therefore propose to address this issue by applying the Gray Wolf Optimizer (GWO) algorithm to the participants of the crowdsensing scheme. The GWO algorithm is expected to perform task allocation implicitly on account of its crowd-powered nature. Accordingly, we applied the GWO algorithm to a proposed indoor localization scenario to meet the requirements of a discrete task allocation mechanism. Implementation results demonstrated that the population-centric structure of the GWO algorithm significantly increases the accuracy of the fingerprint collection mechanism, which maintains exceptional localization precision. Full article

(This article belongs to the Special Issue Recent Trends and Advancements in Location Fingerprinting)

20 pages, 1122 KB

Open Access Article

A Robust Fingerprint-Based Machine Learning Model for Indoor Navigation in Real Time

by Md. Selim Al Mamun and Fatema Akhter

Signals 2026, 7(2), 26; https://doi.org/10.3390/signals7020026 - 16 Mar 2026

Viewed by 566

Abstract

Accurate positioning in indoor environments has become crucial in many location-based services, mainly where global positioning systems (GPSs) are unavailable or fail to navigate correctly. Conventional fingerprint-based approaches face challenges with instability, low accuracy, and sensitivity to changes in the environment. This study proposes a robust fingerprint-based machine learning (ML) model for real-time indoor navigation in dynamic environments. The proposed model uses link quality indicator (LQI) values from IEEE 802.15.4 as fingerprints together with supervised learning algorithms, showing high accuracy and a strong ability to adapt to changes in the environment. A room within a building floor is treated as the unit of location identification, instead of the user's exact coordinates, to make the proposed model more relevant under practical conditions. The model was trained and tested using a real LQI dataset collected under varied indoor conditions to ensure that the system can adapt effectively and operate consistently in dynamic environments and signal conditions. The results show that the proposed model surpasses conventional fingerprinting-based indoor navigation in room detection accuracy and flexibility to environmental changes. An implemented prototype demonstrated the real-time capability of the proposal in smart buildings, hospitals, and industrial IoT settings. Full article

17 pages, 2037 KB

Open Access Article

A High-Performance and Interpretable pK_a Prediction Framework Integrating Count-Based Fingerprints and Ensemble Learning

by Hui Shen, Yongquan He, Juefeng Deng, Xiaoying Li, Chenqiang Yang, Dingren Ma, Dehua Xia and Haiying Yu

Molecules 2026, 31(6), 961; https://doi.org/10.3390/molecules31060961 - 12 Mar 2026

Viewed by 433

Abstract

The acid dissociation constant (pK_a) is a fundamental parameter governing the environmental fate of organic compounds. Accurate pK_a prediction remains challenging, as traditional binary Morgan fingerprints (B-MF) fail to capture stoichiometric information critical for modeling substituent effects. This study developed an interpretable machine learning framework for pK_a prediction by integrating count-based Morgan fingerprints (C-MF) with ensemble algorithms. Through systematic comparison across four algorithms (CatBoost, XGBoost, GBDT, RF), C-MF consistently outperformed B-MF due to its ability to quantify functional group multiplicity. Subsequent SHAP-based recursive feature elimination (SHAP-RFE) optimized the model, identifying CatBoost with only 81 features as the optimal architecture, achieving a test-set R² of 0.890 and RMSE of 1.026. SHAP analysis revealed that the model’s decisions are driven by chemically intuitive features, forming a hierarchical framework where primary ionizable sites set the baseline pK_a and electronic modifiers fine-tune it. The applicability domain, defined using the AD_SAL method, yielded high-confidence predictions (R² = 0.926). External validation on an independent open-source dataset containing 6876 acidic compounds, combined with results from AD_SAL applicability domain characterization, enabled accurate pK_a prediction for 390 compounds within the applicability domain (R² = 0.890, RMSE = 0.942). This further confirms the model’s strong generalizability. This work provides a robust and generalizable tool for high-performance pK_a prediction, with significant potential for applications in environmental risk assessment. Full article

(This article belongs to the Section Computational and Theoretical Chemistry)
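The distinction the abstract above draws between binary and count-based fingerprints can be illustrated without a cheminformatics toolkit. The sketch below is a toy model under stated assumptions: substructures are given as plain strings rather than Morgan atom environments, `fold` is a hypothetical helper, and CRC32 stands in for the hashing that a real fingerprint generator (for example in RDKit) would perform.

```python
import zlib
from collections import Counter


def fold(substructures, n_bits=64, counts=False):
    """Hash substructure labels into a fixed-length vector.

    counts=False gives a binary vector (presence only);
    counts=True keeps how many times each substructure occurs.
    """
    vec = [0] * n_bits
    for name, k in Counter(substructures).items():
        idx = zlib.crc32(name.encode("utf-8")) % n_bits  # deterministic hash
        if counts:
            vec[idx] += k
        else:
            vec[idx] = 1
    return vec


# Toy molecules: one and two carboxyl groups on the same scaffold.
mono_acid = ["carboxyl", "phenyl"]
di_acid = ["carboxyl", "carboxyl", "phenyl"]

binary_same = fold(mono_acid) == fold(di_acid)
count_same = fold(mono_acid, counts=True) == fold(di_acid, counts=True)
```

The binary vectors of the two toy molecules are identical, while the count vectors differ, which is the kind of functional group multiplicity the abstract credits for the better performance of C-MF.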

12 pages, 1595 KB

Open Access Article

Cloud Point Temperature of Thermoresponsive Systems: A Predictive Approach in Data Scarcity Conditions

by Marcela Elisabeth Penoff, Facundo Ignacio Altuna and Luis Alejandro Miccio

Appl. Sci. 2026, 16(5), 2557; https://doi.org/10.3390/app16052557 - 6 Mar 2026

Viewed by 334

Abstract

In this study, we employ machine learning techniques to improve materials in data scarcity conditions. In particular, we focus on the prediction of the cloud point temperatures of polymer–water systems with thermoresponsive behavior. We compare a model trained directly on the available data with a model based on representations learned through an encoder–decoder model, in turn pre-trained on a larger dataset to generate molecular fingerprints. Our results demonstrate that the embedding-based model significantly outperforms the direct model in predicting the cloud point temperature under the data limitations imposed by rigorous curation. This approach highlights the potential of domain-informed representation learning to tackle complex materials science problems with limited data. Full article

22 pages, 8506 KB

Open AccessArticle

AI-Generated Spatial Pattern Matching for Hospital Indoor Positioning

by Boseong Kim, Shiyi Li, Jaewi Kim and Beomju Shin

Appl. Sci. 2026, 16(5), 2552; https://doi.org/10.3390/app16052552 - 6 Mar 2026

Viewed by 371

Abstract

Indoor positioning in hospitals is challenging because global navigation satellite system (GNSS) signals are unavailable and existing solutions struggle with complex indoor propagation and high maintenance requirements. Fingerprinting-based methods using Wi-Fi, Bluetooth Low Energy (BLE), or magnetic fields depend on extensive site surveys, while time- or angle-based systems such as ultra-wideband, angle of arrival, and Wi-Fi round-trip time require additional infrastructure. Recent machine learning approaches improve performance but remain limited by Pedestrian Dead Reckoning (PDR) drift and unstable spatial representations. This study proposes an AI-generated spatial pattern matching framework that integrates an AI-based PDR model with BLE Received Signal Strength Indicator (RSSI) measurements to construct a user RSSI surface. Spatial similarity between user-generated patterns and the pre-built radio map is evaluated using Surface Correlation (SC), and a bi-directional candidate generation strategy with SC-based heading correction is employed to mitigate inertial drift. Experiments in a real hospital setting show that the proposed method achieves robust and accurate localization even in complex indoor environments where conventional fingerprinting and PDR techniques often fail. The results indicate that combining AI-driven inertial modeling with SC-based spatial pattern matching offers a practical and infrastructure-friendly solution for hospital indoor positioning. Full article

16 pages, 1606 KB

Open AccessArticle

GenReP: An Ensemble Model for Predicting TP53 in Response to Pharmaceutical Compounds

by Austin Spadaro, Alok Sharma and Iman Dehzangi

Molecules 2026, 31(4), 739; https://doi.org/10.3390/molecules31040739 - 21 Feb 2026

Viewed by 503

Abstract

TP53 is a tumor-suppressor gene involved in regulating apoptosis, DNA repair, and genomic stability. Mutations in TP53 are implicated in approximately half of all detected cancers, including breast, lung, colorectal, and ovarian cancers, making it a significant target for therapeutic interventions. Many pharmaceutical drugs aim to restore TP53 function, and there is a need for predictive tools to assess how compounds may affect TP53 expression. In this study, we propose a new ensemble machine-learning model to predict the direction of TP53 relative gene expression in response to pharmaceutical compounds. Our model utilizes molecular fingerprints, descriptors, and scaffold-based features extracted from SMILES representations of compounds and concatenated into a single feature vector. Trained on our newly generated benchmark dataset based on the Connectivity Map (CMap) database, with class imbalance addressed using the Synthetic Minority Over-sampling Technique (SMOTE), our model achieves an accuracy of 62.9%, a sensitivity of 93.9%, a specificity of 40.3%, and a Matthews Correlation Coefficient (MCC) of 0.39. As the first predictor of TP53 gene regulation of its kind, our study serves as a convincing proof-of-concept that paves the way for future investigation. GenReP as a stand-alone predictor, its source code, and our newly generated benchmark dataset are publicly available. Full article

(This article belongs to the Special Issue Computational Insights into Protein Engineering and Molecular Design)
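
For readers unfamiliar with the four metrics quoted in the abstract above, the following minimal sketch shows how accuracy, sensitivity, specificity, and MCC are conventionally derived from a binary confusion matrix. The function name and the example counts are hypothetical; they are not taken from the GenReP study and are not meant to reproduce its reported figures.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero for degenerate inputs (a convention chosen here).
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=90, fp=60, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A high sensitivity paired with a low specificity, as in this made-up example, means the classifier rarely misses positives but often labels negatives as positive; MCC summarizes both kinds of error in a single value between -1 and 1.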
