Enabling Early Prediction of Side Effects of Novel Lead Hypertension Drug Molecules Using Machine Learning

Takudzwa Ndhlovu; Uche A. K. Chude-Okonkwo

doi:10.3390/ddc4030035

Institute for Artificial Intelligent Systems, University of Johannesburg, Auckland Park 2006, South Africa

* Author to whom correspondence should be addressed.

Drugs Drug Candidates 2025, 4(3), 35; https://doi.org/10.3390/ddc4030035

This article belongs to the Section In Silico Approaches in Drug Discovery

Abstract

Background: Hypertension is a serious global health issue that affects over one billion adults and leads to severe complications if left unmanaged. Despite medical advancements, only a fraction of patients have their hypertension under effective control. One factor that hinders adherence to antihypertensive drugs is their debilitating side effects. Poor adherence results in poorer patient outcomes, as patients opt to live with their condition rather than deal with the side effects. Hence, there is a need to discover new antihypertensive drug molecules with milder side effect profiles to widen patients’ treatment options. To this end, computational methods such as artificial intelligence (AI) have become an attractive option for modern drug discovery. AI-based computational drug discovery methods generate numerous new lead antihypertensive drug molecules. However, predicting their potential side effects remains a significant challenge because of the complexity of biological interactions and the limited data on these molecules. Methods: This paper presents a machine learning approach to predict the potential side effects of computationally synthesised antihypertensive drug molecules based on their molecular properties, particularly functional groups. We curated a dataset combining information from the SIDER 4.1 and ChEMBL databases, enriched with molecular descriptors (logP, PSA, HBD, HBA) computed using RDKit. Results: Gradient boosting gave the most stable generalisation, with a weighted F1 of 0.80 and an AUC-ROC of 0.62 on the independent test set. SHAP analysis over the cross-validation folds showed that polar surface area and logP have the largest global impact, followed by the hydrogen bond counts. Conclusions: Functional group patterns, augmented with key ADMET descriptors, offer a first-pass screen for identifying side-effect risks in AI-designed antihypertensive leads.

Keywords:

hypertension; side effect prediction; machine learning; gradient boosting; XGBoost; functional groups; drug design

1. Introduction

1.1. Background

Hypertension, or high blood pressure, is a public health challenge affecting over one billion adults worldwide [1]. It is a leading cause of cardiovascular diseases, diabetes, and kidney failure when not properly managed [2]. Despite the availability of effective treatments, only about 20% of patients successfully control their hypertension [3]. A major barrier to effective management is the negative side effects of antihypertensive drugs, which often lead to poor patient adherence [4]. These include headaches, heart palpitations, hypotension, and hyperkalemia [5].

The side effects arise due to off-target interactions, dosage issues, and drug–drug interactions, all of which are influenced by the chemical structure of the drug molecules [6]. Traditional methods of evaluating drug side effects rely on experimental and clinical testing, which is time-consuming and costly [7]. With the advent of artificial intelligence (AI) and other in silico techniques in drug discovery, vast numbers of novel antihypertensive drug candidates are being computationally synthesised [3,8,9], requiring the evaluation of their side effects. Obviously, the traditional drug side effect prediction method alone cannot match the demand of side effect prediction for the vast number of molecules generated using AI. In addition, the ability to predict the potential side effects of drug molecules in the early stage of drug discovery can help reduce attrition in the later stage of the drug discovery process. Hence, efficient computational approaches are needed to predict potential side effects early in the drug development process to reduce the attrition rate and the burden on the eventual analysis of experimental side effects.

This study aims to develop a machine learning model to predict possible side effects of AI-generated antihypertensive drug molecules based on their molecular properties, focusing on functional groups sourced from [3,8]. By establishing a relationship between drug molecules and their associated side effects, we seek to enhance lead drug-molecule safety profiling to prioritise candidates with better safety profiles for further development.

Our contributions are as follows.

Curating a dataset that links side effects of antihypertensive drugs to their chemical and molecular properties.
Developing and evaluating machine learning models—random forest, gradient boosting, and XGBoost—for side effect prediction.
Using the developed AI-based prediction model to analyse the side effect profile of a set of computationally synthesised novel lead hypertension drug molecules.

1.2. Related Work

Predicting drug side effects is a complex task due to the intricate biological interactions within the human body. Traditional methods like animal models have limitations in cost, time, and interspecies differences [10,11]. Hence, computational methods, especially machine learning, are viable options in side effect prediction and have gained prominence for their ability to detect nonlinear relationships in large datasets [12].

Previous studies have utilised deep learning techniques on extensive chemical structure datasets to predict specific toxicities, such as drug-induced liver injury [13]. In [14], the authors made use of the gene expression profiles of various drugs to predict side effects. Their work demonstrates that the chemical structure of a drug compound may better predict side effect expression than the gene expression profiles altered by the compound. Naturally, this highlights the need for a set of engineered features to address the side effect prediction challenge that focuses on the structure of the compound itself. Along this perspective, functional groups within molecules are known to influence physicochemical properties and ADMET (absorption, distribution, metabolism, excretion, and toxicity) characteristics and affect the efficacy and safety profile of the compound [15]. Hence, they potentially possess crucial features for predicting the side effects of novel drug molecules.

However, datasets such as SIDER and ChEMBL that are known to provide valuable data for training models to predict adverse side effects [16] do not include compelling features to link molecular structures with side effects of novel drug molecules. The study we undertook employs molecular feature engineering and machine learning, focusing on functional groups and ADMET properties, to predict the side effects of AI-designed hypertension drugs.

More recently, ref. [17] introduced the geometric self-expressive model, which fuses heterogeneous drug–drug and side effect graphs and attains state-of-the-art AUC-ROC on SIDER while prospectively evaluating unseen post-marketing effects. Its emphasis on interpretable similarity matrices and a pre- versus post-approval side effect split informs the multi-graph feature integration and chronological test protocol adopted in the present work. However, ref. [17] primarily addresses post-marketing adverse effects in already approved compounds, whereas our work focuses specifically on predicting the side effects of synthetically generated drug molecules at the pre-approval stage. Fragment-based drug design approaches have also leveraged machine learning to streamline molecular optimisation and predict key pharmacological properties. For example, the MolOptimizer toolkit enables efficient scaffold optimisation through cheminformatics-driven feature extraction and ML modelling [18]. Additionally, a comprehensive survey [19] highlights the increasing role of artificial intelligence and machine learning in the early detection of adverse drug reactions and drug-induced toxicities. The survey shows that the use of diverse computational methodologies enhances the accuracy of early-stage predictions and contributes to reducing drug-development risks. Our work aligns closely with this perspective by providing early-warning side effect predictions for antihypertensive drug candidates.

2. Experimental Results and Discussion

In this section, we present the performance results of the proposed models and the prediction results of applying the best of the three models developed to predict the side effects of novel drug molecules that were computationally synthesised.

2.1. Model Performance Comparison

The cross-validation and out-of-bag diagnostics above suggest that all three architectures fit the SMOTE-balanced training space very well; we now test how that optimism transfers to a completely unseen 30% holdout split. After applying SMOTE to the dataset to address the class imbalance, we evaluated the three models on accuracy, the F1-score, AUC-ROC, and MCC.

In Table 4, gradient boosting records the highest test-set MCC (0.15) and the smallest CV-to-test drop, confirming the impressions from the cross-validation and OOB diagnostics. These observations are echoed in Figure 1. XGBoost, although dominant during training, loses most of its advantage (MCC 0.84 → 0.13) once it is applied to genuinely unseen molecules. The random forest’s MCC collapses to 0.10, which implies that it does not reliably capture minority-class positives. Thus, standard gradient boosting emerges as the best of the three models.
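The CV-to-test comparison can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: it assumes that the "drop" is the difference between the median CV MCC in Table 3 and the test MCC in Table 4, and the variable names are ours rather than part of the original pipeline.

```python
# MCC values copied from Table 3 (median CV MCC) and Table 4 (test MCC).
cv_mcc = {"Random Forest": 0.947, "Gradient Boosting": 0.911, "XGBoost": 0.940}
test_mcc = {"Random Forest": 0.10, "Gradient Boosting": 0.15, "XGBoost": 0.13}

# Assumed definition of the drop: CV MCC minus test MCC.
drops = {model: round(cv_mcc[model] - test_mcc[model], 3) for model in cv_mcc}

smallest_drop = min(drops, key=drops.get)       # model that degrades least
highest_test = max(test_mcc, key=test_mcc.get)  # model with the best test MCC

for model, drop in drops.items():
    print(f"{model}: CV-to-test MCC drop = {drop}")
```

Under this definition, both criteria point to the same model, which is consistent with the ranking stated above.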

2.2. Impact of SMOTE on Model Performance

It is worth noting that applying SMOTE improved the models’ ability to predict the underrepresented side effects, as shown in Figure 2, which compares the AUC-ROC scores with and without SMOTE. By oversampling the minority classes, SMOTE provides the models with more examples to learn from, thereby reducing bias, improving the models’ sensitivity, and enhancing generalisation.

Figure 1. Gradient boosting confusion matrix on the independent test split.

Figure 2. AUC–ROC scores with and without SMOTE.

2.3. Analysis of Functional Groups and ADMET Properties

Using the models, we analysed how the top fifteen functional groups that are prevalent in the dataset (see Figure 12) contribute to each side effect category. The functional groups are amine (primary, secondary, and tertiary), amide, ester, alcohol (primary, secondary, and tertiary), ether, thiol, benzene, aldehyde, carbonyl, carboxyl, phenol, and thioether. Figure 3 illustrates the relationships between functional groups and side effects.

The study identified strong associations between specific functional groups and particular side effects. For example, phenols and amines are high-degree centrality nodes in the network, which means that they contribute to a large number of side effects. The phenol [20], amine [21], and benzene [22] groups are involved in liver conditions through the inhibition or inducement of liver enzymes such as cytochrome P450 enzymes, which can lead to hepatotoxicity. The presence of the phenol group [23] and amine group [24] can also cause gastrointestinal disturbances.

Figure 3. Functional Group and Condition Network Diagram.

On the other hand, the alcohol group is implicated in eye- [25] and gastrointestinal-related [26] side effects. Aldehydes are linked to respiratory conditions due to their reactivity and potential to cause cellular damage [27,28]. Esters are associated with neurological disorders, consistent with their known neurotoxic effects [29]. Carboxylic acids correlate with cardiovascular issues, reflecting their role in lipid metabolism and the side effects of NSAIDs [28,30]. Additionally, phenols and alcohols are connected to respiratory and neurological conditions, attributable to their toxicity and impact on the central nervous system [31,32,33].
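To make the notion of a high-degree node concrete, the following sketch counts how many condition categories each functional group is linked to. The edge list is an assumption on our part: it contains only the associations named in the prose above, not the full network of Figure 3, so the resulting ranking is illustrative.

```python
from collections import defaultdict

# Edges taken from the associations described in the text (not the full Figure 3 network).
edges = [
    ("phenol", "liver"), ("amine", "liver"), ("benzene", "liver"),
    ("phenol", "gastrointestinal"), ("amine", "gastrointestinal"),
    ("alcohol", "eye"), ("alcohol", "gastrointestinal"),
    ("aldehyde", "respiratory"), ("ester", "neurological"),
    ("carboxylic acid", "cardiovascular"),
    ("phenol", "respiratory"), ("phenol", "neurological"),
    ("alcohol", "respiratory"), ("alcohol", "neurological"),
]

# Degree of a functional group = number of distinct conditions it is linked to.
degree = defaultdict(int)
for group, _condition in edges:
    degree[group] += 1

# One possible normalisation: divide by the number of condition nodes.
conditions = {condition for _group, condition in edges}
centrality = {group: count / len(conditions) for group, count in degree.items()}

for group in sorted(centrality, key=centrality.get, reverse=True):
    print(f"{group}: degree={degree[group]}, centrality={centrality[group]:.2f}")
```

A group with a high value here touches many condition categories, which is the sense in which a node is described as contributing to a large number of side effects.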

It is useful to quantify how the four ADMET descriptors steer the gradient boosting classifier. Using SHAP values, we obtain the ranking in Figure 4: polar surface area (PSA) contributes the most, followed by logP, hydrogen-bond acceptors (HBA), and hydrogen-bond donors (HBD).

PSA has a well-known impact on membrane permeability and blood–brain barrier penetration, making it a strong indicator of potential side effects [34]. Next in importance is logP, which describes a compound’s lipophilicity or affinity for lipids [35]. Lipinski’s “rule of 5” highlights that compounds exceeding certain logP values ($\mathrm{CLogP} > 5$ or $\mathrm{MLogP} > 4.15$), in conjunction with high counts of hydrogen bond donors ($> 5$), acceptors ($> 10$), and increased molecular weight ($> 500$), are more likely to have poor absorption or permeation [35]. HBA slightly outranks HBD, consistent with Lipinski’s observation that having more than 10 hydrogen-bond acceptors can hinder oral bioavailability more than an excess of donors [35]. The relatively lower SHAP value for HBD suggests that, while still relevant, donor count alone is less predictive than PSA or hydrophobicity for the side effect profiles considered here.
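The thresholds quoted above can be written as a simple screening function. This is a minimal sketch, not part of the published pipeline: the function name is hypothetical, the CLogP threshold is used for logP, and molecular weight is optional because it is not reported for the molecules discussed in this paper. The example call uses the logP, donor, and acceptor values reported for Molecule 10 in Section 2.4.

```python
def rule_of_five_flags(logp, hbd, hba, mol_weight=None):
    """Return the names of the Lipinski thresholds that a molecule exceeds."""
    flags = []
    if logp > 5:        # CLogP threshold quoted in the text
        flags.append("logP")
    if hbd > 5:         # hydrogen-bond donors
        flags.append("HBD")
    if hba > 10:        # hydrogen-bond acceptors
        flags.append("HBA")
    # Molecular weight is checked only when a value is supplied.
    if mol_weight is not None and mol_weight > 500:
        flags.append("MW")
    return flags


# Molecule 10: logP 4.54, two donors, six acceptors (values from Section 2.4).
molecule_10_flags = rule_of_five_flags(logp=4.54, hbd=2, hba=6)
print(molecule_10_flags)
```

An empty list means that none of the supplied descriptors exceeds its threshold.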

Figure 4. Mean SHAP values over 3 fold cross-validation.

2.4. Exploratory Case Study on Fifty AI-Generated Leads

In this section, we evaluate the predictions of the proposed gradient boosting model on fifty (50) randomly selected molecules from the set of lead hypertension drug molecules that are computationally generated in [3] and validate their functional group analysis. Figure 5 shows a sample of drug molecules selected from the 50 molecules. This subset is intended purely as an exploratory illustration of the model’s interpretability so the accompanying visualisations are meant to highlight how the model reasons over functional group patterns, not to provide confirmatory performance evidence. The model predicts the likelihood of side effects based on molecular properties to provide early warnings during drug development. The distribution of the functional groups in the molecules is shown in Figure 6, where the same number of functional groups that are prominent in Figure 12 is applicable to Figure 6.

Figure 5. Sample of the molecules randomly selected from the dataset, where (a) has the SMILES, CC([NH+]1CCOCC1)CCc2c(N3CCCCC3)nsn2 and (b) has the SMILES Fc1ccc2c(CCC(OCCOc3cccc4c3CCC(=O)N4)C2)C1.

To display the prediction results, we divide the fifty (50) molecules into five (5) groups and display the results in Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11 as correlation heatmaps. This grouping is purely for visual clarity and interpretability; there is no other rationale behind the division. The prediction results from these figures show gastrointestinal conditions as the most common side effect, appearing in over 50% of compounds, while eye conditions are the least frequent (75% of samples show no such effects). Liver/pancreatic conditions occur in 14%, and dermatological and cardiovascular conditions appear in 37% and 18% of cases, respectively. Neurological conditions occur in 29%, and respiratory/otolaryngological conditions in only 10%. Most compounds exhibit two or fewer side effects, though 13.72% have five or more.

Further, we analyse five (5) molecules (one from each group in Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11) to highlight the predictive potential of the model. The molecules analysed are Molecule 10, Molecule 12, Molecule 23, Molecule 39, and Molecule 50. Molecule 10 has the SMILES representation CC(CCc1cc(-c2nscc2-c2ccc(O)cc2)ccc1O)N1CCOCC1. The molecule contains amine groups, the hydroxyl group associated with primary and secondary alcohols, a phenolic benzene ring, and a thiol group. The proposed model predicts high probabilities (0.91) of liver and dermatological conditions and a probability of about 0.44 for eye- and speech-related side effects. The model also predicts gastrointestinal, respiratory, and cardiovascular conditions, although with lower probabilities. Molecule 10 has a moderate logP (4.54), two hydrogen-bond donors, and six hydrogen-bond acceptors, which suggests that it should be suitably permeable. This in turn implies that the functional group combination drives the predicted risk.

Molecule 12 has the SMILES representation CC1(C)OCC(CCC(=O)Nc2nsnc2C2CCOCC2)O1. This molecule has the following functional groups: primary amine, amide, primary alcohol, secondary alcohol, carbonyl, aldehyde, ether, and thiol groups. The model predicts high probabilities (over 0.80) of liver, gastrointestinal, and dermatological conditions and a probability of about 0.7 for neurological conditions. This correlates with its ADMET properties, as Molecule 12 shows a higher PSA (83) and an elevated number of hydrogen-bond acceptors.

Molecule 23 has the SMILES representation NC(=O)Cc1nsnc1Oc1cc(C2CCOCC2)ccc1O. This molecule has the following functional groups: amide, alcohol, carboxyl, carbonyl, ether, phenol, benzene, and thiol groups. The model predicts high probabilities (over 0.70) of liver, gastrointestinal, and dermatological conditions, with cardiovascular and neurological conditions at about 0.7. Molecule 23 shows a high polar surface area (108) and a moderately low logP value (1.96), which describes a highly polar but only mildly lipophilic molecule, in agreement with its functional group make-up.

Molecule 39 has the SMILES representation COC(=O)CCc1ccc(OCC(O)CNC(C)C)cc1. This molecule has the following functional groups: primary alcohol, secondary amine, ether, and ester groups. The model predicts high probabilities (over 0.70) of gastrointestinal, cardiovascular, and dermatological conditions, and over 0.5 for respiratory conditions. Its moderate logP (1.5), two H-bond donors, five acceptors, and mid-level PSA (68) line up with what the model sees in ester–amine molecules, which often generate gut, skin, and heart side effect flags.

For Molecule 50, whose SMILES representation is CC(N)CCc1cc(Nc2cccc3c2C[C@H](O)[C@H](O)C3)ccc1O, the associated functional groups include amine, alcohol, phenol, and benzene groups. The model predicted considerable probabilities for all the side effects except eye conditions.

Hence, the model prediction underscores that Molecule 1–5 and all other molecules in the set with considerable spread across the side effects may have to be discarded from the set or redesigned. However, we note that predicting the side effects of novel drug molecules (with no prior experimental or clinical data; hence, no ground truth to compare against) using the proposed models is not confirmatory but only provides early warnings during drug development. Therefore, further analysis and tests are needed to confirm accuracy.

Figure 6. Functional group distribution in the 50 novel lead drug molecules.

Figure 7. Correlation heatmap of side effects for Group 1 molecules.

Figure 8. Correlation heatmap of side effects for Group 2 molecules.

Figure 9. Correlation heatmap of side effects for Group 3 molecules.

Figure 10. Correlation heatmap of side effects for Group 4 molecules.

Figure 11. Correlation heatmap of side effects for Group 5 molecules.

3. Methodology

3.1. Data Curation

We aggregated data from the SIDER 4.1 Side Effect Resource [16] and ChEMBL databases [36]. The SIDER database provides information on FDA-approved medications and their associated adverse side effects, while ChEMBL offers detailed chemical information on bioactive molecules. Thus, let $D_{SIDER}$ and $D_{CHEMBL}$ represent the datasets curated from SIDER 4.1 and ChEMBL, such that

$$\begin{aligned} D_{SIDER} &= \{(d_{i}, s_{j}) \mid d_{i} \in D,\ s_{j} \in S\}, \\ D_{CHEMBL} &= \{(d_{i}, c_{k}) \mid d_{i} \in D,\ c_{k} \in C\}, \end{aligned} \tag{1}$$

where $D$, $S$, and $C$ are the set of drugs, the set of side effects, and the set of chemical properties, respectively. The terms $d_{i}$, $s_{j}$, and $c_{k}$ represent the $i$-th drug, the $j$-th side effect, and the $k$-th chemical property, respectively.

Using RDKit [37], a cheminformatics toolkit, we enhanced the dataset by extracting molecular descriptors, $M$. The descriptors computed include logarithm of the partition coefficient (logP), hydrogen bond donors (HBD), hydrogen bond acceptors (HBA), and polar surface area (PSA). The logP descriptor measures a compound’s affinity for a lipophilic environment, which can significantly influence the ADMET profile of a chemical compound [38]. Extreme values of logP, on either side of the spectrum, can result in poor absorption or rapid clearance of the drug compound [35]. The HBD and HBA are also key factors in side effect prediction due to their influence on target interactions. HBD provides more flexibility in drug side effect prediction and design compared to HBA, which primarily affects solvation and polarity [39]. Similarly, the PSA of a molecule, which is defined as the area covered by polar atoms such as oxygen and nitrogen (and their hydrogens), is crucial. As discussed in [34], PSA correlates well with molecular transport through membranes, particularly the blood-brain barrier. Chemical compounds with a larger PSA are more likely to exhibit poorer permeability to the cell membrane [34].

We also identified the functional groups, $F$, within each drug molecule and assigned to each drug an anatomical therapeutic chemical (ATC) code, $A$, which is a universal identifier and is used to relate compounds across different data banks. Functional groups within each molecule were identified using substructure matching in RDKit, and the behaviour they demonstrate is observable across different contexts [15], which can link functional groups to specific side effects. We selected 25 functional groups based on their prevalence in bioactive molecules and potential impact on side effects [40]. This not only helps narrow the focus but also allows us to work with the bioactive molecules that are being used by chemical practitioners in the modern age to design drug compounds. The functional groups included in the dataset are listed in Table 1.

We merged the datasets to create a unified representation in which each drug compound is associated with its molecular properties and side effects. To reduce noise in the dataset, the side effects were filtered to focus on preferred terms, which refers to a standardised way of describing side effects in clinical research. We classified side effects into eight major groups based on affected organs or systems (e.g., gastrointestinal, cardiovascular, neurological) as shown in Table 2.

Table 1. Functional groups in the dataset.

Functional Groups
Amine (Primary, secondary, and tertiary)
Amide
Alcohol (Primary, secondary, and tertiary)
Ester
Carboxylic Acid
Carbonyl
Aldehyde
Ketone
Halogenated Compound (Chlorine, Fluorine, and Bromine)
Nitro Group
Sulfonamide
Phenol
Benzene
Imidazole
Thioether
Ether
Thiol
Phosphoryl
Quaternary Ammonium

Our current functional group features, enumerated in Table 1, focus mainly on key pharmacologically relevant groups (amines, alcohols, halogens, carbonyl derivatives, etc.), intentionally chosen for interpretability and common relevance in hypertension drugs. That said, expanding the features to include explicit heterocyclic systems would improve the predictive power, and this is recommended for future iterations [41].

Table 2. Conditions and corresponding affected organs or systems.

Group	Affected Organ/System
Gastrointestinal and Eating-related Conditions	Gastrointestinal System
Eye Conditions	Eyes
Liver and Pancreatic Conditions	Liver, Pancreas
Respiratory and Lung Conditions	Respiratory System
Dermatological Conditions	Skin
Cardiovascular Conditions	Heart, Blood Vessels
Neurological and Cognitive Conditions	Brain, Nervous System
Speech and Otolaryngological Conditions	Mouth, Throat, Ears

Using RDKit, we also extracted the SMILES, $U$, of the chemical compounds and identified their common names, $V$, by their common identifier number. SMILES stands for Simplified Molecular Input Line Entry System and is a molecular transcription specification. SMILES allows three-dimensional molecules to be represented as strings that a computer can interpret [42]. This representation makes it possible for RDKit to determine the functional groups that make up a chemical compound.

The final dataset, $D_{F}$, for training the side effect prediction is given as follows:

$$D_{F} = \{((d_{i}, s_{j}, c_{k}), D(d_{i}), M(d_{i}), F(d_{i}), A(d_{i}), V(d_{i}), U(d_{i})) \mid (d_{i}, s_{j}, c_{k}) \in S \cup C\}, \tag{2}$$

For the purpose of prediction, each instance in $D_{F}$ can be represented as a feature vector $x_{i}$ such that

$$x_{i} = [c_{k}, M(d_{i}), F(d_{i}), A(d_{i}), V(d_{i}), U(d_{i})], \tag{3}$$

Hence, if $\{s_{j}\}$ represents the actual side effects of a drug, and $\{z_{j}\}$ the predicted side effects of a drug, the objective of this paper is to develop a model $Q$

$$z_{j} = Q(x_{i}), \tag{4}$$

such that

$$\min\, d(\{z_{j}\}, \{s_{j}\}), \tag{5}$$

where $d(\cdot)$ is a distance metric quantifying the difference between $\{s_{j}\}$ and $\{z_{j}\}$.
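The distance metric in Equation (5) is not specified further. As one possible instance, the sketch below uses the normalised Hamming distance between binary label vectors over the eight side effect categories of Table 2. The choice of Hamming distance, the category keys, and the example vectors are all assumptions made for illustration.

```python
# Category order follows Table 2; the short keys are our own labels.
CATEGORIES = [
    "gastrointestinal", "eye", "liver_pancreatic", "respiratory",
    "dermatological", "cardiovascular", "neurological", "speech_otolaryngological",
]


def hamming_distance(predicted, actual):
    """Fraction of categories on which the predicted and actual labels disagree."""
    if len(predicted) != len(actual):
        raise ValueError("label vectors must have the same length")
    disagreements = sum(p != a for p, a in zip(predicted, actual))
    return disagreements / len(actual)


# Hypothetical label vectors for one drug (1 = side effect present).
z = [1, 0, 1, 0, 0, 0, 0, 0]   # predicted side effects
s = [1, 0, 0, 0, 1, 0, 0, 0]   # actual side effects
print(hamming_distance(z, s))
```

A distance of zero means that the predicted and actual side effect sets coincide, so minimising this quantity over the training data is one way to read the objective.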

3.2. Exploratory Data Analysis

Here, we analyse the distribution of molecular properties and functional groups in $D_{F}$. The analytical results indicate that the logP values were slightly negatively skewed, indicating that most compounds are neutral in lipid-water affinity. The HBD, HBA, and PSA show slight positive skewness, consistent with expectations for small- to medium-sized molecules.

Figure 12 shows the prevalence of functional groups in the dataset. Some groups, such as primary alcohols and primary amines, were well represented, while others, such as imidazoles, were scarce, indicating potential biases in hypertension drug design trends.

Figure 12. Functional group prevalence in the dataset.

We used a dataset comprising 726 drug molecules, as detailed in the original dataset file. This dataset was split into a 70% training set (508 samples) and a 30% test set (218 samples) using stratified sampling.
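A stratified split can be sketched as follows. This is a simplified illustration rather than the procedure used in the study: it handles a single binary label instead of eight side effect categories, the function name is hypothetical, and the class mix of 600 negatives and 126 positives is invented so that the total matches the 726 molecules.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Return (train_idx, test_idx) so that each label keeps a similar share in both parts."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)                              # random order within the class
        n_test = round(len(indices) * test_fraction)      # per-class test quota
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical class mix: 600 negatives and 126 positives (726 molecules in total).
labels = [0] * 600 + [1] * 126
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Because the quota is computed per class, the minority class appears in both the training and the test set in roughly the same proportion as in the full dataset.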

3.3. Handling Data Imbalance

In general, $D_{F}$ showed a class imbalance, with some categories of side effects being underrepresented. Hence, we used the Synthetic Minority Oversampling Technique (SMOTE) [43] to generate synthetic samples for minority classes to improve the ability of the models to predict side effects. Specifically, if we let $x_{i}$ be a sample of a minority class in the dataset, for a random sample $x_{j}$ in a set of k-nearest neighbours, a synthetic sample $x_{synthetic}$ is generated such that

$$x_{synthetic} = x_{i} + \lambda (x_{j} - x_{i}), \tag{6}$$
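Equation (6) is a linear interpolation between a minority sample and one of its neighbours, where, in the standard formulation of SMOTE, $\lambda$ is a random number between 0 and 1. The following sketch shows a single interpolation step; the function name and the two-dimensional (logP, PSA) feature values are hypothetical, and the nearest-neighbour search is assumed to have been done already.

```python
import random


def smote_sample(x_i, neighbours, rng):
    """Create one synthetic point on the segment between x_i and a random neighbour."""
    x_j = rng.choice(neighbours)   # random member of the k nearest neighbours
    lam = rng.random()             # lambda drawn uniformly from [0, 1)
    return [a + lam * (b - a) for a, b in zip(x_i, x_j)]


rng = random.Random(42)
x_i = [2.0, 60.0]                          # hypothetical (logP, PSA) of a minority sample
neighbours = [[3.0, 80.0], [1.0, 70.0]]    # hypothetical nearest minority neighbours
x_synthetic = smote_sample(x_i, neighbours, rng)
print(x_synthetic)
```

Each synthetic point lies on the line segment between two real minority samples, which is why such points closely resemble the training data.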

3.4. Model Development and Evaluation

Here, we developed and compared three machine learning models, namely, random forest, gradient boost classifier, and XGBoost. In all experiments, we report weighted accuracy, weighted recall, weighted F₁, macro-averaged AUC–ROC and Matthews correlation coefficient (MCC) for each of the eight side effect classes. To verify that the test set predictions fall within the chemical space spanned by the training molecules, we conducted a post-hoc leverage-based Williams plot analysis [44,45]. Using the threshold $h^{*} = 3p/n$ with $p = 29$ descriptors and $n = 508$ training compounds yielded $h^{*} = 0.171$; no held-out molecule exceeded this limit, indicating that all reported metrics lie inside the model’s applicability domain, whereas predictions for de novo molecules are flagged whenever $h > h^{*}$.
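The threshold can be checked directly from the stated values of $p$ and $n$. The sketch below only covers the threshold and the flagging rule; computing the leverage $h$ of a molecule requires the training descriptor matrix and is not shown. The function names and the two example leverage values are hypothetical.

```python
def leverage_threshold(n_descriptors, n_training):
    """Warning leverage h* = 3p / n, as used in the text."""
    return 3 * n_descriptors / n_training


def outside_domain(h, h_star):
    """Flag a molecule whose leverage exceeds the threshold."""
    return h > h_star


h_star = leverage_threshold(n_descriptors=29, n_training=508)
print(round(h_star, 3))

# Hypothetical leverage values for two de novo molecules.
print(outside_domain(0.05, h_star), outside_domain(0.30, h_star))
```

The computed value agrees with the reported threshold of 0.171 when rounded to three decimals.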

A direct benchmark is non-trivial because no public corpus simultaneously (i) contains pre-approval side effect labels, (ii) targets antihypertensive leads, and (iii) exposes the set of functional group descriptors we require. Nevertheless, to give readers an external point of reference, we compare our best AUC-ROC (0.93) against the Geometric Self-Expressive Model (GSEM) reported by [17], which achieved 0.92 on SIDER when trained on post-marketing data. While the chemotype coverage and label granularity differ, this rough comparison shows our internally synthesised baseline (Random Forest → 0.82; Gradient Boosting → 0.93) is in the same performance range as state-of-the-art graph methods. Because no existing SIDER-class dataset contains the engineered functional group flags essential for our use-case, we elected to synthesise a bespoke corpus and evaluate three classical ensembles (random forest, gradient boosting, XGBoost) as internal baselines.

3.4.1. Random Forest (Baseline Model)

The random forest classifier was used as a baseline due to its simplicity and interpretability. The random forest classifier is an ensemble learning technique that aggregates predictions from multiple decision trees trained on bootstrap samples of the data [46]. Hence, given a set of decision trees $T = \{T_{i}\}_{i = 1, 2, 3, \ldots, N}$, for an input $x$, the prediction $\hat{y}$ is

$$\hat{y} = \operatorname{mode}\left(\{T_{i}(x)\}_{i = 1}^{N}\right), \tag{7}$$

where $T_{i}(x)$ is the prediction of tree $T_{i}$ for input $x$, and $\operatorname{mode}(\cdot)$ returns the most frequent class.
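The majority vote in Equation (7) can be illustrated with a toy forest. In this sketch each "tree" is a hand-written one-rule classifier with an invented threshold, standing in for a trained decision tree; the input uses the PSA and logP values reported for Molecule 23.

```python
from collections import Counter


def forest_predict(trees, x):
    """Return the most frequent class among the individual tree predictions."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical one-rule "trees"; the thresholds are illustrative only.
trees = [
    lambda x: int(x["psa"] > 90),
    lambda x: int(x["logp"] > 3),
    lambda x: int(x["psa"] > 100),
]

molecule_23 = {"psa": 108, "logp": 1.96}   # descriptor values from Section 2.4
y_hat = forest_predict(trees, molecule_23)
print(y_hat)
```

Using an odd number of trees for a binary label avoids ties in the vote.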

3.4.2. Gradient Boosting Classifier

The gradient boosting classifier builds additive models in a forward stage-wise fashion, focusing on correcting errors from previous models [46]. It is effective at handling complex, nonlinear relationships. The classifier can be expressed as

$$G(x) = \sum_{l = 1}^{L} \gamma_{l} v_{l}(x), \tag{8}$$

where $v_{l}(x)$ denotes the weak learner at iteration $l$, $\gamma_{l}$ is the weight of the weak learner, and $L$ is the number of iterations.
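Equation (8) is a weighted sum of weak learner outputs. The sketch below evaluates such a sum for a fixed set of learners; it does not show how the learners and weights are fitted stage by stage. The decision stumps and weights are invented for illustration.

```python
def boosted_score(weak_learners, weights, x):
    """Evaluate G(x) as the weighted sum of the weak learner outputs."""
    return sum(gamma * v(x) for gamma, v in zip(weights, weak_learners))


# Hypothetical decision stumps on a single feature, each returning +1 or -1.
weak_learners = [
    lambda x: 1.0 if x > 0 else -1.0,
    lambda x: 1.0 if x > 2 else -1.0,
]
weights = [0.5, 0.25]   # hypothetical gamma_1 and gamma_2

score = boosted_score(weak_learners, weights, x=1)
print(score)
```

For classification, the sign or a squashed version of this score is turned into a class label or probability.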

3.4.3. XGBoost (State-of-the-Art Model)

XGBoost is an optimised implementation of gradient boosting and offers improved speed and performance. It applies advanced regularisation and support for parallel computing [46,47]. For an input $x$ and $K$ number of trees, the prediction $\hat{y}$ of the XGBoost is

$$\hat{y} = \sum_{k = 1}^{K} f_{k}(x), \quad f_{k} \in \mathcal{F}, \tag{9}$$

where $f_{k}$ is a decision tree in the space of trees $\mathcal{F}$.

3.4.4. Thresholding and Calibration

Since the models output continuous probabilities, a threshold of 0.652 was applied; preliminary analyses identified this value as giving a good balance between precision and recall across the labels. The data are first split using a 70–30 train–test split. The training set is then used for model selection, with stratified cross-validation on the training data to fine-tune the hyperparameters. Five-fold cross-validation was the initial choice; however, three folds were found to provide a better balance and to help prevent overfitting, given the size of the available dataset. Calibrating a boosted model yields better performance than leaving that model uncalibrated [48]. Calibration is not performed in this experiment because we are interested in a ranking between labels rather than in exact probabilities. That said, future iterations would benefit from this optimisation.
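Applying the threshold amounts to keeping the labels whose predicted probability reaches 0.652. The sketch below uses the probabilities reported for Molecule 10 in Section 2.4; the function name is hypothetical, and treating a probability exactly equal to the threshold as positive is an assumption.

```python
THRESHOLD = 0.652   # decision threshold reported in the text


def flag_side_effects(probabilities, threshold=THRESHOLD):
    """Return the side effect categories whose probability reaches the threshold."""
    return [name for name, p in probabilities.items() if p >= threshold]


# Probabilities reported for Molecule 10 (liver and dermatological 0.91; eye and speech about 0.44).
molecule_10 = {"liver": 0.91, "dermatological": 0.91, "eye": 0.44, "speech": 0.44}
flagged = flag_side_effects(molecule_10)
print(flagged)
```

Because only the ordering of the probabilities matters for such a cut-off, uncalibrated scores can still be used to rank the labels against each other.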

3.4.5. Evaluation Methods

The three models are evaluated using the following metrics.

Accuracy: Overall correctness of the model.
F1-score: Harmonic mean of precision and recall, which is important for imbalanced datasets.
AUC-ROC Score: Measures the model’s ability to distinguish between classes.
Matthews Correlation Coefficient (MCC): A single-value summary of the confusion matrix that is robust to class imbalance and is therefore used here to gauge how well the model generalises.

All metrics are reported for the re-sampled training split of 508 molecules and the independent hold-out test split of 218 molecules.
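To make the difference between these metrics concrete, the sketch below computes accuracy, a support-weighted F1, and MCC from a single binary confusion matrix. The counts are hypothetical and are not taken from the study data; they were chosen to show how accuracy can stay high on imbalanced data while MCC remains close to zero. The function names and the two-class form of the weighted F1 are assumptions.

```python
import math

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def f1(tp, fp, fn):
    """F1 of one class from its true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_f1(tp, fp, fn, tn):
    """Support-weighted mean of the positive-class and negative-class F1."""
    pos_support, neg_support = tp + fn, tn + fp
    f1_pos = f1(tp, fp, fn)
    f1_neg = f1(tn, fn, fp)  # the error roles swap when the negative class is the target
    return (pos_support * f1_pos + neg_support * f1_neg) / (pos_support + neg_support)

def mcc(tp, fp, fn, tn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical imbalanced case: 10 positives and 90 negatives.
counts = dict(tp=2, fp=9, fn=8, tn=81)
acc = accuracy(**counts)
wf1 = weighted_f1(**counts)
m = mcc(**counts)
```

In this hypothetical case, most of the score for accuracy and weighted F1 comes from the majority class, whereas MCC also reflects the poor recovery of the minority class.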

3.4.6. Cross-Validation and Out-of-Bag Assessment

Before evaluating on an external test set, we assess generalisation internally using stratified cross-validation (CV) and out-of-bag (OOB) estimates. We use stratified k-fold CV to preserve the class proportions in every fold; this gives more reliable performance estimates for imbalanced data [49]. For random forests, the OOB estimate is obtained automatically, as each tree is trained on a bootstrap sample of the data and the observations left out of that sample serve as an implicit validation set [50].
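The idea behind stratified k-fold splitting can be sketched in a few lines. This is a simplified, deterministic illustration that deals the samples of each class out to the folds in turn, without shuffling; it is not the splitting code used in this work, and the function name, the label vector, and the choice of three folds are assumptions.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, dealing out each class round-robin."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Hypothetical imbalanced label vector: 6 positives and 24 negatives.
labels = [1] * 6 + [0] * 24
folds = stratified_folds(labels, k=3)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Each fold here receives the same share of the minority class, which is the property that makes per-fold scores comparable on imbalanced data.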

From Table 3, all three models show high internal scores ($\mathrm{MCC} \geq 0.91$), which is expected because the SMOTE-generated samples resemble the ones used in training. The random forest outperforms the gradient boosters internally but performs the worst on the truly unseen test set (see Table 4). This suggests that the random forest is overfitting to the synthetic minority examples. The gradient boosters also show a drop in MCC, but to a smaller degree, which speaks to their greater resilience to distribution shift relative to the random forest.

Table 3. Median cross-validation scores across eight side effect categories.

Model	CV AUC–ROC	CV MCC	OOB MCC	Best Hyperparameters
Random Forest	0.979	0.947	0.90	$n = 500$, $d = 6$, balanced_sub
Gradient Boosting	0.953	0.911	–	$n = 200$, $\mathrm{lr} = 0.10$, $d = 3$
XGBoost	0.958	0.940	–	$n = 300$, $\mathrm{lr} = 0.05$, $d = 4$

Table 4. Test and training performance metrics of the evaluated models.

Model	Split	Accuracy (%)	F₁ (Weighted)	AUC–ROC	MCC
Random Forest	Train	72.0	0.70	0.927	0.51
Random Forest	Test	83.0	0.79	0.636	0.10
Gradient Boosting	Train	79.0	0.79	0.921	0.61
Gradient Boosting	Test	81.0	0.80	0.616	0.15
XGBoost	Train	92.0	0.92	0.977	0.84
XGBoost	Test	81.0	0.80	0.619	0.13
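The drop from internal to external performance can be checked with simple arithmetic on the reported values. The sketch below copies the cross-validation MCC from Table 3 and the test MCC from Table 4; the variable names are illustrative.

```python
# Values copied from Table 3 (CV MCC) and Table 4 (test MCC).
cv_mcc = {"Random Forest": 0.947, "Gradient Boosting": 0.911, "XGBoost": 0.940}
test_mcc = {"Random Forest": 0.10, "Gradient Boosting": 0.15, "XGBoost": 0.13}

# Drop in MCC when moving from cross-validation to the hold-out test set.
drop = {model: round(cv_mcc[model] - test_mcc[model], 3) for model in cv_mcc}
largest_drop = max(drop, key=drop.get)
```

The resulting drops are 0.847 for the random forest, 0.761 for gradient boosting, and 0.81 for XGBoost, so the random forest loses the most between the internal and external estimates.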

4. Conclusions

This paper presented a machine learning approach to predict the potential side effects of computationally synthesised antihypertensive drug molecules on the basis of their molecular properties. On an independent 30% holdout set, the gradient boosting model achieved a modest Matthews correlation coefficient (MCC) of 0.15. Therefore, the model should be regarded primarily as a descriptive tool rather than a definitive predictor of novel compound activity. Nevertheless, it highlights chemotypes whose functional group patterns and ADMET profiles resemble drugs with known liabilities, and it provides a triage filter to help discard ineffective compounds early. Future research should leverage larger datasets with more comprehensive molecular descriptors to enhance predictive reliability.

Author Contributions

T.N. and U.A.K.C.-O. conceived the presented idea. T.N. conducted the computational simulation and analysis, and T.N. and U.A.K.C.-O. provided the synthesis analysis. T.N. wrote the original manuscript. U.A.K.C.-O. revised the manuscript. U.A.K.C.-O. supervised and coordinated the project. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset of the 50 molecules used in this work is available on request made to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

World Health Organization. A Global Brief on Hypertension: Silent Killer, Global Public Health Crisis: World Health Day 2013; World Health Organization: Geneva, Switzerland, 2013; Available online: https://apps.who.int/iris/handle/10665/79059 (accessed on 21 August 2024).
Mills, K.T.; Stefanescu, A.; He, J. The global epidemiology of hypertension. Nat. Rev. Nephrol. 2020, 16, 223–237.
Lehasa, O.M.-E.; Chude-Okonkwo, U.A.K. Dataset for discovering new hypertension small molecules using machine learning-aided computational fragment-based design. Data Brief 2024, 55, 110677.
Olowofela, A.O.; Isah, A.O. Profile and predictors of antihypertensive adherence among patients in a tertiary care setting in Southwestern Nigeria. Am. J. Hypertens. 2017, 30, 919–927.
Takuathung, M.N.; Sakuludomkan, W.; Khatsri, R.; Dukaew, N.; Kraivisitkul, N.; Ahmadmusa, B.; Mahakkanukrauh, C.; Wangthaweesap, K.; Onin, J.; Srichai, S.; et al. Adverse Effects of Angiotensin-Converting Enzyme Inhibitors in Humans: A Systematic Review and Meta-Analysis of 378 Randomized Controlled Trials. Int. J. Environ. Res. Public Health 2022, 19, 8373.
Mao, F.; Ni, W.; Xu, X.; Wang, H.; Wang, J.; Ji, M.; Li, J. Chemical Structure-Related Drug-Like Criteria of Global Approved Drugs. Molecules 2016, 21, 75.
Zanders, E.D. Preclinical Development. In The Science and Business of Drug Discovery; Springer: Cham, Switzerland, 2020.
Lehasa, O.M.-E.; Chude-Okonkwo, U.A.K. Machine Learning-aided Computational Fragment-based Design of Small Molecules for Hypertension Treatment. Intell.-Based Med. 2024, 10, 100171.
Blanco-González, A.; Cabezón, A.; Seco-González, A.; Conde-Torres, D.; Antelo-Riveiro, P.; Piñeiro, Á.; Garcia-Fandino, R. The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies. Pharmaceuticals 2023, 16, 891.
Olson, H.; Betton, G.; Robinson, D.; Thomas, K.; Monro, A.; Kolaja, G.; Lilly, P.; Sanders, J.; Sipes, G.; Bracken, W.; et al. Concordance of the toxicity of pharmaceuticals in humans and in animals. Regul. Toxicol. Pharmacol. 2000, 32, 56–67.
Hartung, T. Toxicology for the twenty-first century. Nature 2009, 460, 208–212.
Valerio, L.G. In silico toxicology for the pharmaceutical sciences. Toxicol. Appl. Pharmacol. 2009, 241, 356–370.
Mostafa, F.; Chen, M. Computational models for predicting liver toxicity in the deep learning era. Front. Toxicol. 2023, 5, 1340860.
Uner, O.C.; Kuru, H.I.; Cinbis, R.G.; Tastan, O.; Cicek, A.E. DeepSide: A Deep Learning Approach for Drug Side Effect Prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 5, 1340860.
Di, L.; Kerns, E.H. Drug-like Properties: Concepts, Structure Design and Methods: From ADME to Toxicity Optimization; Academic Press: Cambridge, MA, USA, 2015.
Kuhn, M.; Letunic, I.; Jensen, L.J.; Bork, P. The SIDER database of drugs and side effects. Nucleic Acids Res. 2015, 44, D1075–D1079.
Galeano, D.; Paccanaro, A. Machine learning prediction of side effects for drugs in clinical trials. Cell Rep. Methods 2022, 2, 100358.
Soffer, A.; Viswas, S.J.; Alon, S.; Rozenberg, N.; Peled, A.; Piro, D.; Vilenchik, D.; Akabayov, B. MolOptimizer: A Molecular Optimization Toolkit for Fragment-Based Drug Design. Molecules 2024, 29, 276.
Gupta, S.; Sharma, A.; Singh, R. Application of artificial intelligence and machine learning in early detection of adverse drug reactions (ADRs) and drug-induced toxicity. Front. Pharmacol. 2024, 15, 1497397.
Licata, A. Adverse drug reactions and organ damage: The liver. Eur. J. Intern. Med. 2016, 28, 9–16.
Willson, C. Sympathomimetic amine compounds and hepatotoxicity: Not all are alike—Key distinctions noted in a short review. Toxicol. Rep. 2018, 6, 26–33.
Guengerich, F.P. Common and Uncommon Cytochrome P450 Reactions Related to Metabolism and Chemical Toxicity. Chem. Res. Toxicol. 2001, 14, 611–650.
Sumbul, S.; Ahmad, M.A.; Mohd, A.; Mohd, A. Role of phenolic compounds in peptic ulcer: An overview. J. Pharm. Bioallied Sci. 2011, 3, 361–367.
El-Salhy, M.; Solomon, T.; Hausken, T.; Gilja, O.H.; Hatlebakk, J.G. Gastrointestinal neuroendocrine peptides/amines in inflammatory bowel disease. World J. Gastroenterol. 2017, 23, 5068–5085.
Chang, Y.S.; Wu, C.L.; Tseng, S.H.; Kuo, P.Y.; Tseng, S.Y. In vitro benzyl alcohol cytotoxicity: Implications for intravitreal use of triamcinolone acetonide. Exp. Eye Res. 2008, 86, 942–950.
Davies, N.M.; Anderson, K.E. Clinical pharmacokinetics of naproxen. Clin. Pharmacokinet. 1997, 32, 268–293.
El-Maghrabey, M.H.; El-Shaheny, R.; El Hamd, M.A.; Al-Khateeb, L.A.; Kishikawa, N.; Kuroda, N. Aldehydes’ sources, toxicity, environmental analysis, and control in food. In Organic Pollutants: Toxicity and Solutions; Springer International Publishing: Cham, Switzerland, 2021; Volume 32, pp. 117–151.
Floss, M.A.; Fink, T.; Maurer, F.; Volk, T.; Kreuer, S.; Müller-Wirtz, L.M. Exhaled Aldehydes as Biomarkers for Lung Diseases: A Narrative Review. Molecules 2022, 27, 5258.
He, W.; Ding, J.; Gao, N.; Zhu, L.; Zhu, L.; Feng, J. Elucidating the toxicity mechanisms of organophosphate esters by adverse outcome pathway network. Arch. Toxicol. 2024, 98, 233–250.
Baldo, B.A.; Pham, N.H. Non-steroidal Anti-inflammatory Drugs. In Drug Allergy; Springer: Cham, Switzerland, 2021.
National Institute for Occupational Safety and Health. Occupational Exposure to Phenol. Criteria for a Recommended Standard; DHEW Publication: No. (NIOSH) 76-196, 1976. U.S. Department of Health, Education, and Welfare. Available online: https://stacks.cdc.gov/view/cdc/19369 (accessed on 24 August 2024).
Downs, J.W.; Wills, B.K. Phenol Toxicity; StatPearls [Internet] Series; StatPearls Publishing: Treasure Island, FL, USA, 2023; (updated 13 March 2023); Available online: https://www.statpearls.com/ (accessed on 24 August 2024).
Chastain, G. Alcohol, neurotransmitter systems, and behavior. J. Gen. Psychol. 2006, 133, 329–335.
Ertl, P.; Rohde, B.; Selzer, P. Fast Calculation of Molecular Polar Surface Area as a Sum of Fragment-Based Contributions and Its Application to the Prediction of Drug Transport Properties. J. Med. Chem. 2000, 43, 3714–3717.
Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 2012, 64, 4–17.
Gaulton, A.; Bellis, L.J.; Bento, A.P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; et al. ChEMBL: A large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
Landrum, G. RDKit. Q2. 2010. Available online: https://www.rdkit.org/ (accessed on 9 July 2024).
Waring, M.J. Lipophilicity in drug discovery. Expert Opin. Drug Discov. 2010, 5, 235–248.
Kenny, P.W. Hydrogen-Bond Donors in Drug Design. J. Med. Chem. 2022, 65, 14261–14275.
Ertl, P.; Altmann, E.; McKenna, J.M. The Most Common Functional Groups in Bioactive Molecules and How Their Popularity Has Evolved over Time. J. Med. Chem. 2020, 63, 8408–8418.
Tsoumakas, G.; Katakis, I. Multi-label classification: An overview. Int. J. Data Warehous. Min. 2007, 3, 1–13.
U.S. Environmental Protection Agency. Appendix F. SMILES Notation Tutorial. Available online: https://www.epa.gov/sites/default/files/2015-05/documents/appendf.pdf (accessed on 25 September 2024).
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
Golbraikh, A.; Tropsha, A. Beware of q²! Validating QSAR models properly. J. Mol. Graph. Model. 2002, 20, 269–276.
Sahigara, F.; Mansouri, K.; Ballabio, D.; Mauri, A.; Tropsha, A.; Consonni, V.; Todeschini, R. Comparison of different approaches to define the applicability domain of QSAR models. Mol. Inform. 2012, 31, 1–13.
James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning with Applications in R. In Springer Texts in Statistics, 1st ed.; Springer: New York, NY, USA, 2013; ISBN 978-1-4614-7138-7.
He, Z.; Lin, D.; Lau, T.; Wu, M. Gradient Boosting Machine: A Survey. arXiv 2019. arXiv:1908.06951.
Niculescu-Mizil, A.; Caruana, R. Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning (ICML), Bonn, Germany, 7–11 August 2005; pp. 625–632.
Friedman, J. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
