A Deep Learning-Based Quantitative Structure–Activity Relationship System Construct Prediction Model of Agonist and Antagonist with High Performance

Molecular design and evaluation for drug development and chemical safety assessment have been advanced by quantitative structure–activity relationship (QSAR) using artificial intelligence techniques, such as deep learning (DL). Previously, we have reported the high performance of prediction models molecular initiation events (MIEs) on the adverse toxicological outcome using a DL-based QSAR method, called DeepSnap-DL. This method can extract feature values from images generated on a three-dimensional (3D)-chemical structure as a novel QSAR analytical system. However, there is room for improvement of this system’s time-consumption. Therefore, in this study, we constructed an improved DeepSnap-DL system by combining the processes of generating an image from a 3D-chemical structure, DL using the image as input data, and statistical calculation of prediction-performance. Consequently, we obtained that the three prediction models of agonists or antagonists of MIEs achieved high prediction-performance by optimizing the parameters of DeepSnap, such as the angle used in the depiction of the image of a 3D-chemical structure, data-split, and hyperparameters in DL. The improved DeepSnap-DL system will be a powerful tool for computer-aided molecular design as a novel QSAR system.


Introduction
Quantitative structure-activity relationship (QSAR) models can reduce the time and cost of molecular screening through mathematical prediction models of regression or classification of properties and activities of a chemical compound based on their chemical structure and statistically significant corresponding physicochemical/toxicological properties with other methods such as homology modeling, molecular docking, and molecular dynamics (MD) simulation . The structure-based molecular design mainly includes a receptor-based method through a three-dimensional (3D) chemical structure to obtain ligand interaction [1,35,36]. However, traditional QSAR models may frequently miss suitable candidate molecules, because of the poor predictive accuracy and versatility caused by poor feature selection that requires skill and knowledge and conformational limitations for coincidence effect [1,[37][38][39]. Therefore, a QSAR system with high-throughput and performance is desired because of the development of novel medicines, chemicals, and nanomaterials on human health. The important factor for solving the QSAR issue is the extraction of information-rich numerical molecular descriptors associated with physicochemical/toxicological properties. However, 3D-QSAR has a high computational cost, and its performance is sensitive to changes in the ligand geometry such as conformation and orientation [1,40]. To resolve these drawbacks, 4D-QSAR, called MD-QSAR, applied the ligand geometry problem for the effective ligand constrains using a 4D-chemical descriptor with multiple structural conformation, orientation, and protonation state calculated through short run MD stimulation to approximate the Boltzman sampling [1,41,42]. Although the 4D-QSAR can reduce the bias by selecting conformation, orientation, and protonation state, it requires more adaptation of the ligand topology within its target protein-binding pocket [43]. Thus, 5D-QSAR was proposed explicitly to represent different induced-fit models in 4D-QSAR [1,41,[43][44][45]. Furthermore, 6D-QSAR was introduced incorporating different solvation models in 5D-QSAR [1,41,46].
A DL-based QSAR system, called DeepSnap-DL, was reported to capture molecular features from molecular images photographed on a 3D-chemical structure [47]. In the DeepSnap-DL system, parameters for depicting a ball-and-stick model of chemical structure influenced prediction-performances of toxicity activity of the molecular initiation event (MIE) molecule, which interacts with protein and/or DNA in an adverse outcome pathway induced by chemical compounds in human body using the Tox21 10k library, including about 10,000 (10k) chemicals, e.g., approved drugs and environmental chemicals [48][49][50]. The prediction models using the DeepSnap-DL system achieved higher performance than conventional ML techniques, such as random forest, XGBoost, LightGBM, and CatBoost [51,52]. Additionally, prediction models of MIE molecule agonist or antagonist activity were constructed using by the DeepSnap-DL system with the Tox21 10k library, suggesting this system as essential tool for novel QSAR analysis due to automatic feature extraction with numerous structural information from a 3D-chemical structure [53,54]. For high-throughput of the DeepSnap-DL system, automation in the DeepSanp-DL system has been conducted by combining each process consisting of the generation of images from a 3D-chemical structure based on the simplified molecular input line entry system (SMILES) format, DL using these images as input data, and calculation of prediction-performance indexes using TensorFlow and Keras [54]. In the modified DeepSnap-DL system, the mean values of receiver operating characteristic area under the curve (ROC_AUC) of the prediction models for 59 MIE targets in validation, test, and foldout datasets indicated 0.818 ± 0.056, 0.803 ± 0.063, and 0.792 ± 0.076, respectively [54]. Furthermore, two of the MIE targets, peroxisome proliferator activated receptor γ (PPARγ) agonist (PPARg_ago, AID:743140) and aromatase antagonist (Arom_ant, AID:743139), improved the prediction-performance by optimizing of parameters in the modified DeepSnap-DL system, such as angle in the depiction of the image from 3D-chemicals, data-split ratio with training (train), validation (valid), and test datasets, background color in an image, and learning rate (LR) and batch size (BS) in hyperparameters in DL [54].
In this study, we used the modified DeepSnap-DL with Python and basic DeepSnap-DL with DIGITS systems to construct prediction models in three of MIEs, glucocorticoid receptor (PubChem assay AID:720725_GR_ant), transforming growth factor (TGF)-beta/Smad (PubChem assay AID:1347032_TGF_beta_ant), and thyrotropin-releasing hormone receptor (PubChem assay AID:1347030_TRHR_ago), by optimizing parameters in the DeepSnap-DL system. According to the previously reported MIE molecules, agonist, or antagonist prediction models in the three MIE molecules constructed using the modified DeepSnap-DL with Python showed that it would be essential tools in a novel QSAR system in computeraided molecular design.  Table 3). Addtionally, DeepSnap_Python in the three MIE targets indicated prediction performances of loss, PR_AUC, and F as follows. The mean loss values on the train and valid datasets were 0.413 ± 0.153 for loss_train in 720725_GR_ant and 0.383 ± 0.115 for loss_valid in 720725_GR_ant, 0.247 ± 0.088 for loss_train in 1347030_TRHR_ago and 0.189 ± 0.070 for loss_valid in 1347030_TRHR_ago, 0.280 ± 0.120 for loss_train in 1347032_TGF_beta_ant and 0.316 ± 0.061 for loss_valid in 1347032_TGF_beta_ant (Figures S5-S8; Table 3). The mean PR_AUC values on the valid dataset were 0.335 ± 0.117 in 720725_GR_ant, 0.103 ± 0.041 in 1347030_TRHR_ago, and 0.315 ± 0.056 in 1347032_TGF_beta_ant (Figures S9 and S10; Table 3). The mean F values on the valid dataset were 0.853 ± 0.039 in 720725_GR_ant, 0.868 ± 0.020 in 1347030_TRHR_ago, and 0.833 ± 0.020 in 1347032_TGF_beta_ant (Figures S11 and S12; Table 3 Table 3).      Table 3). In addition, the highest prediction performance values of F on the valid dataset for the angles and datasplit ratios were 0.935 at 176 • and train:valid:test = 7:1:2 in 720725_GR_ant, 0.914 at 176 • and train:valid:test = 3:1:2 in 1347030_TRHR_ago, and 0.876 at 176 • and train:valid:test = 7:1:2 in 1347032_TGF_beta_ant (Figures S11 and S12; Table 3). In this study, we observed two performance peaks in prediction models at 176 • and 355 • of angles in DeepSnap, according to previous results [53,54]. These findings suggested that image augmentation is effectively worked. It has been reported that even though a small number of images was used, the DL can classify by increasing the number of images with the addition of artificial operations, such as movement, rotation, enlargement/reduction, and inversion to the original images [55,56]. In addition, it is known that in conformation generation using algorithms other than MMFF for the force field, the 3D structure differs significantly depending on the algorithm. Therefore, further performance improvement can be expected using other force field calculation algorithms. Further, as a result of examining the depiction condition for ball-and-stick models in the DeepSnap, it was previously reported that the performance can be improved by adjusting the bond thickness and atom color [50].

Results and Discussion
However, since the image will be similar to the original image, the risk of overfitting, i.e., a decrease in the performance on the test dataset due to the prediction model fitting to match into the training dataset, cannot be ruled [57][58][59][60][61][62][63][64][65][66][67][68]. Thus, data augmentation effectively enables learning with a small number of data. However, suppose it is required to use complex models or obtain high performance to avoid overfitting. In that case, it is important to use high-quality data with few biased features and a sufficiently large data size. There are mainly two methods of data augmentation: offline and online augmentation (also called on-the-fly augmentation), depending on the augmentation timing [69,70]. The offline augmentation is the rotation conversion added to each image in the dataset, doubled in size with the increase in the capacity because the converted image is created for each image. The online augmentation applied to mini-batch that split the dataset into multiple datasets, where the capacity of the dataset does not increase and different random images are generated if the DL is performed using multiple epochs in the same mini-batch. Additionally, this DeepSnap-python has a built-in early stopping function that can the effects of epochs as well as overfit. Therefore, the performance of this system could be more improved by combination of these functions with parameter optimization.
Furthermore, the lowest prediction-performance values of loss on the train and valid datasets for the LRs were 0.022 at 0.00003 and 0.124 at 0.00003 in 720725_GR_ant, 0.020 at 0.00002 and 0.066 at 0.0008 in 1347030_TRHR_ago, 0.038 at 0.00003 and 0.170 at 0.000021 in 1347032_TGF_beta_ant ( Figure 5, Table 4). The highest prediction-performance values of PR_AUC on the valid dataset for LRs were 0.789 at 0.00007 in 720725_GR_ant, 0.213 at 0.0042 in 1347030_TRHR_ago, 0.472 at 0.00003 in 1347032_TGF_beta_ant ( Figure 5, Table 4). Additionally, the highest prediction-performance values of F on the valid dataset for LRs were 0.942 at 0.00007 in 720725_GR_ant, 0.909 at 0.000005, 0.890 at 0.00002 in 1347032_TGF_beta_ant ( Figure 5, Table 4).
As a method often used to improve the generalization performance of DL, LR decay, meaning to lower LR in places where learning has progressed to some extent, is known to improve accuracy sharply [71]. However, their behavior changes significantly depending on datasets, network types, and optimization methods. Therefore, a function that automatically attenuates the LR decay is required when learning converges to some extent. Thus, the improved DeepSnap-DL method was added with an early stopping function to extract models with the highest performance in a series of learning processes by discontinuing learning before entering the overfitting phase; thereby shortening the learning time.
It was previously reported that BS and LR are proportional, whereas BS and momentum coefficient are inversely proportional [72]. It is considered that the learning converges to the sharp minimum as BS increases. Meanwhile, when BS is small, larger variance positively affects the performance in DL, such as regularization. However, it was shown that there was an optimal BS for LRs, suggesting that it is essential to have an appropriate BS within that LRs, instead of reducing the BS. Thus, considering the learning efficiency, it is appropriate to set the BS sufficiently large and adjust the LR.
These findings are expected to lead to drug development from the estimation and identification of new ligands for nuclear receptors.

Data
The datasets of three MIE targets, including antagonists of the glucocorticoid receptor (PubChem assay AID:720725_GR_ant), TGF-beta/Smad (PubChem assay AID: 1347032_TGF_beta_ant), and agonist of the thyrotropin-releasing hormone receptor (Pub-Chem assay AID:1347030_TRHR_ago) for the chemical structures in SMILES format and the corresponding agonist or antagonist scores defined as Pubchem_activity_scores from the Tox21 10K library in the PubChem database housing quantitative high-throughput assays to identify small molecule agonists and antagonists for MIEs, as previously reported, were downloaded [50][51][52][53][54] (Table S1a-c). After eliminating overlapping chemicals and inorganic compounds because of the presence of possible stereoisomers or salts, we defined active and inactive compounds by activity scores, which the agonist and antagonist scores ranged from 0% to 100% by normalizing each titration point relative to the positive control compound and dimethyl sulfoxide (DMSO)-only wells according to the following equation: % activity = [(Vcompound − Vdmso)/(Vpos − Vdmso)] × 100, where Vcompound, Vdmso, and Vpos denote the compound-well values, median values of the DMSO-only wells, and median values of the positive control well in the reporter gene assay, i.e., active and inactive compounds were defined by activity scores 40-100 and 0-39, respectively (Table S1a-c). The mean number of chemicals was 7601 ± 63, and the highest and lowest numbers of chemicals were 7662 in 1347030_TRHR_ago and 7539 in 720725_GR_ant, respectively (Table S1a-c and  Table 1). Further, we divided the data for the chemical compounds into two groups based on their activity scores: active and inactive chemicals. Active chemicals had an activity score ≥ 40, whereas inactive chemicals had an activity score < 40. The mean numbers and percentages of active chemicals among three MIEs were 248 ± 167 and 3.27 ± 2.20, and the highest and lowest numbers and percentages of active chemicals were, respectively, 395 and 5.19% for 1347032_TGF_beta and 67 and 0.87% for 1347030_TRHR_ago (Table 6). Data were divided into train, valid, and test datasets. The first two datasets were used for training and fine-tuning the prediction models. The final evaluation of the constructed models was performed using a foldout test dataset.

DeepSnap
We applied the SMILES format for 3D conformational import to generate the 3D chemical database with rotatable torsion and saved it as a structure data file (SDF) using molecular operating environment (MOE) 2018 scientific applications (MOLSIS Inc., Tokyo, Japan). Then, the external program, CORINA classic software (Molecular Network GmbH, Nürnberg, Germany, https://www.mn-am.com/products/corina, accessed on 25 January 2022) was used to determine a suitable form of each chemical structure. The 3D chemical structures of the compounds from SDF files were depicted as 3D ball-and-stick models with different colors corresponding to different atoms by a Jmol, an open-source Java viewer software for 3D molecular modeling of chemical structures [73][74][75]. The 3D-chemical models were automatically captured as snapshots of user-defined angle increments on the x-, y-, and z-axes saved as 256 × 256-pixel resolution PNG files (RGB) and split into three train, valid, and test datasets, as previously reported [50][51][52][53][54]. All PNG image files produced by DeepSnap were resized using NVIDIA DL GPU training system (DIGITS) version 4.0.0 software (NVIDIA, Santa Clara, CA, USA), on four-GPU systems, Tesla-V100-PCIE (31.7 GB) with 256 × 256-pixel resolution as input data, as previously reported [50][51][52]. We used a pre-trained open-source DL model, Caffe, that the network of GoogLeNet consisted of deep convolutional neural network (CNN) architectures comprised complex inspired by LeNet, on the CentOS Linux distribution 7.3.1611. At the DeepSnap-DL-DIGITS method, the prediction models were constructed by train datasets using 30 epochs in DL. Among these epochs, the lowest loss value in the valid dataset was selected for the next examination for prediction using the test dataset.
The improved DeepSnap-DL-Python system used a new 3D conformational import application, called SMILES_TO_SDF, to produce the SDF files from the SMILES format. We used PyMOL, an open-source molecular visualization system written in the Python programming language (Schrödinger, Inc., New York, NY, USA), to obtain high-quality 3D molecular modeling of chemical structures with 3D ball-and-stick models with different colors corresponding to different atoms. The 3D chemical structures can produce different images depending on the direction. They are captured automatically by DeepSnap as snapshots with user-defined angle increments with respect to the x-, y-, and z-axes as the DeepSnap-DL-DIGITS method. The snapshots, saved as 256 × 256-pixel PNG files (RGB), were divided into the train, valid, and test datasets. Additionally, the external test dataset is permanently fixed. TensorFlow and Keras on CentOS Linux 7.3.1611 with the CNN GoogLeNet were used all 2D PNG images produced by the DeepSnap-DL-Python system for training and fine-tuning the prediction models. Background colors in the images were changed to the color values in PyMOL, where a force field, which is a set of parameters for the bond lengths, angles, torsional parameters, electrostatic properties, and van der Waals interactions, uses the Merck Molecular Force Field (MMFF) [76].
Next, using the structural information for these chemicals derived from the SMILES format, the 3D chemical structure per compound with "rotatable torsions" was depicted using MOE application software program, and optimized to generate a single low energy conformation using CORINA classic software. These 3D chemical structures were saved in SDF format as a database file. Then, molecular images were generated as snapshots of the 3D structure from the SDF file using the DeepSnap method at different angles along the x, y, and z axes. The prediction models of the three MIE targets were constructed using these images of the 3D chemicals as input data for the DIGITS-based DL. Another system that is modified DeepSnap-DL by TensorFlow and Keras with Python was used. The SMILES format was used for a new 3D application, called SMILES_TO_SDF, to produce high-quality 3D molecular modeling of the chemical structures saved as a chemical database in SDF format. 2D PNG images produced from the SDF file were produced by DeepSnap, and the prediction models were constructed using these images as input data by DL with TensorFlow and Keras, called DeepSnap-DL-Python.

Evaluation of Prediction Models
We analyzed the probability of the prediction results using the prediction model with the lowest minimum loss in valid value among 30 examined echoes using the DeepSnap-DL-DIGITS method. We used the medians of each predicted value as representative values for target molecules using statistical analysis software JMP ® Pro. 14 (SAS Institute Inc., Cary, NC, USA), as previously reported [50][51][52], because the process of the DeepSnap-DL-DIGITS method calculated the probabilities for each image prepared from different angles with the x-, y-, and z-axes directions for one molecule. Classification performance was evaluated based on a confusion matrix defined by the cutoff value (θ) from the Youden's Index (YI) as follows [77][78][79]: where k is the diagnostic categories, wj ∈ (0,1).
However, the DeepSnap-DL-Python system automatically obtains the probability of prediction results with the lowest minimum loss_valid value among 30 examined epochs, which are the numbers of repeats for one training dataset modulated by early stopping. Additionally, the performance of each model was automatically calculated in terms of the metrics: ROC_AUC, precision recall_AUC (PR_AUC), balanced accuracy (BAC), F, Matthew's correlation coefficient (MCC), accuracy (Acc), and loss. These performance metrics are defined as follows. Here TP, FN, TN, and FP denote true positive, false negative, true negative, and false positive, respectively. To determine the optimal cutoff point for the definition of TP, FN, TN, and FP, we adopted a method for maximizing the sensitivity (1-specificity), called YI. This index has a value ranging from 0 to 1, where 1 represents the maximum effectiveness, and 0 represents the minimum effectiveness. Additionally, the area under the curve (AUC) for the receiver operating characteristics (ROC) is given by Wt 1 i f pj t 0 otherwise Wt = 1/2 (prec t + 1 − prec t − 1 ) prec t = (# of points i where pi t and ci = 1)/(# of points i where pi t) Here, ROC_AUC denotes AUC f, j iterates over the true points, Np is the number of true points, T is the number of thresholds, and prect is the precision at threshold t. For broader cases, let prec0 = prec1, and precT = 0 [80]. The PR curve is the plot of Recall (x) vs. Precision (y), and PR_AUC was calculated according to previous studies [53,54]. This study used N = 3 to reduce the bias, and the values are represented as averages.

Conclusions
In this study, we constructed prediction models for antagonists of the glucocorticoid receptor, TGF-beta/Smad, and agonist of the thyrotropin-releasing hormone receptor using the classic DeepSnap-DL system with DIGITS and improved DeepSnap-DL system with TensorFlow and Keras using the Tox21 10k library. We performed high-throughput and decreased computational costs using the improved DeepSnap-DL system by optimizing the parameters in DeepSnap. Consequently, we obtained that the improved DeepSnap-DL system would be a powerful advanced QSAR system on toxicological and biochemical/cheminformatic fields.