Prediction Models for Agonists and Antagonists of Molecular Initiation Events for Toxicity Pathways Using an Improved Deep-Learning-Based Quantitative Structure–Activity Relationship System

In silico approaches have been studied intensively to assess the toxicological risk of various chemical compounds as alternatives to traditional in vivo animal tests. Among these approaches, quantitative structure–activity relationship (QSAR) analysis has the advantages that it is able to construct models to predict the biological properties of chemicals based on structural information. Previously, we reported a deep learning (DL) algorithm-based QSAR approach called DeepSnap-DL for high-performance prediction modeling of the agonist and antagonist activity of key molecules in molecular initiating events in toxicological pathways using optimized hyperparameters. In the present study, to achieve high throughput in the DeepSnap-DL system–which consists of the preparation of three-dimensional molecular structures of chemical compounds, the generation of snapshot images from the three-dimensional chemical structures, DL, and statistical calculations—we propose an improved DeepSnap-DL approach. Using this improved system, we constructed 59 prediction models for the agonist and antagonist activity of key molecules in the Tox21 10K library. The results indicate that modeling of the agonist and antagonist activity with high prediction performance and high throughput can be achieved by optimizing suitable parameters in the improved DeepSnap-DL system.


Introduction
Recently, methods for predicting toxicity risk have focused on deep learning (DL)-based models, due to their high accuracy and the use of large datasets, as alternatives to animal testing. These methods follow the principle of the 3Rs (replacement, reduction, and refinement) for the discovery of molecular initiating events (MIEs) in the adverse outcome pathway (AOP), where MIEs are the first point of chemical-biological interaction within the human body [1][2][3]. The interaction between endocrine-disrupting chemicals and nuclear-receptor family proteins can affect the endocrine system in the AOP [4]. The Toxicology in the 21st Century (Tox21) program is a US federal research collaboration between the US Environmental Protection Agency, the National Toxicology Program, the National Center for Advancing Translational Sciences, and the Food and Drug Administration that aims to develop toxicity assessment methods for commercial chemicals, pesticides, food additives/contaminants, and medical products using quantitative high-throughput screening [5][6][7]. The Tox21 10K library consists of approximately 10,000 (10K) chemicals, including nearly 100 million data points obtained by in vitro quantitative high-throughput screening, indicating the toxicological risk of chemical compounds obtained using an in silico approach [8,9]. Using Tox21 data, the quantitative structure-activity relationship (QSAR) competition organized by Merck-referred to as the Tox21 Data Challenge 2014-demonstrated high-performance prediction modeling using machine learning (ML) [10]. However, ML has limitations if the quality and quantity of the data are insufficient [3,11,12].
Recently, a QSAR approach using DL has demonstrated improvements in terms of time and cost for toxicity prediction using complex nonlinear models of chemical compounds as an in silico approach that can be used as an alternative to animal testing [13][14][15]. In this approach, DL with a specific deep-neural-network architecture that has many layers of hidden neurons with backpropagation and stochastic gradient descent can automatically extract high-dimensional features and heterogeneous chemical data to avoid overfitting without using descriptors [16][17][18][19][20][21][22].
As a recent state-of-the-art method, graph convolutional neural networks-which use a graphical representation of molecules as a more natural framework, embedding nodes in a graph into Euclidean space-have been applied to chemical-network predictions using two-dimensional (2D) molecular graphs without chemical descriptors [16,[23][24][25]. Unlike conventional methods that use descriptors, an elementary portrait of each molecule using a graphical representation with each node and edge of the graph representing an atom and bond, respectively, can avoid overfitting in ML algorithms with an excessive number of descriptors [26][27][28]. Although this approach has higher performance than conventional methods, there are several problems. One problem is that the representation of all nodes converges to the same value, along with repetitions of the graph convolution operation when the number of layers in the graph convolutional neural network is increased, resulting in reduced performance [29][30][31]. Another problem is that label information cannot be propagated to the entire graph when there is limited label information, resulting in low performance [32][33][34].
A deep feedforward neural network, which calculates the weights and biases of many hidden layers and uses a bounded number of hidden neurons with precalculated molecular descriptors as inputs, has become popular because it can approximate any continuous function on a compact subset of R n ; this is called the universal approximation theorem [3,20,35]. Although the learning speed of this neural network is higher due to the calculation of a gradient for a randomly selected training dataset and the sequential update of the parameters, it has the drawback of imprecision: a small neural network can lead to underfitting, while a large neural network can lead to overfitting [20,36].
A DL-based QSAR approach-which can capture detailed information from threedimensional (3D) molecular structures by including their rotation angles to establish a systematic method of inputting molecular structures-has also been reported [37]. In a previous study, we reported 35 prediction models for agonists and antagonists of nuclear receptors as MIEs for chemical compounds derived from the Tox21 10K library using an approach called DeepSnap-DL. Of these prediction models, three models exhibited higher performance than the best-performing models in the Tox21 Data Challenge 2014 [38]. However, DeepSnap-DL is a complex system that consists of four steps: the preparation of a 3D molecular structure from one-dimensional chemical information, generation of a snapshot image of the 3D chemical structure, DL, and statistical calculations.
In the current study, we present an improved DeepSnap-DL approach that is a onestep system that sequentially executes DeepSnap, which is a process that depicts 3D ball-and stick models with different colors representing different atoms and finally captures snapshots automatically in selected angle increments on the x, y, and z axes and saves 256 × 256-pixel resolution PNG files with RGB colors into three types of datasets: the training (train), validation (valid), and test datasets. The DL is improved using TensorFlow, a free and open-source software library for ML. It is particularly useful for the training of and inference from deep neural networks that can run on multiple central processing units and graphics processing units (GPUs). It employs optional Compute Unified Device Architecture and a higher-level programming model to improve programming productivity on various hardware accelerators. It also includes SYCL, which is a royalty-free, higher-level programming model to improve programming productivity on various hardware accelerators and extensions for general-purpose computing on GPUs and employs Keras, an open-source software library that provides a Python interface for artificial neural networks and acts as an interface for the TensorFlow library as well as for high-throughput statistical calculations. We constructed a total of 59 prediction models using the improved DeepSnap-DL system and the Tox21 10K library, and the results indicate that the improved DeepSnap-DL approach has higher throughput than the previous DeepSnap-DL approach. Furthermore, the models constructed using the improved DeepSnap-DL system can achieve high prediction performance by optimizing the parameter choices.

Prediction Models for 59 MIEs
We have previously reported a DL-based QSAR approach, called the DeepSnap-DLsystem, which uses the Deep Learning GPU Training System (DIGITS) [37][38][39][40][41]. In the current study, we present an improved DeepSnap-DL system that uses TensorFlow and Keras. In this system, the flow of image generation and learning is regarded as one unit that can be executed sequentially and automatically to obtain high throughput and high performance ( Figure S1).
To construct prediction models of MIEs for in vivo toxicity responses that affect health outcomes in humans, we downloaded information about the chemical structures and agonist and antagonist activity for a total of 59 MIEs from the Tox21 10K library. The mean number of chemicals in the 59 MIEs was 9699 ± 702, and the highest and lowest numbers of chemicals were, respectively, 12,581 (for the peroxisome proliferator-activated receptor γ antagonist: PPARg_ant, AID: 743199, where AID is the agonist/antagonist identification number, and the antioxidant response element signaling pathway agonist: ARE_ago, AID: 743219) and 8434 (the caspase-2 agonist: Casp2_ago, AID: 1347037) ( Figure S2a). Furthermore, we divided the data for these chemical compounds into two groups based on their activity scores: active chemicals had an activity score ≥ 40, while inactive chemicals had an activity score < 40. The mean percentage of active chemicals among all chemicals was 4.79% ± 3.94%, and the highest and lowest percentages of active chemicals were 21.6% (for the pregnane X receptor agonist: PXR_ago, AID: 1347033) and 0.08% (the transforming growth factor-β agonist: TGFb_ago, AID: 1347035), respectively ( Figure S2b). The data were divided into training, validation, test, and foldout datasets, with the first three datasets used for training and fine-tuning the prediction models. The final evaluation of the constructed models was performed using the foldout dataset.
Next, using structural information for these chemicals derived from the simplified molecular-input line-entry system (SMILES) format, the SMILES_TO_SDF (Structure Data File) program, which performs conformational import from the SMILES format, produced 3D chemical conformation structures. Molecular images were then generated as snapshots of the 3D structure using the improved DeepSnap-DL method at six angles along the x, y, and z axes. This produced the following different numbers of images: ( (27 images). Using these molecular images as input data for DL, we constructed a total of 59 prediction models for agonists or antagonists of the MIEs and fine-tuned them using the modified DeepSnap-DL system and the training, validation, and test datasets. Finally, we selected the models with the highest performance out of the six angles and evaluated them using the foldout dataset. Among the 59 prediction models, one model (the nuclear factor-kappa B agonist: NFkB_ago, AID: 1159518) did not demonstrate predictive ability (ROC_AUC_valid = 0.5, ROC_AUC_test = 0.5, ROC_AUC_foldout = 0.5, where ROC_AUC_valid, ROC_AUC_test, and ROC_AUC_foldout refer to the area under the corresponding receiver operating characteristic curve in the validation, test, and foldout datasets, respectively). The dataset used in the NFkB_ago prediction model was highly imbalanced. This suggests that the performance of the prediction model may depend on the balance of the input data, as previously reported [42,43].
In a previous study, we reported the construction of prediction models for 35 agonist and antagonist allosteric modulators of nuclear receptors for chemicals using the DeepSnap-DL system with DIGITS [38]. That study reported that the mean ROC_AUC and BAC were 0.884 ± 0.017 and 0.8471 ± 0.017, respectively [38]. The results of the present study indicate that the prediction performance of DeepSnap-DL with TensorFlow and Keras is lower than that of DeepSnap-DL with DIGITS. However, DeepSnap-DL with TensorFlow and Keras has the potential for improvement through detailed examinations using different angles.  Table S1).      Table S2).   Figure S7-S10, right; Table S2).

Learning Rate and Batch Size in DeepSnap-DL
To assess the effect of the hyperparameters in the DeepSnap-DL system on the prediction performance, we optimized 11 learning rates (LRs) from 0.001 to 0.00001 of PPARg_ago using the validation and test datasets (Figure 10 Table S3).

Discussion
In this study, we analyzed the effects of angles on the generation of images using two DeepSnap-DL systems, one with TensorFlow and Keras and the other with DIGITS. In each case, we used two MIEs, PPARg_ago (AID:743140) and Arom_ant (AID:743139). In the prediction models constructed using DeepSnap-DL with DIGITS, the prediction performance increased as the number of angles decreased in DeepSnap, which can use various angles, resulting in a larger number of images generated by DeepSnap. By contrast, we observed two peaks-around two angles of 180 • and 360 • -in the prediction performance of the models constructed using TensorFlow and Keras. In addition, the prediction performance of DeepSnap-DL using TensorFlow and Keras decreased markedly as the number of angles used in DeepSnap decreased.
It has recently been reported that convolutional neural networks (CNNs) are effective in dealing with the movement of objects, but they are not suitable for viewpoint conversions such as rotation and scaling because they cannot understand the relationships between parts within the image [44][45][46][47][48][49]. Judging from these reports, there may be angles in DeepSnap for which the relative positional relationships of the features extracted with each angle cannot be understood. Therefore, clarifying appropriate depiction angles in DeepSnap may be an important factor for practical use. In addition, there are still some problems with the extraction of features using CNNs, that is, "a black box problem", whose problem makes it difficult to identify the region in the image extracted automatically as a feature by CNNs. Therefore, in order to estimate the area in the image extracted as a feature by the CNNs in this DeepSnap-DL system, the gradient with respect to the predicted value is weighted using Gradient-weighted Class Activation Mapping (GRAD-CAM) for the visualization of important pixels [50]. As a result, the chemical structure part within the image generated by DeepSnap was detected as the feature by the GRAD-CAM (data not shown). Thus, this finding suggests that the DeepSnap-DL system recognized and classified based on the stereochemical structure in the image used as input data.
In addition, we used a loss-evaluation index to examine the contributions of the LRs and BSs to the prediction performance of DeepSnap-DL using TensorFlow and Keras, as the loss-evaluation index represents the difference between the output of the neural network and the data from the training, validation, and test datasets of the two MIEs, PPARg_ago (AID:743140) and Arom_ant (AID:743139). During training on these datasets, the goal of the neural network was to reduce the loss value on the training dataset. Overfitting-which is the deviation of the loss values among the training, validation, and test datasets during learning-can be prevented by adjusting the parameters, such as the LRs and BSs. In this study, we demonstrated that optimization of these parameters can prevent discrepancies between the prediction performance on the training, validation, and test datasets.
In addition, we investigated the effects of the background colors of the images generated by DeepSnap-DL using two MIEs, PPARg_ago (AID:743140) and Arom_ant (AID:743139). Previously, we reported that using black and white as background colors led to a slightly lower performance of the prediction models constructed using the DeepSnap-DL DIGITS system, as compared with four other background colors (i.e., red, yellow, green, and blue) [41]. In the present study, the prediction model for PPARg_ago (AID:743140) constructed by DeepSnap-DL using TensorFlow and Keras exhibited higher performance for three background colors (i.e., wheat, white, and yellow) than for six other background colors (i.e., blue, cyan, green, magenta, orange, and red). Furthermore, the prediction model for Arom_ant (AID:743139) constructed by DeepSnap-DL using TensorFlow and Keras exhibited higher performance for three background colors (i.e., wheat, white, and orange) than for six other background colors (i.e., blue, cyan, green, magenta, yellow, and red). These differences in the effect of the background colors on the prediction performance of DeepSnap-DL using TensorFlow and Keras and of the DeepSnap-DL DIGITS system may be due to the different background colors used in the pre-training. Our results suggest that the improved DeepSnap-DL system has the potential to improve the prediction performance further by optimizing the aforementioned parameters. In the future, it would be a useful chemoinfomatics tool when the improved system is shaped into an internet server or computer program.

Data
We downloaded information related to the input data-including the chemical structures in SMILES format and the corresponding agonist or antagonist scores, defined as Pubchem_activity_scores-from the Tox21 10K library in the PubChem database, as previously reported [38][39][40][41] (Table S11). After eliminating overlapping chemicals, we determined the agonist and antagonist scores in the range from 0% to 100% by normalizing each titration point relative to the positive control compound and dimethyl sulfoxide (DMSO)-only wells according to the following equation: % activity = [(Vcompound − Vdmso)/(Vpos − Vdmso)] × 100, where Vcompound, Vdmso, and Vpos denote the compound-well values, the median values of the DMSO-only wells, and the median value of the positive control well in the reporter gene assay, respectively. The scores were then grouped into three classes, namely (i) 0, (ii) 1-39, and (iii) 40-100, which represent inactive, inconclusive, and active compounds, respectively. In this study, compounds with activity scores of 40-100 and 0-39 were defined as active and inactive, respectively, in all datasets. We downloaded information from the bioassay on 59 MIEs from the Tox21 10K library in PubChem [51] (Tables S11-S14).

DeepSnap
We applied the SMILES format for 3D conformational import using the SMILES_TO _SDF program, which built 3D chemical structure from SMILES strings containing the steric conformation information of chemical structures to generate a chemical database saved in SDF format in the DeepSnap-DL system, as previously reported [37][38][39][40][41]. Using the SDF files prepared by SMILES_TO_SDF, 3D chemical structures were depicted as 3D ball-andstick models with different colors corresponding to different atoms. We used PyMOL, an open-source molecular visualization system written in the Python programming language (Schrödinger, Inc., New York, NY, US) to obtain high-quality 3D molecular modeling of the chemical structures. The 3D chemical structures can produce different images depending on the direction, and they are captured automatically by DeepSnap as snapshots with user-defined angle increments with respect to the x, y, and z axes. The snapshots, saved as 256 × 256-pixel PNG files (RGB), were divided into three datasets: training, validation, and test. Additionally, the external test dataset is permanently fixed. All two-dimensional PNG images produced by the modified DeepSnap system were used for training and fine-tuning the prediction models using TensorFlow and Keras on CentOS Linux 7.3.1611 with the convolutional neural network GoogLeNet. Background colors in the images were changed to the color values in PyMOL [52].

Evaluation of Model Quality
The DeepSnap-DL system automatically obtains the probability of prediction results with the lowest minimum loss_valid value among 30 examined epochs, which are the numbers of repeats for one training dataset modulated by early stopping. In addition, the performance of each model was automatically calculated in terms of the metrics ROC_AUC, PR_AUC, BAC, F, MCC, Acc, and loss. These performance metrics are defined as follows, where TP, FN, TN, and FP denote true positive, false negative, true negative, and false positive, respectively. Here, ROC_AUC denotes the area under the curve f, j iterates over the true points, Np is the number of true points, T is the number of thresholds, and prect is the precision at threshold t. For broader cases, let prec0 = prec1, and precT = 0 [53].
The PR curve is plot of Recall (x) vs. Precision (y), and PR_AUC was calculated as reported previously [54]. This study used N = 3 to reduce the bias, and the values are represented as averages.

Conclusion
In this study, we used the Tox21 10K library to construct 59 prediction models for agonists and antagonists of MIEs in the AOP using an improved DeepSnap-DL system with TensorFlow and Keras, and we demonstrated that the prediction performance can be improved by optimizing various parameters. In addition, we improved the throughput of the previously reported DeepSnap-DL system by combining several steps into one. Thus, the improved DeepSnap-DL system may be a useful tool for in silico prediction of chemical compounds with high-performance and throughput in the AOP.

Institutional Review Board Statement:
The study was conducted according to internationally and locally accepted ethical guidelines.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
We are applying all data requirements.