Interspecies Quantitative Structure-Toxicity-Toxicity Relationships for Predicting the Acute Toxicity of Organophosphorous Compounds †

Median lethal dose values are commonly used to express the relative risk related to the acute toxicity of chemicals. In this paper, we considered rat and mouse acute toxicity (LD50) data of organophosphorous compounds (OPs) with diverse structures. Interspecies QSTTR (quantitative structure-toxicity-toxicity relationship) models were developed to predict the mouse oral acute toxicity using the multiple linear regression (MLR) approach. Descriptors were calculated from the OP structures optimized by molecular mechanics calculations. Model validation was performed using several statistical parameters. The results suggest that the developed QSTTR models are suitable for reliably predicting the acute toxicity of OPs.


Introduction
Organophosphate compounds (OPs) are commonly used as pesticides and were developed as nerve gases for chemical warfare [1][2][3]. For several decades, OPs have been utilized as insecticides, helminthicides, ascaricides, nematocides, and to a lesser degree as fungicides and herbicides. Their worldwide use as crop protection agents has led to many intoxications of nontarget species, including human deaths. The inhibition of the enzyme acetylcholinesterase is usually the cause of the acute mammalian toxicity of OPs [4]. In addition, other life-threatening OP toxicities have been observed, which are not always related to acetylcholinesterase inhibition.
The assessment of oral acute toxicity is very important because the oral route is a very common, convenient, safe, and inexpensive route of drug administration [5]. The importance of predicting rat acute oral toxicity is closely related to the knowledge of the biological activity and mechanism of a potential drug, as well as to its hazard identification and risk management [6]. This toxicity is often measured using the 50% lethal dose (LD50), the amount of chemical that is expected to cause death in 50% of treated animals within a given period of time. These studies are expensive and time-consuming and use large numbers of animals.
Information about toxicity to multiple species is important for assessing the threat to ecological populations and for protecting them. When chemicals cause toxicity in different genera of living organisms following a similar mechanistic path, there might be a correlation between the toxicities to these organisms [7]. Because such data are available for only a limited number of species, alternative methods such as in silico models have been accepted for determining the acute toxicity and filling these data gaps.
Quantitative Structure-Activity/Toxicity Relationships (QSAR/QSTR) correlate the activity/toxicity of chemicals to their physicochemical properties and structural descriptors. They may reduce or even replace the need for animal testing and are most powerful when grounded in a mechanistic hypothesis [8]. Because acute toxicity (LD50) reflects a whole-body response, it is considered difficult to model and may require knowledge of metabolism, bioaccumulation, excretion, etc. In addition, all data must be reliable, preferably obtained for the same sex and species [9].
To reduce the in vivo use of animals in toxicology, substitute species are useful in the risk assessment of chemicals [10]. Such assessments are based on results obtained using direct or indirect relationships from different toxicity tests [11,12].
Interspecies quantitative structure-toxicity-toxicity (QSTTR) modeling allows the prediction of toxicity to several other species using the experimental toxicity values for one species [13]. This type of modeling can thus promote a reduction in the use of higher organisms and a better understanding of the mechanism of toxic action.
The interspecies QSTTRs extrapolate the data for one toxicity endpoint to those for another toxicity endpoint and can be used to determine the species-specific toxicity of a chemical [10,13-15].
Using the underlying principle of taxonomic relationship, predictive QSTTR models allow the toxicity of chemicals to a particular species to be predicted from available experimental toxicity data for a different species. Such studies may employ, along with the available experimental toxicity data for one species, molecular features and physicochemical properties of chemicals as independent variables for predicting the toxicity profile against another closely related species [16].
In this paper, we considered experimental rat and mouse acute toxicity data (LD50 values) of a series of 76 organophosphorous compounds (OPs) with diverse structures (Table 1). Interspecies QSTTR models were developed to predict the oral acute toxicity to a particular species using available experimental data for a different species. OP structures were optimized by molecular mechanics calculations using the MMFF94s force field, and structural parameters were calculated from the optimized structures. The mouse acute toxicity data of the OPs were related to the rat acute toxicity using the multiple linear regression (MLR) approach, so that the known toxicity of the chemicals of interest could be extrapolated to the species missing toxicity data. Additional descriptors improved the fitting quality of the MLR models. Model validation was performed using several statistical parameters to test the predictive power of the models. The results suggest that the developed QSTTR models are suitable for reliably predicting the acute toxicity of organophosphorous chemicals.

Definition of Target Property and Structural Descriptors
The experimental mouse and rat oral acute toxicity values (LD50, in mg/kg body weight), converted to molar units and expressed as pLD50 values, were taken from the ChemIDplus web search system (https://chem.nlm.nih.gov/chemidplus/, accessed on 28 February 2021). The mouse values were considered as the dependent variable and the rat values as an independent variable for 76 organophosphorus compounds (Table 1).
The OP structures were pre-optimized using the MMFF94 molecular mechanics force field included in the Omega software (Omega v.2.5.1.4, OpenEye Scientific Software, Santa Fe, NM, http://www.eyesopen.com, accessed on 28 June 2021) [17,18] after curation of salts. The following parameters were used during the conformer ensemble generation: a maximum of 400 conformers per compound and an RMSD value of 0.5 Å.
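The molar conversion mentioned above can be illustrated with a minimal sketch. It assumes that pLD50 is the negative decimal logarithm of the LD50 expressed in mol/kg body weight; the text does not state the exact units, so this convention and the function names are illustrative assumptions.

```python
import math


def pld50_from_ld50(ld50_mg_per_kg: float, mol_weight_g_per_mol: float) -> float:
    """Convert an LD50 in mg/kg to pLD50 = -log10(LD50 in mol/kg).

    Assumption: the molar unit is mol/kg body weight (not stated in the text).
    """
    if ld50_mg_per_kg <= 0 or mol_weight_g_per_mol <= 0:
        raise ValueError("LD50 and molecular weight must be positive")
    ld50_mol_per_kg = ld50_mg_per_kg / 1000.0 / mol_weight_g_per_mol
    return -math.log10(ld50_mol_per_kg)


def ld50_from_pld50(pld50: float, mol_weight_g_per_mol: float) -> float:
    """Inverse conversion: pLD50 back to LD50 in mg/kg."""
    return 10.0 ** (-pld50) * mol_weight_g_per_mol * 1000.0


# Hypothetical compound: LD50 = 100 mg/kg, molecular weight = 100 g/mol.
example_pld50 = pld50_from_ld50(100.0, 100.0)
print(example_pld50)
```

On this scale a larger pLD50 corresponds to a smaller lethal dose, that is, to a more toxic compound.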
Structural parameters were further calculated using the minimum energy conformers by the DRAGON (Dragon Professional 5. An external set of 5 chemicals without experimental oral mouse acute toxicity data (Table 1) was collected from the PubChem Database (https://pubchem.ncbi.nlm.nih.gov/, accessed on 28 August 2021). These compounds were chosen based on their structural similarity to the least toxic organophosphorous compound included in the above series of 76 OPs.

Multiple Linear Regression Approach and Model Validation
The multiple linear regression (MLR) approach [19] was employed to relate the mouse oral pLD50 values to the rat oral pLD50 values and the calculated structural descriptors, using the QSARINS v. 2.2.4 program [20,21]. The genetic algorithm was used for variable selection, with the leave-one-out cross-validation correlation coefficient as the constrained function to be optimized, a mutation rate of 20%, a population size of 10, and 500 iterations.
For internal validation, several measures of robustness were employed: Y-scrambling [22], the adjusted correlation coefficient ($r^2_{adj}$), and the cross-validation coefficients $q^2$ (leave-one-out, $q^2_{LOO}$, and leave-more-out, $q^2_{LMO}$). In the Y-scrambling test, the dependent variable is randomly shuffled and a model is built using the same X matrix of molecular descriptors. The MLR models obtained after 2000 randomizations must have minimal $r^2$ (correlation coefficient) and $q^2$ (cross-validation coefficient) values [23].
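To make the regression step concrete, the following is a minimal sketch of an ordinary least-squares MLR fit in plain Python. It is not the QSARINS implementation and omits the genetic-algorithm variable selection; the data values and the function names (`solve_linear`, `fit_mlr`, `predict`) are hypothetical.

```python
def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: descriptors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Least-squares fit via the normal equations; returns [intercept, b1, ..., bp]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coefs, row):
    """Predicted response for one compound."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Hypothetical data: each row is (rat pLD50, one structural descriptor);
# y holds the corresponding mouse pLD50 values.
rows = [(2.0, 1.0), (3.0, 0.5), (4.0, 2.0), (2.5, 1.5), (3.5, 0.0)]
y = [1.95, 2.78, 3.32, 2.15, 3.33]
coefs = fit_mlr(rows, y)
print(coefs, predict(coefs, (3.0, 1.0)))
```

The normal equations are adequate for a handful of weakly correlated descriptors; a QR-based solver would be the more robust choice for larger or collinear descriptor sets.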
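The leave-one-out and Y-scrambling checks described above can be sketched as follows. For brevity the sketch uses a single predictor (rat pLD50) instead of the full descriptor set, 200 instead of 2000 randomizations, and synthetic data; all names and values are illustrative assumptions.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r2(x, y):
    """Coefficient of determination of the fitted line."""
    b0, b1 = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q2_loo(x, y):
    """Leave-one-out q2 = 1 - PRESS / SS_tot."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b0 + b1 * x[i])) ** 2
    return 1.0 - press / sum((yi - my) ** 2 for yi in y)


def y_scramble(x, y, trials=200, seed=0):
    """Mean r2 and q2 over models refitted on shuffled responses."""
    rng = random.Random(seed)
    r2_sum = q2_sum = 0.0
    for _ in range(trials):
        ys = y[:]
        rng.shuffle(ys)  # break the x-y pairing, keep X unchanged
        r2_sum += r2(x, ys)
        q2_sum += q2_loo(x, ys)
    return r2_sum / trials, q2_sum / trials


# Synthetic rat (x) and mouse (y) pLD50 values, for illustration only.
x = [2.1, 2.5, 2.9, 3.2, 3.6, 3.9, 4.3, 4.6, 5.0, 5.4]
y = [2.14, 2.37, 2.84, 3.18, 3.38, 3.73, 4.03, 4.41, 4.61, 5.07]
print(r2(x, y), q2_loo(x, y), y_scramble(x, y))
```

A model that captures a real relationship should show scrambled values far below those of the unscrambled fit, which is the behaviour the Y-scrambling criterion looks for.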
Model overfitting was checked using the Y-randomization test [23] and by comparing the root-mean-square errors (RMSE) and the mean absolute errors (MAE) of the training and validation sets [24].
The applicability domain was checked using the Williams plot (hat diagonal values versus standardized residuals) for the training and prediction chemicals, to identify outliers and high-leverage compounds, and the Insubria graph for chemicals without experimental data [25].
The Multi-Criteria Decision Making (MCDM) validation criterion [20,32] was used to summarize the performance of the MLR models. A desirability function is associated with every validation criterion, and MCDM takes values between 0 (the worst) and 1 (the best).
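The two error measures compared between the training and validation sets can be written out as a short sketch. The observed and predicted values below are hypothetical and serve only to show the calculation.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Hypothetical pLD50 values for a training and a validation set.
train_obs, train_pred = [2.4, 3.1, 3.8, 4.2], [2.5, 3.0, 3.7, 4.4]
valid_obs, valid_pred = [2.9, 3.6], [3.1, 3.5]

print(rmse(train_obs, train_pred), rmse(valid_obs, valid_pred))
print(mae(train_obs, train_pred), mae(valid_obs, valid_pred))
```

Training errors much smaller than validation errors would point to overfitting, whereas similar values for both sets suggest that the model generalizes.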
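The quantities plotted in a Williams plot can be sketched for the simplest case of one predictor plus an intercept, where the hat values have a closed form. The sketch assumes the commonly used leverage threshold h* = 3(p + 1)/n and a ±2.5 limit on standardized residuals; the text reports only the resulting h* value, so the threshold formula, the residual standardization, and the data are assumptions.

```python
import math


def williams_points(x, y, resid_limit=2.5):
    """Return (h_star, [(leverage, standardized residual, inside_domain), ...]).

    Simplification: a single predictor with intercept (p = 1), for which
    h_i = 1/n + (x_i - mean_x)^2 / Sxx.
    """
    n, p = len(x), 1
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - p - 1))  # residual std. error
    h_star = 3.0 * (p + 1) / n  # assumed conventional warning leverage
    points = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - mx) ** 2 / sxx
        std_res = e / s
        points.append((h, std_res, h <= h_star and abs(std_res) <= resid_limit))
    return h_star, points


# Synthetic data; the last compound lies far from the others in descriptor space.
x = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 9.0]
y = [2.1, 2.1, 2.5, 2.5, 2.9, 2.9, 3.3, 3.3, 3.7, 9.0]
h_star, points = williams_points(x, y)
for h, std_res, inside in points:
    print(round(h, 3), round(std_res, 3), inside)
```

Compounds with a leverage above h* are structurally influential, and compounds with large standardized residuals are response outliers; both fall outside the domain in this simplified check.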

Results and Discussion
The autoscaling method was employed for normalizing the data:

$$XT_{mj} = \frac{X_{mj} - \bar{X}_m}{S_m}$$

where, for each variable m, $XT_{mj}$ and $X_{mj}$ are the j values for the m variable after and before scaling, respectively, $\bar{X}_m$ is the mean, and $S_m$ is the standard deviation of the variable. The variables contained in the MLR models were selected using the genetic algorithm. The statistical (fitting and predictivity) results are included in Tables 2-4. Two compounds (18 and 52) were detected as outliers, having standardized residual values greater than 2.5 standard deviation units, and were not included in the final MLR models. The 'MCDM all' scores, based on the fitting, cross-validated, and external criteria, were considered for choosing the best MLR models.
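The autoscaling step introduced at the start of this section can be sketched in a few lines. The sketch assumes the sample standard deviation (n − 1 in the denominator), which the text does not specify; the descriptor values are hypothetical.

```python
import math


def autoscale(column):
    """Autoscale one variable: subtract its mean, divide by its standard deviation.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    if sd == 0.0:
        raise ValueError("constant variable cannot be autoscaled")
    return [(v - mean) / sd for v in column]


# Hypothetical descriptor values for five compounds.
descriptor = [35.5, 44.8, 54.0, 63.2, 72.5]
print(autoscale(descriptor))
```

After scaling, every variable has zero mean and unit standard deviation, so that regression coefficients of descriptors with different units become directly comparable.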
For the reliability of the best MLR1 model, the experimental versus predicted pLD50 values and the Y-scramble plots are presented in Figures 1 and 2, respectively. In the Y-scrambling test performed for the MLR models, significantly low scrambled $r^2$ ($r^2_{scr}$) and cross-validated $q^2$ ($q^2_{scr}$) values were obtained for 2000 trials. Figure 2 shows that in the case of all the randomized models, the values of $r^2_{scr}$ and $q^2_{scr}$ for the MLR1 model were <0.5 ($r^2_{scr}$/$q^2_{scr}$ of 0.072/−0.119). The low calculated $r^2_{scr}$ and $q^2_{scr}$ values indicate no chance correlation for all chosen MLR models (Table 2).
The Williams plot (standardized residuals versus leverages, with the leverage threshold h* = 0.263 for the MLR1 model), in the range of ±2.5σ, was used to verify the applicability domain. All compounds in the dataset are within the applicability domain of the MLR1 model, as presented in Figure 3. The selected descriptors included in the best MLR1 model are not intercorrelated, as shown by the correlation matrix in Table 5. Good correlations with the acute toxicity and good predictive power were noticed for all MLR models. Closer values of the root-mean-square errors (RMSE) and the mean absolute errors (MAE) of the training and validation sets were observed for the MLR2, MLR3, and MLR4 models. The MLR1 model was considered the best one according to several other statistical parameters of fitting and the 'MCDM all' score values.
The best MLR1 model has three descriptors: two 3D-MoRSE descriptors (Mor06m, which represents 3D-MoRSE-signal 06/weighted by atomic masses, and Mor26m, which represents 3D-MoRSE-signal 26/weighted by atomic masses) and one molecular property, TPSA(NO), which represents the topological polar surface area using N, O polar contributions. An increase in the Mor06m descriptor values would lead to lower acute toxicity, whereas higher values of the Mor26m and TPSA(NO) descriptors raise the OP toxicity.
To predict the mouse oral acute toxicity of OP chemicals without experimental data, the best MLR1 model was applied to five external test compounds found in the PubChem database, selected for their structural similarity to the compound with the lowest experimental mouse oral acute toxicity among the 76 OPs.
The Insubria plot (Figure 4) of the predicted pLD50 versus hat values indicates that the five external set compounds are included in the applicability domain of the set of 76 OP compounds. The lowest predicted pLD50 values, obtained for the external set compounds 77 and 78, were confirmed by all four MLR models (Tables 2-4). These compounds contain a thiophosphonate and a thiophosphate group, respectively, attached to the 2,4,5-trichlorophenyl moiety. Their predicted LD50 values of 767.8 mg/kg and 519.3 mg/kg, respectively, obtained by the MLR1 model, indicate a low oral mouse acute toxicity.

Conclusions
Interspecies quantitative structure-toxicity-toxicity relationships were developed using the multiple linear regression (MLR) approach to model the oral mouse acute toxicity of a series of organophosphorous compounds. The OP structures were modeled using the MMFF94s force field. The experimental mouse oral acute toxicity data of the OPs were related to the rat oral acute toxicity, and additional descriptors calculated for the minimum energy conformers improved the fitting quality of the MLR models. Good correlations and predictive models were obtained. The molecular properties and 3D-MoRSE descriptors included in the best MLR model can be used for the prediction of missing mouse oral acute toxicity data, saving experimental time and money. Two OPs with known structure (which include three chlorine atoms attached to a phenyl group and a thiophosphonate/thiophosphate group), without mouse toxicity data, were found to have potentially low oral acute toxicity for this species.