Evaluation of Organophosphate Pesticide Residues in Food Using the Partial Least Squares Method

Organophosphorus (OP) chemicals have been widely used as insecticides, including against malaria mosquitoes, and in the treatment of human diseases such as parasitosis, myasthenia, and glaucoma. The toxicity of OPs is well known: they can cause environmental and health problems and can accumulate in the food chain. The acceptable daily intake (ADI) can be considered a measure of the effect of pesticide residues in food on human health. In this paper, the partial least squares (PLS) approach is used to evaluate the ADIs (expressed as pADIs) of a series of 46 structurally diverse OPs. OP structures were pre-optimized using the MMFF94s force field, and structural descriptors were calculated for the minimum-energy conformers. The dataset was divided into a training set of 26 compounds and a prediction set of 20 pesticides. Several criteria were used to check model robustness, overfitting, and potential outliers in the X and Y space. The PLS results indicated that new experimental toxicological data would be needed for five of the 46 OPs to improve their known ADI values for qualitative and quantitative long-term dietary risk assessments.


Introduction
Pesticides are generally used to prevent and control insects, pests, and diseases in field crops; the category also includes animal and bird repellents, food storage protectants, mold-killing substances, antifouling products, soil sterilants, and wood preservatives [1,2]. Initially, pesticides were used mainly to reduce pest attack. At the same time, the increasing use of chemical pesticides has polluted the environment and caused many long-term changes in society. Farmers rely on pesticides to fight plant pests and diseases: it is currently estimated that as much as 45% of the world's crops are damaged by plant pests and diseases. Pesticides are therefore important for protecting crops both during growth and during later storage and transport. However, the indiscriminate and careless use of pesticides has caused extensive contamination of the food chain.
The organophosphorus pesticides (OPs) were introduced as replacements for the organochlorine pesticides after the tendency of DDT and its metabolites to bioaccumulate in ecosystems and cause adverse health effects, particularly in top predators, led to the prohibition or restriction of their use in the 1970s [3]. Although OPs were originally considered less hazardous to the environment because of their low persistence, their increased use led to various ecotoxicological problems related to their high acute toxicity. Indiscriminate use of organophosphate pesticides can cause environmental pollution because of their stability, high toxicity, and capacity to accumulate in the food chain [4].
Organophosphorus insecticides share a common mechanism of action: inhibition of the enzyme acetylcholinesterase. Their relative toxicity in humans, rodents, and insects reflects differences in their biotransformation and accumulation among these species [5]. The binding of OPs to carboxylesterases, cholinesterases, and other targets, identified as receptors and enzymes involved in the hydrolysis of endobiotics, plays a key role in limiting the binding of OP compounds to acetylcholinesterase (AChE) [6]. Phosphorylation of AChE, the enzyme that hydrolyzes acetylcholine and thus terminates its neurotransmitter action, is the principal mechanism of OP toxicity in mammals, insects, and nematodes; 70% to 90% inhibition usually proves lethal.
The risk assessment of chemicals is usually divided into similar but separate practices, depending on whether the evaluated chemical is a carcinogen or a non-carcinogen [7]. The major difference between carcinogenic and non-carcinogenic risk calculations lies in how risks from low-level exposures are determined (Winter, 1992). For non-carcinogenic effects, a toxicity threshold is assumed to exist, and exposures below this threshold should not cause any effect. The highest tested dose at which no adverse effect is observed is termed the no-observed-adverse-effect level (NOAEL). The existence of a NOAEL suggests that a toxicity threshold exists, and this threshold concept provides the basis for non-carcinogenic risk assessment [8].
The Joint FAO/WHO Expert Committee on Food Additives (JECFA) first proposed the 'acceptable daily intake' (ADI) concept for assessing pesticide residues in food in 1958 [9], with minor modifications in 1962 [10,11], 1974, and 1987 [12]. Since then, hundreds of food additives and pesticide residues have been evaluated and re-evaluated by JECFA and the Joint FAO/WHO Meeting on Pesticide Residues (JMPR) [13]. ADIs, used nationally and internationally in the development of food standards, have proved adequate for allowing the careful use of agrochemicals while protecting consumer health [14].
The ADI is an estimate of the amount of a food additive, expressed on a body-weight basis, that can be ingested daily over a lifetime without significant risk to health [11]. The World Health Organization (WHO) and the United States Environmental Protection Agency (U.S. EPA) have established ADIs for use in regulatory risk-management decisions on pesticide safety standards.
Determining the ADI for toxicological assessment involves collecting all relevant data, establishing the no-effect level from the most sensitive indicator of toxicity, and applying an appropriate safety factor for humans [13]. Because an ADI is based on the data available at the time of evaluation, the safety of a chemical can never be established with certainty, and the ADI may be revised when new toxicological data become available.
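The derivation described above amounts to dividing the no-effect level by a safety factor, ADI = NOAEL / SF. The following is a minimal sketch, assuming the conventional default factor of 100 (10 for interspecies times 10 for intraspecies differences); the function name and the NOAEL value in the example are hypothetical.

```python
def acceptable_daily_intake(noael_mg_per_kg: float, safety_factor: float = 100.0) -> float:
    """Return the ADI (mg/kg body weight per day) as NOAEL / safety factor.

    The default of 100 is the conventional 10 (interspecies) x 10
    (intraspecies) assumption; an evaluation may apply a different factor.
    """
    if noael_mg_per_kg < 0 or safety_factor <= 0:
        raise ValueError("NOAEL must be non-negative and the safety factor positive")
    return noael_mg_per_kg / safety_factor


# Hypothetical NOAEL of 1 mg/kg bw/day from the most sensitive study.
print(acceptable_daily_intake(1.0))  # 0.01
```

A larger safety factor, used when the data are incomplete, yields a proportionally lower ADI.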
ADI values (considered health-based control values) of some pesticides were previously modeled by Kim [14] using multiple linear regression (MLR). Kim concluded that a robust QSAR approach would help identify significant information about the uncertainty of ADI values, as a preliminary human health risk assessment for certain pesticides.
This paper applies the partial least squares (PLS) method to evaluate the acceptable daily intake values (expressed as pADIs) of a series of 46 structurally diverse organophosphorus pesticides (http://www.inchem.org/pages/pims.html (accessed on 6 April 2020)) from their molecular structures. Molecular mechanics calculations with the MMFF94s force field were used to model the pesticide structures. Structural descriptors were computed for the minimum-energy structures and related to the pADI values. Several criteria were checked to establish model robustness and to detect outliers in the X and Y space.

Definition of Target Property and Structural Descriptors
The acceptable daily intake (ADI, mg/kg body weight) of pesticide residues in food, converted to molar units and expressed as pADI (http://www.inchem.org/pages/pims.html (accessed on 6 April 2020)), was used as the dependent variable for the 46 organophosphorus pesticides (Table 1). The OP structures were pre-optimized with the MMFF94s molecular mechanics force field implemented in the Omega software (Omega v.2.5.1.4, OpenEye Scientific Software, Santa Fe, NM, USA. http://www.eyesopen.com (accessed on 20 April 2020)) [15,16]. During conformer ensemble generation, the maximum number of conformers per compound was set to 400 and an RMSD threshold of 0.5 Å was used.
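The text does not spell out the conversion to pADI. The sketch below assumes the usual p-notation, pADI = -log10(ADI in mol/kg body weight), obtained by dividing the ADI in mg/kg by the molar mass; the function name and the example values are hypothetical.

```python
import math


def p_adi(adi_mg_per_kg: float, molar_mass_g_per_mol: float) -> float:
    """Convert an ADI in mg/kg bw to pADI = -log10(ADI in mol/kg bw).

    Assumes standard p-notation; higher pADI means a lower tolerated dose.
    """
    if adi_mg_per_kg <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("ADI and molar mass must be positive")
    adi_mol_per_kg = adi_mg_per_kg * 1e-3 / molar_mass_g_per_mol  # mg -> g -> mol
    return -math.log10(adi_mol_per_kg)


# Hypothetical compound: ADI = 0.01 mg/kg bw, molar mass = 350.6 g/mol.
print(round(p_adi(0.01, 350.6), 3))
```

Working in molar units makes compounds of different molecular weight comparable on a per-molecule basis, which is the usual reason for this conversion in QSAR studies.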

Partial Least Squares (PLS) Method
The partial least squares (PLS) approach [17] was used to relate the pADI values to the calculated OP structural descriptors, using the SIMCA program (SIMCA-P+ 12.0.0.0, 2008, Umetrics, Sweden, www.umetrics.com (accessed on 27 April 2020)). PLS can yield stable and highly predictive models even when, as here, the descriptors greatly outnumber the compounds and are strongly intercorrelated. Model quality was assessed with the cumulative squared correlation coefficient, R²(CUM), and the cumulative squared cross-validated correlation coefficient, Q²(CUM). The variable importance in the projection (VIP) values and the signs of the descriptor coefficients were used to interpret the influence of each descriptor on the pADIs. Leave-7-out cross-validation was used to select the significant PLS components and for internal validation of the model.
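For readers without SIMCA, the core of the method can be illustrated with a minimal pure-Python sketch of single-response PLS (PLS1) fitted by the NIPALS algorithm, together with the standard VIP formula, VIP_j = sqrt(K · Σ_a SS_a w_ja² / Σ_a SS_a). This is not SIMCA's implementation: the function names are hypothetical, X and y are only mean-centred, and any unit-variance scaling of the descriptors is assumed to have been applied beforehand.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS (PLS1) model with the NIPALS algorithm.

    X: list of n rows of k descriptor values; y: list of n responses.
    Returns a dict with the means, regression coefficients, and VIP values.
    """
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    E = [[X[i][j] - x_mean[j] for j in range(k)] for i in range(n)]  # centred X
    f = [yi - y_mean for yi in y]                                      # centred y
    W, P, R, Q, SS = [], [], [], [], []
    for _ in range(n_components):
        # Weight vector: unit direction in X space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # y already fully explained; stop adding components
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(k)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]  # loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                          # y weight
        # r expresses the scores t in terms of the original (undeflated) X.
        r = list(w)
        for p_b, r_b in zip(P, R):
            c = sum(pb * wj for pb, wj in zip(p_b, w))
            r = [rj - c * rbj for rj, rbj in zip(r, r_b)]
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        R.append(r)
        Q.append(q)
        SS.append(q * q * tt)  # y sum of squares explained by this component
    if not Q:
        raise ValueError("no PLS component could be extracted")
    coef = [sum(Q[a] * R[a][j] for a in range(len(Q))) for j in range(k)]
    total = sum(SS)
    vip = [math.sqrt(k * sum(SS[a] * W[a][j] ** 2 for a in range(len(SS))) / total)
           for j in range(k)]
    return {"x_mean": x_mean, "y_mean": y_mean, "coef": coef, "vip": vip}


def pls1_predict(model, X):
    """Predict responses for new descriptor rows."""
    return [model["y_mean"] + sum(c * (xj - m) for c, xj, m in
                                  zip(model["coef"], row, model["x_mean"]))
            for row in X]


def r2(y, y_hat):
    """Coefficient of determination, the analogue of R2Y for one response."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

Because the weight vectors have unit length, the squared VIP values always sum to the number of descriptors, which is why VIP > 1 is often read as "above-average importance".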
The Y-randomization test was used to check model robustness and overfitting. In this procedure, the Y variable is randomly permuted while the structural descriptors are kept unchanged, and a new PLS model is fitted to each permuted response. The resulting PLS models (after 999 randomizations) should have low r² and q² values [18].
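A minimal sketch of the Y-randomization loop described above. To stay self-contained it assumes a one-component PLS1 model as a stand-in for the two-component SIMCA model, and it computes q² = 1 - PRESS/SS with seven cross-validation groups assigned by index; both are simplifications, and all function names are hypothetical.

```python
import random


def _fit_one_component_pls(X, y):
    """Fit a one-component PLS1 model on mean-centred data; return a predictor."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(k)] for row in X]
    # PLS weight: covariance of each descriptor with y (its scale cancels below).
    w = [sum(Xc[i][j] * (y[i] - y_mean) for i in range(n)) for j in range(k)]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    tt = sum(v * v for v in t)
    q = sum(ti * (yi - y_mean) for ti, yi in zip(t, y)) / tt if tt > 0 else 0.0

    def predict(rows):
        return [y_mean + q * sum(wj * (xj - mj) for wj, xj, mj in zip(w, row, x_mean))
                for row in rows]
    return predict


def r2_q2(X, y, n_groups=7):
    """Return (r2, q2); q2 = 1 - PRESS / SS from n_groups-fold cross-validation."""
    n = len(y)
    y_mean = sum(y) / n
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    fitted = _fit_one_component_pls(X, y)(X)
    r2 = 1.0 - sum((a - b) ** 2 for a, b in zip(y, fitted)) / ss_tot
    press = 0.0
    for g in range(n_groups):
        test = [i for i in range(n) if i % n_groups == g]
        train = [i for i in range(n) if i % n_groups != g]
        if not test:
            continue
        predict = _fit_one_component_pls([X[i] for i in train], [y[i] for i in train])
        preds = predict([X[i] for i in test])
        press += sum((y[i] - p) ** 2 for i, p in zip(test, preds))
    return r2, 1.0 - press / ss_tot


def y_randomization(X, y, n_trials=999, seed=0):
    """Refit the model on n_trials random permutations of y; return (r2, q2) pairs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        y_perm = list(y)
        rng.shuffle(y_perm)  # X is kept unchanged; only the responses are permuted
        results.append(r2_q2(X, y_perm))
    return results
```

The model passes the test when the r² and q² values of the scrambled models are clearly lower than those of the model fitted to the true responses.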
Several criteria were used to detect potential outliers in the X and Y space of the training and prediction sets: (i) the score scatter plot, at a significance level of 0.05; (ii) the distance to the model in X space at the selected model dimension (DModX, significance level 0.05), for the observations used to fit the model; (iii) PModXPS+, the probability that a new observation in the prediction set belongs to the model in X space, combined with Hotelling's T² when the latter lies outside the critical limit; and (iv) the Hotelling's T² range plot, which displays the distance of each selected observation from the origin in score space, with a significance limit of 0.01.
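Two of these diagnostics can be sketched directly from a fitted model's score matrix and X residual matrix, which are assumed to be available. Hotelling's T² follows its usual definition, T²_i = Σ_a t_ia² / s_a². The DModX shown here is a simplified version: the residual standard deviation of each observation normalised by the pooled residual standard deviation, omitting the additional degrees-of-freedom corrections applied in SIMCA. Function names are hypothetical.

```python
import math


def hotelling_t2(scores):
    """Hotelling's T² per observation from an n x A score matrix.

    T²_i = sum_a t_ia² / s_a², where s_a² is the variance of score column a.
    """
    n, A = len(scores), len(scores[0])
    var = []
    for a in range(A):
        col = [row[a] for row in scores]
        m = sum(col) / n
        var.append(sum((v - m) ** 2 for v in col) / (n - 1))
    return [sum(row[a] ** 2 / var[a] for a in range(A)) for row in scores]


def dmodx(residuals, n_components):
    """Simplified normalised distance to the model in X space.

    residuals: n x K matrix of X residuals left after n_components components
    of a mean-centred model (hence the extra 1 in the pooled degrees of freedom).
    """
    n, K = len(residuals), len(residuals[0])
    s_i = [math.sqrt(sum(e * e for e in row) / (K - n_components)) for row in residuals]
    s0 = math.sqrt(sum(e * e for row in residuals for e in row)
                   / ((n - n_components - 1) * (K - n_components)))
    return [s / s0 for s in s_i]
```

Observations whose T² exceeds its critical limit lie far from the centre of the model plane, whereas a large DModX means the observation is poorly described by the model plane itself; the two criteria therefore flag different kinds of outliers.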

Results and Discussion
The X matrix of OP descriptors was analyzed by principal component analysis (PCA). A model with six significant components was obtained (N = 46 compounds, 1733 X variables); the first three components explain 51.5% of the information content.
A robust and stable PLS model with two significant components was obtained; based on 16 structural descriptors, it explains 88% of the information content of the descriptor matrix, with R²Y(CUM) = 0.81 and Q²(CUM) = 0.77. The descriptor coefficients and VIP values of the final PLS model are presented in Table 2. To find X-outliers in the training set and prediction compounds lying outside the applicability domain (AD), the normal distribution pattern [19] of the descriptors included in the best PLS model was checked for the training and prediction sets at a probability of 90%. According to this criterion, compound 25 was a potential outlier in the training set. This was not confirmed by the PModXPS+ criterion (Table 3), according to which compounds 9, 14, 19, 42, and 45 do not belong to the X space of the prediction set.
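The 90% probability criterion is consistent with the "standardization approach" to the applicability domain, in which each descriptor is standardized against the training set and z = 1.28 (the one-sided 90% point of the standard normal distribution) is used. The sketch below assumes this is the procedure of [19], which the text does not confirm; the function name is hypothetical.

```python
import statistics


def standardization_ad(train, query, z=1.28):
    """Return True if one compound lies inside the applicability domain.

    train: list of descriptor rows (training set); query: one descriptor row.
    Each descriptor is standardized as |x - mean| / sd using training-set values.
    """
    k = len(query)
    means = [statistics.mean([row[j] for row in train]) for j in range(k)]
    sds = [statistics.stdev([row[j] for row in train]) for j in range(k)]
    s = [abs(query[j] - means[j]) / sds[j] for j in range(k)]
    if max(s) <= 3:   # every descriptor within 3 standard deviations
        return True
    if min(s) > 3:    # every descriptor beyond 3 standard deviations
        return False
    # Mixed case: judge by the distribution of the standardized values.
    s_new = statistics.mean(s) + z * statistics.stdev(s)
    return s_new <= 3
```

Applied to training compounds, a False result marks an X-outlier; applied to prediction compounds, it marks a compound outside the AD.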
The distance to the X model (DModX) plot is presented in Figure 1, the score scatter plot of the best PLS model in Figure 2, and the Hotelling's T² range plot of the best PLS model in Figure 3.
In the Y-scrambling test performed for the PLS model, significantly low scrambled r² (r²scr) and cross-validated q² (q²scr) values were obtained over 999 trials. Figure 6 presents the r²scr and q²scr values obtained for all the randomized models. The experimental versus calculated pADI plot is presented in Figure 7.
The final PLS model is robust and fits the data well. All the criteria used to check for outliers in the X and Y space indicate that compounds 9, 14, 19, 42, and 45 do not belong to the X space of the prediction set. For these compounds, new experimental toxicological data would be needed to revise their known ADI values for qualitative and quantitative long-term dietary risk assessments.

Conclusions
The acceptable daily intake (ADI), a measure used in qualitative and quantitative long-term dietary risk assessment, was modeled for a series of 46 organophosphorus (OP) pesticides using the partial least squares (PLS) approach. Molecular mechanics calculations with the MMFF94s force field were used to generate conformer ensembles of the pesticides, and the descriptors calculated for the resulting minimum-energy structures were related to the pADIs with the PLS method. Several criteria were applied to verify model stability and to detect potential outliers in the X and Y space, in order to establish whether new experimental toxicological data would be needed for this dataset. Five OPs (compounds 9, 14, 19, 42, and 45) were identified as potential outliers in the X and Y space, and new ADIs would need to be established for these compounds.
Author Contributions: G.I. analyzed the data; S.F.-T. performed the molecular modeling calculations and the statistical analysis and wrote the paper. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
Publicly available datasets were analyzed in this study. These data can be found at http://www.inchem.org/pages/pims.html.