To Bind or Not to Bind? A Comprehensive Characterization of TIR1 and Auxins Using Consensus In Silico Approaches

: Auxins are chemical compounds of wide interest, mostly due to their role in plant metabolism and development. Synthetic auxins have been used as herbicides for more than 75 years and low toxicity in humans is one of their most advantageous features. Extensive studies of natural and synthetic auxins have been made in an effort to understand their role in plant growth. However, molecular details of the binding and recognition process are still an open question. Herein, we present a comprehensive in silico pipeline for the assessment of TIR1 ligands using several structure-based methods. Our results suggest that subtle dynamics within the binding pocket arise from water–ligand interactions. We also show that this trait distinguishes effective binders. Finally, we construct a database of putative ligands and decoy compounds, which can aid further studies focusing on synthetic auxin design. To the best of our knowledge, this study is the first of its kind focusing on TIR1.


Introduction
Agricultural production is fundamental from an economic point of view and to feed the world's population.It has been estimated that from 1961 to 2020, food production from agricultural sources has increased significantly, with current per capita values nearing almost 3200 kcal/day at the end of the supply chain [1].Yet, the annual State of Food Security and Nutrition in the World (SOFI) report published by the Food and Agriculture Organization (FAO) showed that more than 700 million people faced hunger in 2022 [2].Thus, there is a pressing need to improve food security, which translates to the assurance of higher crop yields.Some of the main problems that pose a significant challenge to safe, productive, and sustainable agricultural practices are herbicide resistance and toxicity.There are documented cases of resistance of weeds to several chemical groups worldwide, reported since the discovery of herbicide resistance in common groundsel (Senecio vulgaris) to triazine herbicides in 1968 [3,4].Nowadays, herbicide resistance is spread globally, affecting 96 crops across 72 countries.This phenomenon has been documented in 513 distinct instances, involving 267 weed species that have developed resilience against herbicides [5].Moreover, despite the generally low mammalian toxicity of many newly developed herbicides, growing experimental evidence indicates that exposure to these chemicals can have detrimental effects on the development and/or reproduction of various mammalian species.Notably, 2,4-D (2,4-Dichlorophenoxyacetic acid) and its combination with 2,4,5-T (2,4,5-Trichlorophenoxyacetic acid) have been associated with reproductive issues and malformations in humans [6].
Auxins can be regarded as phytohormones, metabolically derived from L-Tryptophan, responsible for plant growth and development [7].Indole-3-acetic acid (IAA) is the most well-known example of such a compound, often deemed as the "master hormone" in plants [8].Since the 1940s, synthetic derivatives from IAA have been developed and tested, with notable examples being naphthalene acids (1-naphthalene acetic acid); phenoxycarboxylic acids (MCPA, 2,4-D); benzoic acids (dicamba); pyridinecarboxylic acids (e.g., picloram), and quinolinecarboxylic acids (quinmerac, quinclorac), all of which showed strong responses in plants, inducing damage at high concentrations while eliciting higher stability when compared to IAA [9].In the last two decades, synthetic auxins reached commercial success as herbicides, accounting for roughly 1.3 billion USD, which represents approximately 9% of total sales just in 2000 [10].
The intricated auxin signaling pathway is orchestrated by at least three protein families: transport inhibitor response/auxin signaling F-box (TIR1/AFB) proteins, auxin/indole-3-acetic acid (Aux/IAA) transcriptional repressors, and auxin response factor (ARF) transcription factors [11].At the core of this signaling cascade lies TIR1, belonging to a small family of F-box proteins, comprising five additional AFB proteins (AFB1-AFB5).These six proteins collectively function as auxin receptors [12].Auxin transport occurs within and between cells via PIN, ABCB, and AUX/LAX transporters.This then interacts with the SCFTIR1/AFB complex, which, upon creation of the SCF-auxin-Aux/IAA complex, triggers the ubiquitination of Aux/IAA transcriptional repressors, leading to the activation of auxin response genes [13].Therefore, auxin exerts its regulatory influence by directly binding to TIR1, promoting the interaction between SCFTIR1/AFB and Aux/IAAs.This interaction does not induce any conformational changes; instead, auxin acts as a molecular glue, facilitating the formation of a binding pocket for the Aux/IAA protein within the TIR1 structure [14].
Herein, we present an in silico profiling of TIR1 binding coupled with a pipeline for de novo design based on machine learning.We hope this study will aid synthetic auxin design as these molecules can provide additional tools, not just as herbicides but as molecular guidelines for the engineering of plant growth and development [15].

De Novo Design of Putative Auxins and Molecular Decoys
The crystal structure of the TIR1/SCP complex of Arabidopsis thaliana was obtained from the Protein Data Bank.The structure with PDBID 2P1P was selected as its values for resolution, R-free, and R-work are very good, also it is cocrystallized with IAA.The protein was then preprocessed with PDBFixer to add hydrogen atoms, IAA, and the cofactors were removed prior to modelling.
OpenGrowth [16] was used for the de novo generation of molecules within the binding site for IAA.As OpenGrowth uses Openbabel as the backend for compound minimization and atom typing, version 2.4.1 was manually compiled.The parameters for OpenGrowth were implemented as described in Table S1 in the Supplementary Information.Minimization was carried out using the MMFF94 force field and the SMoG2016 scoring function was used for ligand ranking.As a result, more than 1200 ligands were generated with these settings.A heuristic threshold of −8.0 kcal/mol was applied to select candidates for virtual screening.
To establish a set of nonbinders, molecular decoys were obtained with DeepCoy [17]; these were generated to match DUD-E descriptors in a 150:1 decoy-to-reference ratio, following the selection of top decoys using the provided scripts within DeepCoy.
As proof of concept, two sets of molecular descriptors were calculated.The first one explored Lipinski's space, composed of molecular weight (MW), topological polar surface area (TPSA), cLogP, rotable bonds, number of H-bond acceptors, and donors.The second set was selected from the molecular descriptors available in Mordred [18].As an initial filter, features with zero variance or with less than three different values (semiconstant variables) were discarded.Highly correlated descriptors were also excluded, i.e., those with a Kendall correlation coefficient higher than 0.9; the final set contained 395 of the original 1613 variables.Then, a subset of features was selected using the Boruta algorithm [19]; these are listed in Table S2 in the Supplementary Information.Finally, these two sets of descriptors were visualized via principal component analysis (PCA) and t-SNE using DataWarrior [20].

Machine Learning
To evaluate the overall pertinence of the generated compounds, we trained different classification models as proof of concept.These included logistic regression (LR), supporting vector machines (SVC), random forest (RF), and extreme gradient boosting classifiers (XGBoost classifier; XGBC).The comparison was performed with Scikit-learn (v.1.2.2) [21] and XGBoost [22] Python modules.Each model was trained and validated by ten-fold cross-validation, while grid search optimization was used to fine-tune their respective hyperparameters [23].Each compound was represented as a concatenated vector of molecular descriptors selected with Boruta, which was then used for training.In addition, a second iteration was made to train classifiers with topological information; i.e., extended connectivity fingerprints (ECFP4), WHIM, and GETAWAY descriptors, calculated with RDKit (v.2022.9.5) [24].Given the numerical feature differences, each representation was standardized, subtracting the mean and scaling to unit variance using Scikit-learn.
Classification performance was evaluated with several measures, including validation curves, learning curves, ROC curves, Matthews correlation coefficient, and detection error trade-off curves.

Molecular Modelling of Auxins
To better understand molecular recognition by TIR1, we propose a pipeline using mixed-solvent molecular dynamics, conventional molecular dynamics, and enhanced sampling with coarse metadynamics.Detailed information is provided in the following subsections.

Mixed Solvent Molecular Dynamics
To explore the plasticity and overall presence of hotspots within TIR1, molecular dynamics with mixed solvents was used.The MXMD script [25] was modified to include the probes included in Scheme 1.

Machine Learning
To evaluate the overall pertinence of the generated compounds, we traine classification models as proof of concept.These included logistic regression (LR ing vector machines (SVC), random forest (RF), and extreme gradient boosting (XGBoost classifier; XGBC).The comparison was performed with Scikit-learn (v and XGBoost [22] Python modules.Each model was trained and validated b cross-validation, while grid search optimization was used to fine-tune their res perparameters [23].Each compound was represented as a concatenated vecto ular descriptors selected with Boruta, which was then used for training.In second iteration was made to train classifiers with topological information; i.e connectivity fingerprints (ECFP4), WHIM, and GETAWAY descriptors, calcu RDKit (v.2022.9.5) [24].Given the numerical feature differences, each represen standardized, subtracting the mean and scaling to unit variance using Scikit-le Classification performance was evaluated with several measures, includ tion curves, learning curves, ROC curves, Matthews correlation coefficient, and error trade-off curves.

Molecular Modelling of Auxins
To better understand molecular recognition by TIR1, we propose a pipe mixed-solvent molecular dynamics, conventional molecular dynamics, and sampling with coarse metadynamics.Detailed information is provided in the subsections.

Mixed Solvent Molecular Dynamics
To explore the plasticity and overall presence of hotspots within TIR1, mo namics with mixed solvents was used.The MXMD script [25] was modified to i probes included in Scheme 1.A modified version of the Ghanakota et al. protocol was used [26]; briefly col involves the creation of replica systems with a volume/volume concentratio (water/probe).MXMD uses pre-equilibrated boxes of the selected cosolvent pr is then incorporated in a cubic box of water buffering the whole system by 1 systems were then submitted to default relaxation protocol and 15 ns of equilibr Production times were set to 20 ns, as it provides a good compromise between and computing time [27].Each MXMD run consisted of 15 simulations; thre probe, accounting for an overall sampling of 300 ns.MXMD simulations were Scheme 1.Chemical structures of the selected probes for hotspot identification.
A modified version of the Ghanakota et al. protocol was used [26]; briefly, the protocol involves the creation of replica systems with a volume/volume concentration of 95/5% (water/probe).MXMD uses pre-equilibrated boxes of the selected cosolvent probe, which is then incorporated in a cubic box of water buffering the whole system by 15 Å.These systems were then submitted to default relaxation protocol and 15 ns of equilibration time.Production times were set to 20 ns, as it provides a good compromise between sampling and computing time [27].Each MXMD run consisted of 15 simulations; three for each probe, accounting for an overall sampling of 300 ns.MXMD simulations were carried out with PDBIDs 2P1M, 2P1P, and 2P1N, which correspond to TIR1 with IHP cofactor; TIR1 bound to IAA and IHP; and TIR1 bound to 2,4-D and IHP, respectively.
Hotspots were then evaluated using the following score: where Zxyz is the occupancy of functional groups of the probes, converted to Z-scores using the following equation:

Assessment of Pocket Solvation and Its Role in Auxin Recognition
Molecular recognition is often grounded on the "desolvation paradigm", which considers that water molecules need to be displaced by ligands during binding events [28].Only recently, this notion has been challenged with the discovery of protein families, which possess a tightly bound water network, such as bromodomains.It has been shown that these complex arrangements serve as enthalpy control for ligand binding [29].Additionally, it has been suggested that a robust description of binding modes is more consistent when ligands are properly solvated [30,31].
Thus, we conducted a series of simulations under different starting conditions using 3D structures with the following PDBIs 2P1N, 2P1O, 2P1P, and 2P1Q.Structures were prepared with pdbfixer to add missing atoms and/or residues followed by hydrogen bond optimization using PropKa and Maestro's protein preparation wizard.From here, two independent systems were built: with and without preserving water molecules in the vicinity of cocrystal ligands.Both systems had the protein buffered in orthorhombic boxes of TIP3P water and NaCl at 0.15 M. TIR1-ligand complexes were then submitted to molecular dynamics simulations using Desmond (v.6.1)[32].Systems were parametrized with the OPLS_2005 force field simulation and integration was carried out with the RESPA algorithm using 4 and 8 fs values for timesteps.A hydrogen mass repartition scheme was implemented to allow the timesteps to increase [33].The prepared systems were then relaxed using Desmond's default protocol.This includes Brownian dynamics and an equilibration period to attain NVT and NPT conditions.The target values were 300 K for temperature and 1 atm for pressure.Following system relaxation, the complex was equilibrated for 50 ns under the NPT ensemble using MTTK barostat and Nosé-Hoover chain thermostat.Production runs of 500 ns for both systems of each PDBID were obtained and analyzed for RMSD and RMSF with MDTraj (v.1.98)[34].In addition, interaction fractions were obtained with Maestro.TIR1-auxin contacts were also used to define microstates to construct transition networks to assess binding mode metastability.
Finally, we tested pocket solvation using grand-canonical Monte Carlo (GCMC) as it is a well-documented method with notable results [35][36][37][38].As proof of concept, we selected the IAA/Tryptophan pair.As positive and negative controls, respectively, IAA was used as is from PDBID 2P1P and tryptophan was docked to the structure using PLANTS (more details in the following section).Each complex was prepared as described above, without preserving the water molecules of the binding site.Pocket solvation was achieved using the GCMC plugin in Desmond.This required defining the MC region in a buffer of 6 Å around ligands; solvent parameters were default values for TIP3P water, similar to those described elsewhere [39].
Simulations were carried out in five stages, beginning with GCMC/MD with restraints on heavy atoms, consisting of 10,000 MC moves per cycle and a total of 500 ps of MD.This stage was followed by GCMC/MD with 100,000 moves per cycle, 1 ns of MD, and no restraints.Systems were then relaxed with short runs of Brownian dynamics and Langevin dynamics for equilibration of volume and pressure.For these simulations, hydrogen mass repartitioning was not used; thus, timestep values were kept at default.An additional equilibration of 10 ns using the MTTK barostat and the Nosé-Hoover thermostat was carried out.Hydrogen mass repartition was implemented during this stage.
Finally, production runs of 500 ns were made for each ligand, keeping conditions from the previous step.

Molecular Dynamics
We compared a series of pairs, consisting of known ligands and chemically related "decoys".Decoys were intended to be low affinity/nonbinders for TIR1 based on experimental data [40,41] while also having structural similarity to known ligands.Selected pairs were IAA/Tryptophan; α-naphthylacetic acid/β-naphthylacetic acid; Diclophenac/Rinskor and Fluroxypyr/Quinclorac.
As a first step, compounds were docked to PDBID 2P1P using PLANTS software (v.1.2) as it is a well-established method with high accuracy and consistency [42][43][44][45].Ligands were prepared using SPORES [46] to obtain proper input for PLANTS.The binding site was defined in a sphere with a radius of 15 Å centered with IAA's crystallographic coordinates.Additional parameters were kept at default values.
The top-ranked pose was selected for GCMC/MD as detailed previously and production times were kept at 500 ns for each ligand.In addition to the previously mentioned analyses, concurrence maps were calculated using ContactMap explorer (v.0.7.0) [47].

Coarse Metadynamics
To evaluate the metastability and get a rough estimate of the affinity differences between binders and decoys, well-tempered metadynamics was used.This enhanced sampling method has been successfully applied to study binding events [48][49][50][51][52]. Metadynamics uses specific degrees of freedom, which are related to the desired reaction coordinate (collective variables; CVs).In this case, we applied two CVs; the first was defined using distances between centers of mass (protein and ligand).For the protein, residues 435-437 were selected and the ligand's heavy atoms delimited the second center of mass.The remaining CV was defined from WHIM's second principal moment using the ligand's heavy atoms.The bias factor was set to 5.0 while the Gaussian height was set to 0.08 kcal/mol with a width of 0.25 Å.The Gaussians were deposited every 250 steps; additional parameters were similar to those detailed in Section 2.3.2.
Additionally, we examined the role of the so-called engagement niche, a region composed of residues K410, S440, G441, A464, and P465 [53].For this case, the compounds were docked at a centroid of these residues in PDBID 2P1M.The same definition of CVs was used.Both metadynamics simulations had production times of 50 ns.

De Novo Design of Putative Auxins and Decoys
Figure 1 shows the chemical space delimited by the selected descriptors with Boruta.Loading values can be found in Table S3 in the Supplementary Information.A good superposition between generated binders and decoys was observed in both principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE).This suggests that classes may not be linearly separable; a similar behavior was found using the Rule of Five descriptors (Figure S1).Such a trait is positive, as it shows that decoys are indeed robust due to sharing similar physicochemical spaces, which in turn may lead to effective discrimination of its latent space with a good value as screening tools.In addition, this lack of clear separation underscores the challenge of effective classification using only physicochemical descriptors.Overreliance on molecular properties leading to negative effects, such as molecular inflation, has been reported [54,55].Having established a good baseline, we trained different classifiers using machine learning algorithms.

Machine Learning
Classification and prediction of biological activity based on latent spaces have been attempted since early quantitative structure activity relationship (QSAR) studies.Still, after several decades and unexpected results, it became clear that the QSAR is hardly linear in nature [56], with the added complexity of highly similar compounds with highly variable biological activity, an issue now defined as activity cliffs [57].Such a concept has evolved to activity ridges due to the consistent presence of such behavior in certain scaffolds and not just compounds [58].
The emergence of empirical rules, such as Lipinski's Rule of Five [59], suggested the existence of latent spaces for drug discovery.The notion of a drug-likeness has been challenged and perhaps clearly debunked [60].Herein, we did not intend to find a set of empirical rules to characterize putative auxins but rather a starting point for chemical space exploration and expansion.We aim to aid the rational design of auxins and related agents.To this end, we used the generated compounds to train commonly reported classifying algorithms in order to verify chemical space coverage and the overall significance of structural information within our data set.For classification tasks, several methods are used in conjunction to evaluate the accuracy and consistency of predictions.Hereunder, we selected a combination of four measurements that convey a general picture of both the data and the model's performance: 1. Validations curves: To visualize the trend of variance and suggest the presence of bias; either underfitting or overfitting.

Machine Learning
Classification and prediction of biological activity based on latent spaces have been attempted since early quantitative structure activity relationship (QSAR) studies.Still, after several decades and unexpected results, it became clear that the QSAR is hardly linear in nature [56], with the added complexity of highly similar compounds with highly variable biological activity, an issue now defined as activity cliffs [57].Such a concept has evolved to activity ridges due to the consistent presence of such behavior in certain scaffolds and not just compounds [58].
The emergence of empirical rules, such as Lipinski's Rule of Five [59], suggested the existence of latent spaces for drug discovery.The notion of a drug-likeness has been challenged and perhaps clearly debunked [60].Herein, we did not intend to find a set of empirical rules to characterize putative auxins but rather a starting point for chemical space exploration and expansion.We aim to aid the rational design of auxins and related agents.To this end, we used the generated compounds to train commonly reported classifying algorithms in order to verify chemical space coverage and the overall significance of structural information within our data set.For classification tasks, several methods are used in conjunction to evaluate the accuracy and consistency of predictions.Hereunder, we selected a combination of four measurements that convey a general picture of both the data and the model's performance:

1.
Validations curves: To visualize the trend of variance and suggest the presence of bias; either underfitting or overfitting.

2.
Learning curves: Convey information on the learning rate as a function of n in data; i.e., if the provided examples are enough for effective classification or if the addition of data may benefit scoring.

ROC curves and Matthews correlation coefficient (MCC):
These provide a general description of the predictive power due to their relation with the confusion matrix.
The inclusion of both is complementary.As MCC can take values between −1 and 1, it is possible to assess if a model is indeed providing a significant difference when applied or if its success relies more on chance, even when the area under the curve of the ROC plot would suggest otherwise.4.
Detection error trade-off: Similar to the ROC curve, this metric gives information on the success of classification tasks.During classification, there is always a trade-off between the rate of falsely classified values.Therefore, this plot conveys the trend of said trade-off; an ideal classifier would keep a low rate of both false positives and false negatives, resulting in a curve downward and to the left, an inverse of sorts to an expected ROC curve.All these plots can be found in the Supplementary Information (Figures S2-S9).ROC curves and error trade-off for the models' best models are shown in Figure 2. To confirm the second half of the provided definition, topological and structural features shall be more conclusive in this classification test.Indeed, consideration of these features proved very significant, as the area under the ROC curves increased roughly by 0.2 in all cases.A random forest classifier showed the lowest AUC value, consistent with its MCC and error trade-off.Following this logic, it would seem that logistic regression was the top performer.Yet, such is not the case, as its MCC was 0.53.For this test, nonlinear As expected, models trained solely with physicochemical data showed low performance.Based on the area under the ROC curves, algorithms based on linear methods performed worst.Moreover, it is quite significant that the classifiers were barely superior to random chance.When analyzing the detection error trade-off, it also gives an almost linear trend, suggesting that any improvement in the discrimination of either false positives or negatives inherently results in an increase in misclassification of the other.This underscores that proper discrimination between auxins and nonbinders is no easy task.Even random forest and supporting vector classifiers, which are often regarded as very robust [61][62][63][64][65], showed poor performance.In addition, a strong tendency to overfitting was found for these models (see Figures S4 and S6 in the Supplementary Information).
Such results verify the convenience of generating molecular decoys.A common use of molecular decoys is for structure-based design.Thus, decoys ought to be molecules that share chemical space with actives, with little to no pharmacophoric similarity.The quest for such compounds resulted in the construction of datasets such as DEKOIS [66] or, most notably, DUDE-Z [67].Only recently have use cases been extended to ligand-based methods and scoring function construction [68,69].Hence, we have confirmed that decoys do share the chemical space using two sets of very different descriptors.
To confirm the second half of the provided definition, topological and structural features shall be more conclusive in this classification test.Indeed, consideration of these features proved very significant, as the area under the ROC curves increased roughly by 0.2 in all cases.A random forest classifier showed the lowest AUC value, consistent with its MCC and error trade-off.Following this logic, it would seem that logistic regression was the top performer.Yet, such is not the case, as its MCC was 0.53.For this test, nonlinear methods were more consistent and showed significantly better performance.At first glance, SVC would seem the superior choice; then again, comparing MCC values suggests that XGBC shall be preferred.
Our goal was achieved, as the generated compounds do serve as a reasonable sample for further analysis while also conveying enough information on true binders; still, there is no denying that the rate of false negatives remains quite high.To some extent, this result was to be expected, as no computational method is completely accurate and reliable on its own.Thus, we used structure-based methods to gain further information on what characterizes a true binder of TIR1.

Mixed Solvents Molecular Dynamics
Protein plasticity is paramount for their function, still this results in an outstanding challenge to tackle, computationally speaking.A quick comparison between crystallographic structures of TIR1 (Table 1) shows little conformational variations.Nonetheless, in a dynamic context, substantial differences may arise, with critical implications for binding events [70,71].Hence, to better understand the overall flexibility and features of the auxin binding pocket, we conducted molecular dynamics with mixed solvents.
During drug discovery campaigns, molecular design often follows a fragment-based approach where protein structures are crystallized iteratively in the presence of probe scaffolds or leads, thus creating an empirical mapping for binding phenomena [72].A computational equivalent of this can be found in molecular dynamics with mixed solvents, where a rather discrete amount of polar solvent molecules is introduced to the simulation cell.Following an equilibration period, the probe molecules will interact with the protein.Therefore, the occupancy of these can be evaluated and further compared.A notable advantage of such protocol is that the calculation of occupancy can be made from different probes as a means of attaining consensus, which, in turn, increases descriptive power and may be used to delimit allosteric or even cryptic pockets within proteins [73].
For this work, 45 independent simulations of TIR1 structures were produced, accounting for 900 ns. Figure 3 shows the computed surfaces from this sampling.Notably, the auxin binding pocket had the highest scoring.scaffolds or leads, thus creating an empirical mapping for binding phenomena [72].A computational equivalent of this can be found in molecular dynamics with mixed solvents, where a rather discrete amount of polar solvent molecules is introduced to the simulation cell.Following an equilibration period, the probe molecules will interact with the protein.Therefore, the occupancy of these can be evaluated and further compared.A notable advantage of such protocol is that the calculation of occupancy can be made from different probes as a means of attaining consensus, which, in turn, increases descriptive power and may be used to delimit allosteric or even cryptic pockets within proteins [73].
For this work, 45 independent simulations of TIR1 structures were produced, accounting for 900 ns. Figure 3 shows the computed surfaces from this sampling.Notably, the auxin binding pocket had the highest scoring.In general, the conformational variations in TIR1 did result in noticeable changes in the shape of the druggable regions.Overall, the observed ring shape is suggestive of the known mechanism of auxins, as these stabilize the protein-protein interface.Considering that said shape is more prominent in 2P1N, a link between ligand polarity and pocket dynamics is suggested.
As for individual probes, in 2P1M major occupancy was found for N-methylacetamide and acetohydroxamic acid.In contrast, 2P1P and 2P1N showed a major occupancy of drug-like probes, particularly imidazole.
Additionally, significant changes in the neighboring region to the engagement niche were found.This region comprises a phenylalanine triad (F79, 82, and 380), of which F82 has been reported as significant for the molecular perception of auxins [74].Based on the alignments of Table 1, F79 and 82 are consistently flexible between structures.A higher deviation can be found precisely between 2P1P and 2P1Q, suggesting a transition between hydrophobic and stacking interactions.This, together with the observed changes in molecular dynamics with mixed solvents, reinforces the importance of stacking interactions within the binding pocket.

Assessment of Pocket Solvation and Its Role in Auxin Recognition
As stated in the Methods Section, only recently has the role of pocket solvation been critically assessed within the context of small molecule binding.Based on crystallographic structures, it does seem that TIR1 has several water molecules that ought to participate in auxin binding [75].To determine if such contribution is relevant for auxin perception, we compared protein-ligand interactions under different starting conditions of solvation (Figure 4).
Major differences between the interaction profiles include the contact with R436.This interaction seems to be mostly driven by water bridges, suggesting a change in ligand orientation and directionality.From the previous section, residues showing changes between TIR1 conformations are H78, L439, and A464, with the latter being more displaced when 2,4-D is bound to TIR1.This trait is noteworthy, as A464 belongs to the engagement niche.Perhaps the observed variation for L439 can also be linked to the presence of water bridges.Unfortunately, this was only observed for IAA, so a clear reason remains elusive.Nevertheless, this may be due to the presence of a nitrogen atom in the indole scaffold, a feature absent in other binders under study.Still, previous studies have suggested that the presence of this heteroatom is paramount for the potency and overall affinity of IAA [76].Hence, it does seem that water interactions are significant within the engagement niche.
H78 also showed subtle but significant differences; unfortunately, it is difficult to assess if these are the result of solvation or merely the nature of numerical simulation.Another subtle effect is the change in hydrophobic contacts to stacking interactions.It can be argued that, due to the lipophilicity of auxins, these are prone to the latter.However, considering the cases of NAA and 2,4-D (2P1O and 2P1N, respectively), interaction distribution and persistence did change due to the solvation environment.Figures S11 and S12 show some examples of this (vide infra).
within the binding pocket.

Assessment of Pocket Solvation and Its Role in Auxin Recognition
As stated in the Methods Section, only recently has the role of pocket solvation been critically assessed within the context of small molecule binding.Based on crystallographic structures, it does seem that TIR1 has several water molecules that ought to participate in auxin binding [75].To determine if such contribution is relevant for auxin perception, we compared protein-ligand interactions under different starting conditions of solvation (Figure 4).For further proof, we constructed transition networks based on protein-ligand interactions.Briefly, these networks encode microstate information.Hence, neighboring microstates share similar information.If these clusters are densely populated, this can be related to metastability, delimiting a local minimum [77].Plus, given enough data, these can even confer kinetic information [78].Thus, it is very important to delimit which information is used to describe said microstates.Due to this, an additional criterion for comparison was introduced.Transition networks were then constructed with and without the inclusion of water bridges in their definition.These involve a hydrogen bond with the following criteria: a distance of 2.8 Å between the donor and acceptor atoms; an angle ≥ 110 • between donor-hydrogen-acceptor atoms; an acceptor angle ≥ 90 • between hydrogen-acceptor-bonded atoms.Figure 5 shows the obtained transition networks.
As suggested previously, water-ligand interactions do have a significant impact on auxin binding.Metastability of binding modes was present in all three simulations with crystal waters; plus, this trait is further accentuated with the introduction of water bridges to microstate definition.Such behavior is more noticeable for the synthetic auxins, reinforcing the established link between ligand polarity and pocket dynamics.In stark contrast, transition networks from conventional protocols were highly variable.For IAA's case (2P1Q), an inverse pattern is observed as it shows some degree of metastability based on the three clusters and a densely populated network.NAA, on the other hand, showed highly transient interactions.While some states are indeed revisited, no apparent clustering arises under these conditions.The omission of water bridges does result in a more complex network; then again, the transient nature of contacts persists.Finally, for 2,4-D, metastability is clearly shown by three well-defined clusters, suggesting a local minima or basin.
The inclusion of water-mediated interactions did improve the representation and gave a more robust description of basins and metastability.It can be argued that such observations are to be expected due to the complex nature of binding and the balance of enthalpic and entropic contributions.Nonetheless, our aim is not to strictly evaluate the role of water interactions but their general contribution to discern between binders and nonbinders.To this end, we conducted GCMC/MD simulations with IAA and TRP.For these simulations, PDBID 2P1P was used.The comparison was between a common MD protocol and GCMC/MD using similar analyses described above.Figure 6 shows proteinligand interactions retrieved from IAA and TRP.For the most part, the GCMC/MD protocol was successful in establishing a water network within the pocket.The prominent water bridge between the ligand and R436 was also recovered.This is a positive result, as it highlights the robustness of the method and its placement of solvent molecules.
On the other hand, the interaction profile remains consistent and similar to those in previous simulations, providing further proof of the significance of water bridges in microstate definition.For IAA, perhaps a notable change is the dominant stacking interaction with F79 instead of a hydrophobic contact.In stark contrast, transition networks from conventional protocols were highly variable.For IAA's case (2P1Q), an inverse pattern is observed as it shows some degree of metastability based on the three clusters and a densely populated network.NAA, on the other hand, showed highly transient interactions.While some states are indeed revisited, no apparent clustering arises under these conditions.The omission of water bridges does result in a more complex network; then again, the transient nature of contacts persists.Finally, for 2,4-D, metastability is clearly shown by three well-defined clusters, suggesting a local minima or basin.
The inclusion of water-mediated interactions did improve the representation and gave a more robust description of basins and metastability.It can be argued that such observations are to be expected due to the complex nature of binding and the balance of enthalpic and entropic contributions.Nonetheless, our aim is not to strictly evaluate the role of water interactions but their general contribution to discern between binders and nonbinders.To this end, we conducted GCMC/MD simulations with IAA and TRP.For these simulations, PDBID 2P1P was used.The comparison was between a common MD protocol and GCMC/MD using similar analyses described above.Figure 6 shows protein-ligand interactions retrieved from IAA and TRP.For the most part, the GCMC/MD protocol was successful in establishing a water network within the pocket.The prominent water bridge between the ligand and R436 was also recovered.This is a positive result, as it highlights the robustness of the method and its placement of solvent molecules.
On the other hand, the interaction profile remains consistent and similar to those in previous simulations, providing further proof of the significance of water bridges in microstate definition.For IAA, perhaps a notable change is the dominant stacking interaction with F79 instead of a hydrophobic contact.
In contrast, one of the most consistent residues was R436, which, in turn, suggests that this water bridge can be regarded as "canonical" and not anecdotal behavior.Moreover, the complete absence of said bridge in conventional protocols is very interesting.A similar case was found in 2P1O; this interaction was not able to form under standard conditions.Of note, simulations of both 2P1Q and 2P1N were able to form this interaction.The fact these subtle variations within the binding site of TIR1 introduce such divergent results is remarkable.This stands as a cautionary tale for the well-known limitations of structure base methods, while also highlighting that structure selection is no trivial matter [79,80].Herein, it was found that 2P1P is prone to the loss of ligand-solvent interactions, whereas 2P1Q and 2P1N may yield more reliable and consistent results.For further assessment, we analyzed the root mean square fluctuation of auxins in these sets of simulations (Figures S10-S13).In contrast, one of the most consistent residues was R436, which, in turn, suggests that this water bridge can be regarded as "canonical" and not anecdotal behavior.Moreover, the complete absence of said bridge in conventional protocols is very interesting.A similar case was found in 2P1O; this interaction was not able to form under standard conditions.Of note, simulations of both 2P1Q and 2P1N were able to form this interaction.The fact these subtle variations within the binding site of TIR1 introduce such divergent results is remarkable.This stands as a cautionary tale for the well-known limitations of structure base methods, while also highlighting that structure selection is no trivial matter [79,80].Herein, it was found that 2P1P is prone to the loss of ligand-solvent interactions, whereas 2P1Q and 2P1N may yield more reliable and consistent results.For further assessment, we analyzed the root mean square fluctuation of auxins in these sets of simulations (Figures S10-S13).
Considering IAA and TRP in 2P1P, RMSF plots indicate higher fluctuation when water molecules are correctly placed in the pocket.Under a standard protocol, the fluctuations are significantly suppressed, more so for IAA's case.For both ligands, higher fluctuation was observed on the nitrogen side of the scaffold.This is worth mentioning, as this part of the molecule faces in the vicinity of F79.
When comparing NAA and 2,4-D, interesting patterns were found.For the former, the fluctuation profile was very persistent with respect to its fit within the binding site, with the main difference being found in the carboxylic acid moiety, which may be explained by the loss of the water bridge with R436.For 2,4-D, the fluctuation profile changed but in contrast to IAA, TRP, and NAA, higher values were observed under stand- Considering and TRP in 2P1P, RMSF plots indicate higher fluctuation when water molecules are correctly placed in the pocket.Under a standard protocol, the fluctuations are significantly suppressed, more so for IAA's case.For both ligands, higher fluctuation was observed on the nitrogen side of the scaffold.This is worth mentioning, as this part of the molecule faces in the vicinity of F79.
When comparing NAA and 2,4-D, interesting patterns were found.For the former, the fluctuation profile was very persistent with respect to its fit within the binding site, with the main difference being found in the carboxylic acid moiety, which may be explained by the loss of the water bridge with R436.For 2,4-D, the fluctuation profile changed but in contrast to IAA, TRP, and NAA, higher values were observed under standard conditions.Taking into account that atoms showing this behavior were both chlorines and the carbon atoms of this side of the phenyl ring, this is yet another case where the solvation environment ought to be the main driving force.
As for the ligand's fit, the fluctuation was very consistent and almost unaffected.In all cases, the highest value was found in the charged oxygen.For TRP, this was more prominent and even expected due to the amino group.Beyond that, only 2,4-D showed differences in fluctuation for this atom between protocols.

Molecular Dynamics
In view of prior results, we used an integrative approach for the analyses of known binders and nonbinders.In an attempt to facilitate the correct assessment of binding, we propose the use of protein-ligand interaction fractions, concurrence maps, transition networks, and ligand-protein interactions simultaneously.We present two cases as proof of concept: diclofenac (DCF) and quinclorac (QCL).Both are arylic carboxilic acids, with quinclorac being an auxin herbicide.However, the molecular target of quinclorac is not TIR1 but rather F-box protein ABP5 [40].
DCF is a nonsteroidal anti-inflammatory agent (NSAID) and, thus, not an auxin per se but it has been shown that exposure to this drug can cause effects similar to those of IAA [41]. Figure 7 presents the consensus analyses of the molecular dynamics simulation of DCF after docking in TIR1.Indeed, it can be observed that its interaction profile is similar to that of synthetic auxins.Nonetheless, there are no ionic or stacking interactions with H78.This may be attributed to the ring's orientation due to the hindrance of the chlorine atoms.Another significant observation is the diminished presence of the water bridge with R436.Both traits are consistent with our previous results and hypothesis on the role of auxin polarity, water bridges, and pocket dynamics.

Molecular Dynamics
In view of prior results, we used an integrative approach for the analyses of known binders and nonbinders.In an attempt to facilitate the correct assessment of binding, we propose the use of protein-ligand interaction fractions, concurrence maps, transition networks, and ligand-protein interactions simultaneously.We present two cases as proof of concept: diclofenac (DCF) and quinclorac (QCL).Both are arylic carboxilic acids, with quinclorac being an auxin herbicide.However, the molecular target of quinclorac is not TIR1 but rather F-box protein ABP5 [40].
DCF is a nonsteroidal anti-inflammatory agent (NSAID) and, thus, not an auxin per se but it has been shown that exposure to this drug can cause effects similar to those of IAA [41]. Figure 7 presents the consensus analyses of the molecular dynamics simulation of DCF after docking in TIR1.Indeed, it can be observed that its interaction profile is similar to that of synthetic auxins.Nonetheless, there are no ionic or stacking interactions with H78.This may be attributed to the ring's orientation due to the hindrance of the chlorine atoms.Another significant observation is the diminished presence of the water bridge with R436.Both traits are consistent with our previous results and hypothesis on the role of auxin polarity, water bridges, and pocket dynamics.
When analyzing contact concurrence and interaction network, these suggest that DCF remains in the general vicinity of the main pocket.This, in addition to the observed basins and metastability, can be taken as an indication of true binding.These observations provide a strong case in favor of the proposed protocol; still, a criterion related to binding affinity is needed to assess the relative potency of a given ligand.When analyzing contact concurrence and interaction network, these suggest that DCF remains in the general vicinity of the main pocket.This, in addition to the observed basins and metastability, can be taken as an indication of true binding.These observations provide a strong case in favor of the proposed protocol; still, a criterion related to binding affinity is needed to assess the relative potency of a given ligand.
On the other hand, QCL has no experimental affinity for TIR1. Figure 8 shows the results of the consensus analyses from MD.Based on the interaction profile, all contacts are significantly weaker for quinclorac.Additionally, the water bridge with R436 is nonexistent; perhaps it is due to this that the interaction network shows a highly transient behavior.No basins were observed and this is consistent with the concurrence plot as it suggests that quinclorac remains near hydrophobic residues; i.e., F82 and 380.Due to its planarity, the presence of stacking interactions with these hydrophobic residues may be enough to stabilize QCL within the pocket.Beyond that, any significant contact is barely present; the main anchor was R489, as confirmed by ligand-protein interactions.Yet, even this interaction is lost in favor of hydrogen bonding with H78.On the other hand, QCL has no experimental affinity for TIR1. Figure 8 shows the results of the consensus analyses from MD.Based on the interaction profile, all contacts are significantly weaker for quinclorac.Additionally, the water bridge with R436 is nonexistent; perhaps it is due to this that the interaction network shows a highly transient behavior.No basins were observed and this is consistent with the concurrence plot as it suggests that quinclorac remains near hydrophobic residues; i.e., F82 and 380.Due to its planarity, the presence of stacking interactions with these hydrophobic residues may be enough to stabilize QCL within the pocket.Beyond that, any significant contact is barely present; the main anchor was R489, as confirmed by ligand-protein interactions.Yet, even this interaction is lost in favor of hydrogen bonding with H78.Results for the other pairs under study can be found in the Supplementary Information, Figures S14-S19.Akin to results from Section 3.3.2,IAA showed a highly stable binding.Stacking interactions with F82 and 380 were observed.The transition network showed only one highly populated basin with significant metastability.Also, the persistence of the water bridge with R436 is noteworthy.
Interestingly, as discussed in the previous section, a densely populated basin was found for TRP.Such a trait by itself would suggest that TRP is a putative ligand of TIR1.Nevertheless, inspection of the interaction profile is divergent when compared to that of known binders, just as QCL contact with H78 is diminished and mostly ionic in nature.Moreover, it must be stated that, based on concurrences and ligand protein interactions, the recovered binding mode for TRP is notably stable.Even after 500 ns, this nonbinder was able to remain within the pocket.This may be attributed to the indole scaffold due to Results for the other pairs under study can be found in the Supplementary Information, Figures S14-S19.Akin to results from Section 3.3.2,IAA showed a highly stable binding.Stacking interactions with F82 and 380 were observed.The transition network showed only one highly populated basin with significant metastability.Also, the persistence of the water bridge with R436 is noteworthy.
Interestingly, as discussed in the previous section, a densely populated basin was found for TRP.Such a trait by itself would suggest that TRP is a putative ligand of TIR1.Nevertheless, inspection of the interaction profile is divergent when compared to that of known binders, just as QCL contact with H78 is diminished and mostly ionic in nature.Moreover, it must be stated that, based on concurrences and ligand protein interactions, the recovered binding mode for TRP is notably stable.Even after 500 ns, this nonbinder was able to remain within the pocket.This may be attributed to the indole scaffold due to the stacking interactions with both F82 and 380, the strong presence of the water bridge with R436, and additional water bridges with the engagement niche.May this be a notable warning that even within "long" simulations, nonbinders can be falsely identified as hits due to pharmacophore similarity.As seen with DCF, this is yet another case where affinity would provide a decisive criterion for true binding assessment and pruning.
As for NAA, both isomers had similar concurrence maps and ligand-protein interactions.Despite this, the interaction profile of the α-napthyl acetic acid showed stacking interactions with H78, F79, and 82; traits absent for the β isomer.The main difference, however, was found in transition networks.With one of the NAA showed a clear basin with significant metastability, whereas 2-NAA showed a network more in line with the one observed for QCL.Experimental evidence confirms that, indeed, NAA has a higher potency.
Finally, we discuss the results for fluroxypyr (FXY) and rinskor (RSK); both auxin herbicides belonging to the picolinate subclass [81].Similarly to QCL, these have a preference for ABF proteins.Still, fluroxypyr can bind significantly to TIR1.Based on the interaction profile, FXY seems to be a binder whose main anchors are H78 and R489.Interestingly, this arginine exhibits an analogous behavior to that of R436.
This added to its general positioning, makes us hypothesize that auxin orientation initiates with K410 just outside the engagement niche, with R489 serving as the pulling mechanism toward the auxin site, a premise supported by the transition network, where some minor basins seemed to be found with shared microstates.Based on the interaction profile and the presence of additional water bridges, this could be the origin of said transitions.The concurrence plot also seems to favor this, since FXY stays in the general vicinity of H78 during the simulation, making close contact with D487 only during the last segment of the run.
In turn, for RSK, the interaction profile was dominated by hydrophobic and stacking interactions.Notably, water bridges were also present, again with R489.A crucial detail, however, is that RSK presented an inverse binding; i.e., a general orientation toward the engagement niche and R489 and the phenylalanine triad.Uzunova et al., found similar arrangements for TRP using tomographic docking [53].Perhaps this can be attributed to the absence of the carboxylic acid moiety, esterified in RSK.
Of note, both RSK and related picolinate Arylex are readily metabolized to carboxylates; it has been suggested that their high potency is due to this activation step [82].In our case, the preservation of the ester was deliberate, precisely to test the interaction profile and prior observations.The transition network shows a putative basin with transient behavior; still, as the simulation continues, microstates are highly transient and the basin is no longer explored.In this regard, the concurrence map suggests that overall RSK remains very stable during most of the run.Therefore, this is yet another case where water bridges may be the reason behind this pattern.

Coarse Metadynamics
In order to have an additional criterion, for cases where consensus analyses could prove ambiguous, we conducted coarse metadynamics.Briefly, metadynamics belongs to enhanced sampling methods, intended to favor rare events.For this case, a bias is introduced as specific degrees of freedom (CVs), which are subject to Gaussian potentials.These potentials help to fill the minima defined by CVs; in the context of small molecule binding, a coarse approach has been proposed and validated elsewhere [49].While it is true that metadynamics has been successfully applied to determine binding mode stability, a major challenge is CV selection.
Herein, we used the distance between centers of mass as it has proven to be a very robust choice in previous studies.Now, for the ligand's degrees of freedom, we settled on the WHIM descriptor due to its conceptualization and the overall positive results found during machine learning.Commonly, metadynamics simulations are made from docking poses.Therefore, in order to probe the engagement niche, we carried out docking in the vicinity of this region and used metadynamics to "force" ligands toward the binding pocket.Figure 9 shows free energy surfaces (FES) for DCF.
pocket and adopts a canonical orientation toward H78, F82, and 380, indicative of true binding.It can be argued that in both arrangements, DCF is enclosed by several water molecules.Yet, this could be due, to the noisy behavior of short metadynamics.Now, taking all prior results and the computed affinity value (~−19 kcal/mol) confirms the role of DCF as a true binder.Nevertheless, the significance of this value remains relative to that of other auxins calculated with the same protocol, discussed below.This compound only exhibited one basin in each metadynamics run.Interestingly, the distance and affinity values for these were close.Nonetheless, when binding poses from the basins were inspected, an interesting picture emerged (Figure 10).When DCF starts within the pocket and is pulled outward, it gets stuck in the vicinity of F351 and R489.In contrast, when DCF starts from the engagement niche, it is able to enter the auxin pocket and adopts a canonical orientation toward H78, F82, and 380, indicative of true binding.It can be argued that in both arrangements, DCF is enclosed by several water molecules.Yet, this could be due, to the noisy behavior of short metadynamics.Now, taking all prior results and the computed affinity value (~−19 kcal/mol) confirms the role of DCF as a true binder.Nevertheless, the significance of this value remains relative to that of other auxins calculated with the same protocol, discussed below.Recovered FES for QCL are shown in Figure 11.For this compound, a clear unbinding event took place.Akin to DCF, a single basin was found, when a visual inspection of the binding mode was carried out (Figure 12), QCL was outside the neighboring region of the engagement niche.Contrary to DCF, the synthetic auxin was not able to effectively bind to any region during its exit.When QCL starts from the engagement niche, based on COM, distance values suggest that binding did occur.Following visual inspection of the binding mode, it would seem that a true binding was indeed achieved, as QCL is interacting with H78.Then again, this is only via hydrogen bonding, there is no interaction with F79, 82, or 380 nor any contact or presence of any arginine residue.Even if the suggested affinity is rather high (−32 kcal/mol), this binding mode is not characteristic of any auxin under study.Also, this value can be the result of low sampling of the observed basin.As stated, even affinity estimates from metadynamics are qualitative in nature, more so when short runs are produced.Given enough time, this value is likely to diminish toward a more representative one.Now, for the additional auxins and nonbinders, their FES diagrams and representative binding poses from basins can be found in the supplementary information (Figures S20-S31).Herein, we just provide some details and highlights but we think that the results are self-explanatory.
IAA serves to establish a baseline for affinity values.Akin to experimental reports, it showed the highest value of all true binders (−24 kcal/mol) when initiating from within the binding pocket, followed closely by NAA isomers (−22 kcal/mol), also consistent with experimental affinity, which is usually one order of magnitude below IAA [15,40,83].For Recovered FES for QCL are shown in Figure 11.For this compound, a clear unbinding event took place.Akin to DCF, a single basin was found, when a visual inspection of the binding mode was carried out (Figure 12), QCL was outside the neighboring region of the engagement niche.Contrary to DCF, the synthetic auxin was not able to effectively bind to any region during its exit.When QCL starts from the engagement niche, based on COM, distance values suggest that binding did occur.Following visual inspection of the binding mode, it would seem that a true binding was indeed achieved, as QCL is interacting with H78.Then again, this is only via hydrogen bonding, there is no interaction with F79, 82, or 380 nor any contact or presence of any arginine residue.Even if the suggested affinity is rather high (−32 kcal/mol), this binding mode is not characteristic of any auxin under study.Also, this value can be the result of low sampling of the observed basin.As stated, even affinity estimates from metadynamics are qualitative in nature, more so when short runs are produced.Given enough time, this value is likely to diminish toward a more representative one.Now, for the additional auxins and nonbinders, their FES diagrams and representative binding poses from basins can be found in the supplementary information (Figures S20-S31).Herein, we just provide some details and highlights but we think that the results are self-explanatory.
IAA serves to establish a baseline for affinity values.Akin to experimental reports, it showed the highest value of all true binders (−24 kcal/mol) when initiating from within the binding pocket, followed closely by NAA isomers (−22 kcal/mol), also consistent with experimental affinity, which is usually one order of magnitude below IAA [15,40,83].
For these cases, metadynamics could not distinguish between isomers just with affinity estimates.Nonetheless, only NAA showed true binding modes in both metadynamics runs.
TRP was correctly classified as a nonbinder, even when a quasicanonical arrangement was found, the affinity estimate was very low (−16 kcal/mol).RSK showed even lower values (−14 kcal/mol); notably, this was consistent between runs and its tendency to remain inversely placed within the pocket.
The case for FXY is a rather interesting one as an unbinding event occurred and it remained tightly bound to the engagement niche surface.Given the previous results, we cannot provide an explanation for this discrepancy.Indeed, this case presses for further study while also highlighting the need for additional experimental information.TRP was correctly classified as a nonbinder, even when a quasicanonical arrangement was found, the affinity estimate was very low (−16 kcal/mol).RSK showed even lower values (−14 kcal/mol); notably, this was consistent between runs and its tendency to remain inversely placed within the pocket.
The case for FXY is a rather interesting one as an unbinding event occurred and it remained tightly bound to the engagement niche surface.Given the previous results, we cannot provide an explanation for this discrepancy.Indeed, this case presses for further study while also highlighting the need for additional experimental information.

Conclusions
Herbicide development is an endeavor of high interest from both an economic and chemical standpoint.Currently, there are worldwide efforts to replace commonly used agents such as glyphosate or 2,4-D.Herein, we presented a computational characterization of TIR1, one of the molecular targets associated with auxin pathways.With the aid of structure-based methods, we analyzed a series of compounds in silico; we found that water plays an important role in binding and were able to characterize significant proteinligand interactions.
Using de novo design, we proposed a library of putative binders and molecular decoys.Chemical space showed a good superposition between them, a trait further supported by machine learning results.Herein, nonlinear methods were more consistent and performed reasonably while conveying enough information on true binders.Our results also showed that current data make correct classification of false negatives difficult.This behavior was recurrent in this and other studies.
Hence, we conducted a structure-based characterization of the TIR1 pocket to determine major traits that could provide a robust protocol for true binder assessment.We found that the inclusion of water-mediated interactions gave a more robust description of local minima and metastability.Also, we found a persistent water bridge between ligands and R436, which, to the best of our knowledge, has not been described as significant.
In addition, our results suggest that the TIR1 structures best suited for molecular modeling are 2P1Q and 2P1N as these are able to retain subtle solvent effects, even under conventional protocols.
We also probed the engagement niche, confirming its role in molecular recognition and the overall plasticity of the auxin-binding pocket.Changes in ligand polarity make a significant impact on pocket dynamics and the orientation of hydrophobic residues involved in stacking interactions.

Conclusions
Herbicide development is an endeavor of high interest from both an economic and chemical standpoint.Currently, there are worldwide efforts to replace commonly used agents such as glyphosate or 2,4-D.Herein, we presented a computational characterization of TIR1, one of the molecular targets associated with auxin pathways.With the aid of structure-based methods, we analyzed a series of compounds in silico; we found that water plays an important role in binding and were able to characterize significant proteinligand interactions.
Using de novo design, we proposed a library of putative binders and molecular decoys.Chemical space showed a good superposition between them, a trait further supported by machine learning results.Herein, nonlinear methods were more consistent and performed reasonably while conveying enough information on true binders.Our results also showed that current data make correct classification of false negatives difficult.This behavior was recurrent in this and other studies.
Hence, we conducted a structure-based characterization of the TIR1 pocket to determine major traits that could provide a robust protocol for true binder assessment.We found that the inclusion of water-mediated interactions gave a more robust description of local minima and metastability.Also, we found a persistent water bridge between ligands and R436, which, to the best of our knowledge, has not been described as significant.
In addition, our results suggest that the TIR1 structures best suited for molecular modeling are 2P1Q and 2P1N as these are able to retain subtle solvent effects, even under conventional protocols.
We also probed the engagement niche, confirming its role in molecular recognition and the overall plasticity of the auxin-binding pocket.Changes in ligand polarity make

Scheme 1 .
Scheme 1.Chemical structures of the selected probes for hotspot identification.

Figure 1 .
Figure 1.Chemical spaces of the generated compounds using descriptors selected with Boruta.

Figure 1 .
Figure 1.Chemical spaces of the generated compounds using descriptors selected with Boruta.

Figure 2 .
Figure 2. Performance assessment of the trained classifiers: using only physicochemical descriptors (top); using physicochemical and structural descriptors (bottom).

Figure 3 .
Figure 3. Probe occupancy surfaces recovered by molecular dynamics with mixed solvents using PBDIDs 2P1M (a), 2P1P (b), and 2P1N (c).Pockets with top-three scores are shown in green, blue, and yellow, respectively.

Figure 4 .
Figure 4. Comparison of protein-ligand interactions for the auxin/TIR1 complexes (PDBIDs 2P1Q, 2P1O, and 2P1N).Whenever the interaction fraction goes beyond 1, it is indicative of more than one contact with said residue.Top row shows results from simulations preserving crystal water molecules, bottom row shows results discarding said waters.

Figure 5 .
Figure 5. Transition networks built from protein-ligand interactions.Top row shows results from simulations preserving crystal water molecules; bottom row shows results discarding these waters

Figure 5 .
Figure 5. Transition networks built from protein-ligand interactions.Top row shows results from simulations preserving crystal water molecules; bottom row shows results discarding these waters.

Computation 2024 , 26 Figure 6 .
Figure 6.Influence of pocket solvation in TIR1.Interaction fractions and transition networks are compared between IAA (a,b) and TRP (c,d).Simulation protocols under conventional MD (left) and GCMC/MD (right) are also compared.

Figure 6 .
Figure 6.Influence of pocket solvation in TIR1.Interaction fractions and transition networks are compared between IAA (a,b) and TRP (c,d).Simulation protocols under conventional MD (left) and GCMC/MD (right) are also compared.

Figure 9 .
Figure 9. Free energy surfaces computed from well-tempered metadynamics for the DCF-TIR1 complexes with ligand positioned within the binding site (top) and positioned within the engagement niche (bottom).

Figure 9 .
Figure 9. Free energy surfaces computed from well-tempered metadynamics for the DCF-TIR1 complexes with ligand positioned within the binding site (top) and positioned within the engagement niche (bottom).

Figure 10 .
Figure 10.Representative DCF conformations found at basins from metadynamics runs with ligands positioned within the binding site (a) and positioned within the engagement niche (b).

Figure 10 .
Figure 10.Representative DCF conformations found at basins from metadynamics runs with ligands positioned within the binding site (a) and positioned within the engagement niche (b).

Figure 11 .
Figure 11.Free energy surfaces computed from well-tempered metadynamics for the QCL-TIR1 complexes with ligand positioned within the binding site (top) and positioned within the engagement niche (bottom).

Figure 11 .
Figure 11.Free energy surfaces computed from well-tempered metadynamics for the QCL-TIR1 complexes with ligand positioned within the binding site (top) and positioned within the engagement niche (bottom).

Figure 12 .
Figure 12.Representative QCL conformations found at basins from metadynamics runs with ligands positioned within the binding site (a) and positioned within the engagement niche (b).

Figure 12 .
Figure 12.Representative QCL conformations found at basins from metadynamics runs with ligands positioned within the binding site (a) and positioned within the engagement niche (b).

Table 1 .
Root mean squared deviation values from the structural alignment of TIR1 structures.Binding site residues with above-average deviations are shown.