A Structure- and Ligand-Based Virtual Screening of a Database of “Small” Marine Natural Products for the Identification of “Blue” Sigma-2 Receptor Ligands

Sigma receptors are a fascinating receptor protein class whose ligands are actually under clinical evaluation for the modulation of opioid analgesia and their use as positron emission tomography radiotracers. In particular, peculiar biological and therapeutic functions are associated with the sigma-2 (σ2) receptor. The σ2 receptor ligands determine tumor cell death through apoptotic and non-apoptotic pathways, and the overexpression of σ2 receptors in several tumor cell lines has been well documented, with significantly higher levels in proliferating tumor cells compared to quiescent ones. This acknowledged feature has found practical application in the development of cancer cell tracers and for ligand-targeting therapy. In this context, the development of new ligands that target the σ2 receptors is beneficial for those diseases in which this protein is involved. In this paper, we conducted a search of new potential σ2 receptor ligands among a database of 1517 “small” marine natural products constructed by the union of the Seaweed Metabolite and the Chemical Entities of Biological Interest (ChEBI) Databases. The structures were passed through two filters that were constituted by our developed two-dimensional (2D) and three-dimensional Quantitative Structure-Activity Relationship (3D-QSAR) statistical models, and successively docked upon a σ2 receptor homology model that we built according to the FASTA sequence of the σ2/TMEM97 (SGMR2_HUMAN) receptor.


Introduction
Sigma (σ) receptors include a particular pharmacologically defined family of membrane-bound receptors that bind compounds belonging to a variety of structural classes. Discovered in 1976 and recognized in two distinct subtypes in the early 1990s, they represent a potential target for the diagnosis and therapy of cancer and central nervous system (CNS) diseases [1,2]. Indeed, both σ receptor subtypes are highly expressed in several tissues and distinguished from each other through their affinity for different ligands and biological profiles.
Since structure and ligand-based computer-aided drug design are nowadays effective and useful tools in rational drug design [34][35][36][37][38][39], we used them to allow the identification of new virtually potent and selective molecules that are able to interact with the σ 2 receptor. Therefore, herein, we report an investigation of new potentially σ 2 /TMEM97 receptor ligands among a database of 1517 "small" marine natural products, here named the Blue DataBase (BDB, Table S1), which was composed by the merging of the Seaweed Metabolite (http://www.swmd.co.in/) [40], the Chemical Entities of Biological Interest (ChEBI, http://www.ebi.ac.uk/chebi/) [41] databases, and from the reference [42]. The chemical structures were passed through two filters constituted by our developed 2D and 3D-QSAR statistical models that showed high statistical quality and robust predictive potential capability, and successively docked upon the σ 2 /TMEM97 receptor homology model. To the best of our knowledge, this is the first report on the build of a 3D structure of the σ 2 /TMEM97, by mixing the classic homology modeling approach with the evolutionary coupling analysis. Moreover, the robustness of this model has been confirmed by docking on it 200 σ 2 -ligands that were randomly selected from the S2RSLDB. Finally, the results of the two filters and docking have been merged, ordering them by the mean of the obtained pK i (−log 10 K i ) to draw up a list of the 15 best candidates (mean of 2D, 3D-QSAR, and docking pK i results) as possible powerful σ 2 receptor marine ligands. Four of them are already known in the literature for their antiproliferative and cytotoxic effects against A549 and HT29 cancer cell lines, which are two typical cancer cell lines characterized by σ 2 receptor overexpression. In particular, compound 848 resembles progesterone, which in itself is a potent σ 2 receptor ligand.

2D Ligand-Based Filter
The BDB was filtered through an in-house hybrid σ 2 receptor affinity filter [30], returning for each chemical entity a predicted σ 2 receptor pK i . This 2D-QSAR model [30,31] has been built by using a Monte Carlo-based QSAR analysis employing the software CORAL (version 2016, http://www.insilico.eu/coral/index.html) [43,44]. CORAL allows for a hybrid representation of molecular structures that includes a simplified molecular input line entry system (SMILES) and a molecular graph. Hybrid representation using SMILES together with the molecular graph may give better models with higher statistical quality with respect to those models, and with a unique representation of the molecular structure [45,46]. The here-used 2D-QSAR model, being constructed with a large and structurally diverse set of 548 compounds, allows for a prediction of different populations of chemical compounds endpoints (σ 2 receptor pK i ).
The chemical structures of the 1517 "blue" compounds were transformed into SMILES and converted into freebase, while those compounds presenting a carboxylate were left as is. Eventually, the aromatic rings were converted into the kekulé forms. These conversions have been performed in order to generate SMILES with the same depiction as those used for building the hybrid σ 2 receptor affinity filter, since this model works by chopping the SMILES or the molecular graphs into small fragments, and iteratively overmatching them with the SMILES and molecular graphs fragments that compose the training set [30,31].
Among 1517 compounds, 1313 have been defined by the model to be outliers, which means that their chemical structures were expressed as SMILES or a molecular graph that was not described by the model, entirely or for a significant part. Indeed, for these compounds, the type of SMILES or molecular graphs fragments generated do not overmatch, entirely or for a significant part, with the SMILES or molecular graphs fragments that compose the training set. For these molecules, the software still predicts an endpoint, but it highlights them for not falling in the field of applicability, and thus being outliers. The remaining 204 "blue" compounds were returned with a predicted endpoint, and were indicated as falling within the domain of applicability. From this subset, 42 compounds have been predicted to possess a σ 2 receptor, K i , which was higher or equal to 100 nM (pK i ≥ 7), and is empirically considered an appropriate value for processing a compound into more complex phenotypic assays. These compounds are reported in Table S2.

3D Ligand-Based Filter
The same dataset of compounds was also evaluated using a second ligand-based filter. All of the 3D structures of the compounds were aligned to our previously published 3D-QSAR model for the σ 2 receptor. The alignment of the molecules in the 3D-QSAR pharmacophore model was performed with Forge (v10.4.2, Cresset, New Cambridge House, Hertfordshire, UK) [47]. Once aligned, the compounds were scored assuming that if the fields (the local extrema of the electrostatic, van der Waals, and hydrophobic potentials of the molecule) of the newly evaluated molecules were very similar to those of the original compounds, the resulting compounds will have similar biological properties [47]. The 15 most potent compounds resulted from the 3D-ligand based filter are reported in Table 1, while the full set of compounds is present in in an excellent distance to the model (i.e., description by the model), which means that all or most of the features in the molecules were present in the training set of the 3D-QSAR model, and hence that the predicted activity is reliable. In the full set of evaluated compounds, there are also several compounds that are not well described by the QSAR equation; the external user should pay attention to the "3D Applicability" column in the supplementary material (Table S2). Values of "Excellent" or "OK" indicate the predictive reliability. Worse values (i.e., "Bad" or "Poor") indicate that the molecule has field points in places that are not described by the equation, resulting in unreliable predicted activities.

Homology Model and Molecular Docking
To further reinforce the results obtained through the 2D and 3D filters, we decided to add a third filter based on molecular docking. Exploiting the recent identification of the σ 2 receptor as the TMEM97, we built its 3D molecular structure by the homology modeling approach, starting from the Q5BJF2 (SGMR2_HUMAN) sequence deposited in the UniProt Knowledgebase (https://www.uniprot.org/ uniprot/). To pursue this challenging task, due to the scarcity of crystallized structures possessing the sufficient sequence identity (>30%), we have chosen to employ an approach based on two strategies used in parallel. Indeed, this task has been developed by mixing the classical first three steps (i. finding the homologous template proteins of the known structure, ii. selecting the best template or set of templates, and iii. optimizing the multiple sequence alignment between the query and template protein sequences) [48] with the evolutionary coupling analysis [49].
Eventually, we performed the typical fourth step, which consisted of the build of the homology model for the query sequence that resembles the structures of the templates as closely as possible, accommodating for the deletions and insertions of query residues with respect to the template structures, in order to obtain a series of hybrid models that have been ordered by their overall z-score [50]. The obtained 3D model was docked with PB-28, which is a known σ 2 receptor ligand (exp. K i =2.0 nM [51]); the best-obtained complex was then immersed in a simulated endoplasmic reticulum membrane, in physiological environment conditions, and subjected to a molecular dynamics (MD) simulation of 10 ns to accommodate the ligand, verified by the Root-mean-square deviation (RMSD) of the ligand. After the minimization of the frame with the best binding energy (belonging to the last 3 ns of MD simulation, where the ligand RMSD is constant), the re-docking of the ligand gave a calculated K i of 1.5 nM, which was in excellent agreement with the experimental value. To additionally validate the 3D model, we docked 200 σ 2 receptor ligands that were randomly selected from the S2RSLDB among those that possess a K i in the range 0.01-1000 nM. The results (Table S3)  in Figure 1 as a plot of the experimental versus calculated K i , and highlight a very good prediction power with an R 2 of 0.91. membrane, in physiological environment conditions, and subjected to a molecular dynamics (MD) simulation of 10 ns to accommodate the ligand, verified by the Root-mean-square deviation (RMSD) of the ligand. After the minimization of the frame with the best binding energy (belonging to the last 3 ns of MD simulation, where the ligand RMSD is constant), the re-docking of the ligand gave a calculated Ki of 1.5 nM, which was in excellent agreement with the experimental value. To additionally validate the 3D model, we docked 200 σ2 receptor ligands that were randomly selected from the S2RSLDB among those that possess a Ki in the range 0.01-1000 nM. The results (Table S3) are reported in Figure 1 as a plot of the experimental versus calculated Ki, and highlight a very good prediction power with an R 2 of 0.91. On this model, we docked the best-predicted σ2 receptor ligands returned from the two 2D and 3D QSAR filters (524 compounds, being 13 in common). The 15 most potent compounds resulted from the docking analysis are reported in Table 2, while the full set of compounds is reported in Table S2. On this model, we docked the best-predicted σ 2 receptor ligands returned from the two 2D and 3D QSAR filters (524 compounds, being 13 in common). The 15 most potent compounds resulted from the docking analysis are reported in Table 2, while the full set of compounds is reported in Table S2.   The 3D structures of the complex with the two best-docked molecules have been represented in Figure 2. Both molecules reside in the pocket constituted by the two ASP29 and ASP56 residues; in particular, compound 1169 shows two hydrogen bonds with ASP29. The 3D structures of the complex with the two best-docked molecules have been represented in Figure 2. Both molecules reside in the pocket constituted by the two ASP29 and ASP56 residues; in particular, compound 1169 shows two hydrogen bonds with ASP29. The 3D structures of the complex with the two best-docked molecules have been represented in Figure 2. Both molecules reside in the pocket constituted by the two ASP29 and ASP56 residues; in particular, compound 1169 shows two hydrogen bonds with ASP29.

529
8.95 The 3D structures of the complex with the two best-docked molecules have been represented in Figure 2. Both molecules reside in the pocket constituted by the two ASP29 and ASP56 residues; in particular, compound 1169 shows two hydrogen bonds with ASP29.

The Ultimate Filter
Finally, to account for all of the typology of obtained results, we sorted the BDB library according to the mean of the predicted pKi by 2D-QSAR, 3D-QSAR, and docking methodologies. The structures of the best 15 compounds have been reported in Table 3.

The Ultimate Filter
Finally, to account for all of the typology of obtained results, we sorted the BDB library according to the mean of the predicted pK i by 2D-QSAR, 3D-QSAR, and docking methodologies. The structures of the best 15 compounds have been reported in Table 3. Table 3. Structure, calculated mean pK i , and corresponding K i (nM) values of the 15 most potent marine products resulted from the mean of the three combined filters.

BDB ID
Structure Mean pK i Mean K i 1169 529 8.95 The 3D structures of the complex with the two best-docked molecules have been represented in Figure 2. Both molecules reside in the pocket constituted by the two ASP29 and ASP56 residues; in particular, compound 1169 shows two hydrogen bonds with ASP29.

The Ultimate Filter
Finally, to account for all of the typology of obtained results, we sorted the BDB library according to the mean of the predicted pKi by 2D-QSAR, 3D-QSAR, and docking methodologies. The structures of the best 15 compounds have been reported in Table 3.  An accurate literature search for the 15 most promising hits highlighted that four of them (848, 984, 1169, and 1172) were reported to show clearly antiproliferative and cytotoxic effects against typical cancer cell lines such as A549 and HT29 (Table 4) [52][53][54][55][56][57][58]. These biological data were in agreement with specific σ 2 receptors overexpression that was related to the same cell lines [59][60][61][62]. In particular, compound 848 possess a steroidal structure resembling that of progesterone; this compound has been demonstrated to act as a potent σ 2 receptor ligand with a K i of 441 nM [63].
To further validate our model and the goodness of the predicted data, we docked progesterone, obtaining a calculated K i of 749 nM that became 369 nM after allowing a best accommodation by 100 ps step-10 annealing and 100 ps steepest descent minimization, followed by a local re-docking, which is in good agreement with the experimental K i .

Dataset of Marine Compounds
The chemical structures of the marine dataset were retrieved from the Seaweed Metabolite (http://www.swmd.co.in/), the Chemical Entities of Biological Interest (http://www.ebi.ac.uk/ chebi/) databases, and from reference [42]. All of the molecules were manually checked, and the duplicates were removed to achieve a final number of 1517 compounds. The full list of molecules is available as SMILES for external users in the supplementary material (Table S1). In Figure 3, the workflow that was used has been schematized.

Mar. Drugs 2018, 16, x 12 of 18
An accurate literature search for the 15 most promising hits highlighted that four of them (848, 984, 1169, and 1172) were reported to show clearly antiproliferative and cytotoxic effects against typical cancer cell lines such as A549 and HT29 (Table 4) [52][53][54][55][56][57][58]. These biological data were in agreement with specific σ2 receptors overexpression that was related to the same cell lines [59][60][61][62]. In particular, compound 848 possess a steroidal structure resembling that of progesterone; this compound has been demonstrated to act as a potent σ2 receptor ligand with a Ki of 441 nM [63]. To further validate our model and the goodness of the predicted data, we docked progesterone, obtaining a calculated Ki of 749 nM that became 369 nM after allowing a best accommodation by 100 ps step-10 annealing and 100 ps steepest descent minimization, followed by a local re-docking, which is in good agreement with the experimental Ki.

Dataset of Marine Compounds
The chemical structures of the marine dataset were retrieved from the Seaweed Metabolite (http://www.swmd.co.in/), the Chemical Entities of Biological Interest (http://www.ebi.ac.uk/chebi/) databases, and from reference [42]. All of the molecules were manually checked, and the duplicates were removed to achieve a final number of 1517 compounds. The full list of molecules is available as SMILES for external users in the supplementary material (Table S1). In Figure 3, the workflow that was used has been schematized.

Structures 2D to 3D Building and Minimization
The structures of the marine products were built using Marvin Sketch (v. 18.24, ChemAxon Ltd, Budapest, Hungary) [64]. The 2D structures were subjected to molecular mechanics energy minimization by Merck molecular force field (MMFF94) using the Marvin Sketch geometrical descriptors plugin [64]. The protonation states of the molecules were calculated assuming a pH of 7.0. Before the alignment for the 3D-QSAR filter, the geometry of the obtained 3D structures was further optimized at the semi-empirical level using the parameterized model

Structures 2D to 3D Building and Minimization
The structures of the marine products were built using Marvin Sketch (v. 18.24, ChemAxon Ltd., Budapest, Hungary) [64]. The 2D structures were subjected to molecular mechanics energy minimization by Merck molecular force field (MMFF94) using the Marvin Sketch geometrical descriptors plugin [64]. The protonation states of the molecules were calculated assuming a pH of 7.0. Before the alignment for the 3D-QSAR filter, the geometry of the obtained 3D structures was further optimized at the semi-empirical level using the parameterized model number 3 (PM3) Hamiltonian as implemented in the MOPAC package (MOPAC2016 v. 18.151, Stewart Computational Chemistry, Colorado Springs, CO, USA) [65][66][67].

2D-QSAR
The software CORAL (CORrelation And Logic, version 2016, Istituto di Ricerche Farmacologiche Mario Negri, Milano, Italy) was used for building the 2D-QSAR model using 548 σ 2 receptor selective ligands over the σ 1 receptor, as previously reported [33,34]. The unique SMILES composing the blue dataset were converted in order to have a SMILES depiction that was equal to that used for generating the model (vide supra). To each SMILES, a random endpoint value was associated in order for the software to compare this value with the predicted one. The following regression was used for predicting the endpoints: pKi σ2 = 3.5937472(± 0.0139734) + 0.0352642(± 0.0001213) * DCW(0, 16) where DCW is defined as the "descriptor of correlation weights". The regression for the σ 2 receptor pK i has been developed in a previously published 2D regression model [30].

Compound Alignment for the 3D-Ligand Based Filter
All of the optimized structures were imported into the software Forge (v10.4.2, Cresset, New Cambridge House, Hertfordshire, UK) for the evaluation of the dataset in the field-based 3D-QSAR model that was previously published [32]. All of the molecules were aligned with the training set of the 3D-QSAR model. The negative, positive, shape, and hydrophobic field points of each molecule were generated using the extended electron distribution (XED) force field in Forge. The molecules were then aligned with the training set of the 3D-QSAR model by a maximum common substructure algorithm using a customized set-up. The software's parameters that were used for the conformation hunt and alignment are presented in the supplementary material ( Figures S1 and  S2). The maximum number of conformations that was generated for each molecule was set to 500, as suggested by the software. The root mean square deviation of atomic positions' cutoff for duplicate conformers was set to 0.5 Å (the similarity threshold below which two conformers are assumed to be identical). The gradient cutoff for conformer minimization was set to 0.1 kcal/mol. The energy window was set to 2.5 kcal/mol. Conformers with a minimized energy outside the energy window were discarded.

Homology Model and Docking
All of the simulations and molecular modeling experiments have been conducted with YASARA software (v. 18.4.24, YASARA Biosciences GmbH, Vienna, Austria). The homology model was built starting from the Q5BJF2 (SGMR2_HUMAN) sequence deposited in the UniProt Knowledgebase (https://www.uniprot.org/uniprot/) and using the crystallographic structures corresponding to the following PDB IDs as templates: 4LGC, 1VT4, 4M58, 2PFF, 2MGY, AND 1T33. To these structures have been added the best two structures obtained by the evolutionary coupling analysis, which were executed with the EVfold web-server (http://evfold.org/evfold-web/newprediction.do), and the ensemble has been processed with the hm_build macro of YASARA. In the end, an optimized hybrid model was built through iteratively replacing bad regions in the top scoring model with the corresponding fragments from the other models.
This model was docked with the σ 2 receptor ligand PB-28 (see below for details), and the best pose ligand/receptor complex structure was then immersed in a simulated endoplasmic reticulum membrane [68], in physiological environment conditions, and subjected to a molecular dynamics (MD) simulation of 10 ns to accommodate the ligand. The simulation was set up automatically by first scanning the protein for exposed transmembrane helices [i.e., helices longer than 16 residues, with more than seven hydrophobic residues and more than three exposed ones (accessible side-chain surface area >30% of maximum)]. The major axis vectors of these helices (i.e., the direction vectors of the least-squares lines through the C alpha atoms) were summed up to obtain the major axis of the protein, which was then oriented along the Y-axis, normally with respect to the plane of the membrane and the X-Z plane. The best shift of the membrane along this major axis was obtained by scanning the protein for the region with the largest number of exposed hydrophobic residues (see definition above) and a width of 28 Å (corresponding to the membrane core).
Having placed an equilibrated membrane structure (consisting of 55% of phosphatidylcholine, 30% of phosphatidylethanolamine, 10% of phosphatidylserine, and 5% of phosphatidylinositol molecules [68]) at this location named 'MemCenterY', the system was enclosed in a simulation cell of size [X*Y*Z] Å, the protein was temporarily scaled by 0.9 along the X-Z axes, and then, strongly clashing membrane lipids were deleted (lipids with an atom closer than 0.75 Å to a protein atom).
The temporary protein scaling, which was needed to avoid the deletion of too many lipids around the protein, was then slowly removed during a short simulation at 298 K in vacuum. The protein (with all of the atoms kept fixed) was scaled by 1.02 along the X-Z axes every 200 femtoseconds, while the membrane was allowed to move, but was restrained to ideal geometry (by pulling lipid residues with an atom further than 21.5 Å away from MemCenterY back into the membrane, and by pushing phosphorus atoms closer than 14 Å to MemCenterY back outwards). The force field was AMBER14, with Lipid17/GAFF2/AM1BCC parameters for non-standard residues. As soon as the protein had reached its original size again, the protein side-chain pK a s were predicted, protonation states were assigned according to pH 7.4, and the simulation cell was filled with water, 0.9% NaCl, and counter ions (proteins 57, 678-683). The main simulation was then run with PME, and an 8.0 Å cutoff for non-bonded real space forces, a four fs time-step, constrained hydrogen atoms, and at constant pressure and temperature (NPT ensemble), as described in detail previously. During the initial 250 picoseconds, the membrane was restrained to avoid distortions while the simulation cell adapted to the pressure exerted by the membrane (see above, additionally water molecules that got closer than 14 Å to MemCenterY were pushed outside). The source code of this simulation protocol and visualizations of the individual steps can be found at www.yasara.org/membranemd. Docking experiments were effectuated, employing the AutoDock (4.2.5.1, The Scripps Research Institute, San Diego, California Jupiter, Florida, US) software implemented in YASARA. The maps were generated by the program AutoGrid (4.2.5.1, The Scripps Research Institute, San Diego, California Jupiter, Florida, US) with a spacing of 0.375 Å, and dimensions that encompass all of the atoms extending 5 Å from the surface of the PM3 minimized structure of PB-28. All of the parameters were inserted at their default settings. In the docking tab, the macromolecule and ligand are selected, and Lamarckian Genetic Algorithm (LGA) parameters are set as ga_runs = 100, ga_pop_size = 150, ga_num_evals = 20,000,000, ga_num_generations = 27,000, ga_elitism = 1, ga_mutation_rate = 0.02, ga_crossover_rate = 0.8, ga_crossover_mode = two points, ga_cauchy_alpha = 0.0, ga_cauchy_beta = 1.0, number of generations for picking worst individual = 10.

Conclusions
In this study, we describe the screening of new potentially σ 2 /TMEM97 receptor ligands that were reported in a database of 1517 "small" marine natural products, and formulated by the union of the Seaweed Metabolite (http://www.swmd.co.in/) and the Chemical Entities of Biological Interest (ChEBI, http://www.ebi.ac.uk/chebi/). The structures were selected by our developed 2D and 3D-QSAR statistical models, and successively verified by a robust σ 2 /TMEM97 receptor homology model appropriately built by us. This work provided 15 best candidates as powerful σ 2 receptor marine ligands; four of them are clearly reported in the literature as antiproliferative and cytotoxic compounds against typical cancer cell lines such as A549 and HT29, and are in agreement with the specific σ 2 receptor overexpression that is related to the same cell lines. In particular, compound 848 resembles progesterone, which in itself is a potent σ 2 receptor ligand. These findings will ensure prospectively advantageous applications to speed up the design and identification of new natural hit compounds as potent and selective σ 2 receptor ligands. In vitro and biological screenings of the most promising compounds are in due course.
Supplementary Materials: The following are available online at http://www.mdpi.com/1660-3397/16/10/384/ s1, Figure S1: Forge's parameters used for the conformation hunt, Figure S2: Forge's parameters used for the alignment, Figure S3: Homology model of the σ 2 -receptor immersed in the endoplasmic reticulum membrane, Table S1: Dataset of marine products, Table S2: Complete results of the three filters by color code, Table S3: Experimental and calculated K i by docking.