Virtual Screening of Different Subclasses of Lignans with Anticancer Potential and Based on Genetic Profile

Cancer is a multifactorial disease that continues to increase. Lignans are known to be important anticancer agents. However, due to the structural diversity of lignans, it is difficult to associate anticancer activity with a particular subclass. Therefore, the present study sought to evaluate the association of lignan subclasses with antitumor activity, considering the genetic profile of the variants of the selected targets. To do so, predictive models were built against the targets tyrosine-protein kinase ABL (ABL), epidermal growth factor receptor erbB1 (EGFR), histone deacetylase (HDAC), serine/threonine-protein kinase mTOR (mTOR) and poly [ADP-ribose] polymerase-1 (PARP1). Then, single nucleotide polymorphisms were mapped, target mutations were designed, and molecular docking was performed with the lignans with the best predicted biological activity. The results showed more anticancer activity in the dibenzocyclooctadiene, furofuran and aryltetralin subclasses. The lignans with the best predictive values of biological activity showed varying binding energy results in the presence of certain genetic variants.


Introduction
The incidence of many types of cancer continues to rise in the global population, despite many successes in screening, prevention and treatment [1]. Several factors contribute to the development and progression of cancer, such as resistance to various drugs and mutations in important genes [2]. The sum of changes in oncogenes, tumor suppressor genes, repair mechanisms and epigenetic changes has led to cancer development [3]. It is impossible to infer how many independent events are required to produce all the changes that result in most human cancers [4]. Therefore, continuous discovery of new therapeutic alternatives is necessary.
In addition, 40% of the interindividual differences are responsible for the varying response to treatment of patients with the same type of cancer. Genetic factors such as single nucleotide polymorphisms (SNPs) are reported as important variants that may affect treatment [5,6]. Furthermore, natural human genetic variation can cause individuals to respond differently to the same drug [7]. Thus, it is also necessary to identify new therapeutic alternatives based on the genetic profile of the patient.
Several studies have focused on enzymes as important therapeutic targets for various types of tumors. Tyrosine kinases belonging to the ABL family act in coordinating and remodeling actin in response to stimuli mediated by tyrosine phosphorylation of actin cytoskeletal remodeling proteins and by the adhesion and aggregation domain of carboxyterminal filamentous actin ABL [8]. The ABL family performs a variety of activities which are vital to cell maintenance, such as division, adhesion to membranes, cellular skeletal remodeling and DNA damage repair [9]. However, mutations or abnormalities in tyrosine kinase activity can result in an uninterrupted disruption of the highly active state that can lead to cancer development and progression [10].
The epidermal growth factor receptor (EGFR) acts as a transcription factor that plays a regulatory role in the expression of many genes important for inflammation [11]. EGFR belongs to the ErbR family, which can irregularly activate epithelial tumors [11]. Histone deacetylases (HDACs) are key enzymes that have a regulatory function of gene expression related to controlling cell cycle advancement and apoptosis, boosting tumorigenesis and favoring the evolution of cancer [12][13][14]. mTOR threonine-protein kinase (mTOR) is an enzyme formed by two structural complexes, namely, mammalian rapamycin complex 1 (mammalian target of rapamycin complex 1) (mTORC1) and mammalian target of rapamycin complex 2 (mammalian target of rapamycin complex 2) (mTORC2). mTORC1 performs the function of regulating cell growth and metabolism, while mTORC2 controls cell proliferation and survival [15][16][17]. mTOR plays a functional role in tumor formation and is widely used in targeted therapy research for tumors and other diseases [18].
The polyadenosine diphosphate ribose polymerase (PARP) superfamily is made up of proteins and enzymes that are responsible for regulating the identification and repair process of deoxyribonucleic acid molecules through the BER pathway [19]. PARP1 is the best known and acts as a catalyst for ADP-ribose units of the NAD+ substrate in nuclear proteins. This process is performed as a post-translational modification necessary for activating the response to DNA damage generated by ionizing radiation, alkylating agents and/or free radicals [20]. PARP inhibitors destabilize replication forks through entrapment of PARP DNA and induce cell death through replication stress-induced mitotic catastrophe [21].
Targeted therapy is usually the initial treatment for a cancer patient. More than 60% of antitumor drugs are closely related to natural products [10]. Lignans are natural products made up of phenylpropanoid dimers and have a wide variety of biological activities. Many studies have reported lignans as potent anticancer agents [22][23][24]. However, as there are 10 extremely diverse subclasses of lignans, it is difficult to correlate anticancer activity with chemical structure. In addition, genetic variations cause different responses to treatment, requiring personalized treatment per patient. Therefore, the present study aims to select and evaluate the association of lignan subclasses with antitumor activity, considering the genetic profile of important variants.

Quantitative Modeling of the Structure-Activity Relationship (QSAR)
The external performances of the selected models were analyzed for sensitivity (true positive rate or active rate), specificity (true negative rate or inactive rate) and accuracy (overall predictability). These parameters are the most used to ensure high predictability of the model. Other parameters such as ROC curve results and MCC analyzes revealed excellent results. The models obtained ROC curves greater than 0.89 during cross-validation, and MCC values were also greater than 0.63 during cross-validation, revealing a model with excellent classification, performance and robustness (Table 1, Figure S1). Thus, the lignan bank was screened using the created models with excellent performance to select potentially active compounds against the selected proteins. The RF model was able to select a lignan with potential activity against the EFGR receptor with a probability of activity of 54%. Next, 86 lignans were considered active for the HDAC enzyme, with activity probabilities ranging from 50 to 63%. Then, 155 active compounds with activity ranging from 50 to 58% were selected for the mTOR protein.
The model selected 156 compounds for the PARP enzyme with activity ranging from 50 to 56%. No lignans were active for the ABL enzyme. We noticed that the only active compound against the EFGR enzyme was colocasinol A (390). Table 2 shows details of these results by subclasses. The structure, subclass and compounds with the highest predicted biological activity values in the QSAR modeling for each enzyme can be seen in Figure 1. We observed that although the HDAC model selected fewer active compounds when compared to the mTOR and PARP models, the HDAC model was able to select compounds with greater potential for biological activity. Moreover, active compounds were seen for all subclasses for the mTOR protein. The subclass with the most active compounds for PARP was dibenzocyclooctadiene. Furthermore, we also noticed that the subclasses with more active compounds were dibenzocyclooctadiene, furofuran and aryltetralin for all proteins. Figure 2 represents the distribution of active and inactive compounds for each analyzed protein, except for ABL, which did not present an active compound, and for EFGR, which only obtained one active compound.

Mapping Single Nucleotide Polymorphisms (SNPs)
Genomic analysis mapped nonsynonymous SNPs to regions of catalytic domains and the active site. Table 3 shows the clinically relevant SNPs that were selected for molecular docking studies. Figure 3 shows the regions of each protein that were analyzed. We identified 16 SNPs for the EGFR receptor, 28 SNPs for the HDAC8 protein, 40 SNPs for mTOR1 and 113 SNPs for the PARP1 protein. The choice of SNPs to design polymorphic mutations and undergo molecular docking was based on clinical relevance. Therefore, we considered the SNPS that presented some study related to an altered phenotype or disease. In addition, we selected SNPs with allele frequencies of the polymorphic allele greater than expected for certain populations. Therefore, we could select 4 SNPs for the EGFR receptor, 11 for the HDAC8 protein, 11 for mTOR1 and 2 for PARP1 for these analyses. Although the results show few SNPs with clinical relevance, most show a high probability that the polymorphic variant will affect protein function if the mutation occurs. This result was provided by Polyphen, a tool that predicts variants likely to affect protein function based on physical and comparative considerations.

Molecular Docking
The lignans with the highest values of predicted biological activity for EFGR, HDAC8, mTOR and PARP1 proteins were subjected to molecular docking. Docking was performed with the ancestral protein and the mutated proteins. Only one lignan (colocasinol A) was coupled to the EGFR receptor. The remaining lignans were renamed to better express the docking results in Table 4. For HDAC8, the lignans longipedlignan H, longipedlignan I and sinolignan A were renamed as lignan 1, lignan 2 and lignan 3, respectively. For mTOR, the lignans (1R,2R)-2-{4-[(1R,3aS,4R,6aS)-4-(4-hydroxy-3,5-dimethoxyphenyl)-hexahydrofuro[3,4-c] furan-1-yl]-2,6-dimethoxyphenoxy, bizanthplanispine B and bigraminol A were renamed as lignan 1, lignan 2 and lignan 3, respectively. Similar renaming was carried out with the lignans active against the PARP1 protein (longipedlignan H, longipedlignan I and longipedlignan J). The results were generated using the Moldockscore function. More negative values indicated better predictions. In this study, we only sought to analyze the difference in binding affinity according to the polymorphic variant. Thus, it is possible to verify the binding affinity contribution of a compound according to the genetic profile.

Molecular Docking
The lignans with the highest values of predicted biological activity for EFGR, HDAC8, mTOR and PARP1 proteins were subjected to molecular docking. Docking was performed with the ancestral protein and the mutated proteins. Only one lignan (colocasinol A) was coupled to the EGFR receptor. The remaining lignans were renamed to better express the docking results in Table 4. For HDAC8, the lignans longipedlignan H, longipedlignan I and sinolignan A were renamed as lignan 1, lignan 2 and lignan 3, respectively. For mTOR, the lignans (1R,2R)-2-{4-[(1R,3aS,4R,6aS)-4-(4-hydroxy-3,5-dimethoxyphenyl)-hexahydrofuro[3,4-c] furan-1-yl]-2,6-dimethoxyphenoxy, bizanthplanispine B and bigraminol A were renamed as lignan 1, lignan 2 and lignan 3, respectively. Similar renaming was carried out with the lignans active against the PARP1 protein (longipedlignan H, longipedlignan I and longipedlignan J). The results were generated using the Moldockscore function. More negative values indicated better predictions. In this study, we only sought to analyze the difference in binding affinity according to the polymorphic variant. Thus, it is possible to verify the binding affinity contribution of a compound according to the genetic profile.   The docking results can be seen in Table 4. According to the results, the colocasinol A lignan presented similar values, except for the V845L variant, which obtained a binding energy value of −34.46 kcal/mol. For the HDAC8 protein, we observed that lignan 1 (longipedlignan H) failed to interact with the enzyme, despite the QSAR modeling results. The other lignans presented varied results, with more negative values for the ancestor and some variants, such as S199T. More negative values may indicate better binding affinity probability, suggesting stronger protein inhibition. We were unable to design the mutation in two variants for mTOR due to the incomplete ancestral protein available in databases. The results for mTOR revealed quite varied energy values when compared with the results for the ancestral protein. Only bizanthplanispine B obtained higher binding energy values than the ancestor. The lowest binding energy value for this protein was −34.21 kcal/mol for the M2327I variant. However, the bizanthplanispine B lignan obtained a high binding energy value (−96.85 kcal/mol) for this variant. These results show that not only the type of mutation interferes with binding affinity, but also the structural variety of compounds. Lastly, the PARP1 protein showed similar values, except for longipedlignan H, which did not interact with the protein.
We took the opportunity to analyze the observed interactions between the selected lignans and the proteins with the ancestral allele (Figures 4-7). We observed that the EGFR receptor formed several hydrogen bonds, stabilizing the bonds with the Glu762, Glu796 and Asp855 amino acids, in addition to several hydrophobic interactions, including Leu718, Val726, Ala743, Met766 and Leu844. There was no interaction with the longipedlignan H lignan for the HDAC8 protein. However, interactions with the other lignans were observed, and we identified four common links/interactions between these compounds and the active site. The interactions were observed with Lys33, Tyr100, Phe152 and His180 residues. Moreover, three common interactions between lignans and the active site were observed for the mTOR protein.        The hydrophobic interactions formed were with the His2180, Glu2181 and Val2183 residues. The longipedlignan H lignan also did not interact with the active site for the PARP1 protein. Nevertheless, we analyzed the interactions with longipedlignan I and The hydrophobic interactions formed were with the His2180, Glu2181 and Val2183 residues. The longipedlignan H lignan also did not interact with the active site for the PARP1 protein. Nevertheless, we analyzed the interactions with longipedlignan I and longipedlignan J lignans and observed interactions between them and the active site, forming interactions with the His862, Tyr896, Ala898 and Tyr907 amino acids.

Prediction of Absorption, Distribution, Metabolism, Excretion and Toxicity (ADMET) Properties
Many drugs used to treat cancer are toxic and have a variety of side effects. Therefore, in this study, we sought to verify the ADMET properties of lignans, with an emphasis on toxicity. The pharmacokinetic mechanisms that involve the steps of absorption, distribution, metabolism, excretion and toxicity (ADMET) when failures are found are considered the main causes of failure in developing drugs derived from natural or synthetic products. Therefore, virtual platforms such as pkCSM can be used to identify these possible problems through an in silico approach involving predictive models [25]. Thus, ADMET analyzes for the studied lignans were performed with the parameters shown in the tables below ( Table 5). The colocasinol A, longipedlignan H, longipedlignan I, longipedlignan J, sinolignan A, (1R,2R)-2-{4-[(1R,3aS,4R,6aS)-4-(4-hydroxy-3,5-dimethoxyphenyl)-hexahydrofuro[3,4c]furan-1-yl]-2,6-dimethoxyphenoxy, bizanthplanispine B and ligraminol lignans were renamed in these analyzes as L1, L2, L3, L4, L5 and L6, respectively.
Next, virtual models were used to evaluate the absorption parameters against Caco-2 cellular models, intestinal absorption and skin permeability. Caco-2 cell lines are composed of human epithelial colorectal adenocarcinoma cells widely used to predict oral drug absorption. In this predictive model, values above 0.90 are considered to demonstrate that the compounds have high permeation in these cells. Therefore, the lignans (L1, L2, L3 and L8) present satisfactory values. All lignans showed adequate values regarding the complementary result of intestinal absorption (the main drug absorption site), ranging from 89-100% absorption. Finally, all lignans showed (log Kp > −2.5) permeation values, suggesting low potential for transdermal administration [26].
It can be seen that lignans (L4, L5 and L6) are potential substrates with regard to possible interactions with P-glycoproteins (an important efflux pump that can compromise the bioavailability of xenobiotics). The modulation of P-glycoprotein-mediated transport has significant pharmacokinetic implications for the PgP substrate, which can be exploited for therapeutic advantages or result in contraindications in case there are interactions between drugs used concurrently with lignans [27].
Among the distribution parameters, it was possible to evaluate the theoretical volume of distribution (VDss)-the total dose of the drug distributed in a way that guarantees the same plasma concentration, permeability in the blood-brain barrier and permeation in the CNS. In view of this, the results obtained suggest that lignans (L3, L4 and L7) have a high volume of distribution with values (logL/kg > 0.45); however, all compounds have a low permeation potential in the brain with regard to bioavailability in the CNS (logBB < −1 and logPS < −3). Therefore, these results demonstrate that lignans have low pharmacological potential against neurological disorders and little influence on neurotoxic processes [28]. The in silico tool also made it possible to identify whether the structures are substrates or inhibitors of the mitochondrial enzymatic metabolism system-CYP450. This system constitutes an important family of monooxygenase capable of biotransforming 90% of xenobiotics, for which CYPA4, CYPB6, CYP3C19 and CYP2D6 stand out as being the main ones involved in the metabolic process [29]. In the table, it was possible to identify which isoforms are substrates and/or inhibitors of the lignans used in the study, emphasizing the importance of possible pharmacokinetic interactions of possible drugs which may act as enzyme inducers or inhibitors. Finally, the behavior of lignans with regard to excretion vis-à-vis organic cation transporters 2-an important renal uptake protein in the clearance process, and a point subject to pharmacokinetic interactions between drugs-was also predicted. As observed, lignans are not substrates for the transporter, in addition to presenting satisfactory renal clearance (proportionality constant (clot by the combination of hepatic and renal clearance)), which demonstrates good elimination capacity [30]. The preclinical safety profile is one of the main concerns in the development of new drugs that should present preserved efficacy and good topical and systemic tolerability. In assessing toxicity through virtual approaches, the toxicological profile of lignans was evaluated against the mutagenic potential in bacteria (AMES), potassium channels encoded by hERG, hepatotoxicity and skin sensitization. Based on the results obtained, it was seen that lignans do not present toxicity according to the AME model, do not inhibit hERG I channels and do not promote skin sensitization. However, L5, L6 and L7 can inhibit hERG II. This condition may be related to echocardiographic changes called acquired long QT syndrome, which promotes ventricular arrhythmias [27]. Therefore, it is important to point out that the pharmacokinetic determination may substantially contribute to the preclinical research of these derivatives and early recognition of possible ADMET failures.

Discussion
We found no structural correlation between subclasses of lignans and anticancer activity in the literature. This is the first work that correlates the structure of lignan subclasses with anticancer activity. Among the ten subclasses of lignans, three were considered the most promising for anticancer activity: furofurans, dibenzocyclooctadiens and aryltretalins. This result was based on the amount of lignans active against antitumor activity for each subclass. The pActivity values were approximated between lignans but are significant when greater than 0.50. These values refer to the prediction of inhibition based on the data of biological activities obtained for the construction of the models.
Furofuran lignans contain the backbone of 2,6-diaryl-3,7-dioxabicyclo[3.3.0]octane and represent one of the major subclasses of the lignan family. Furofuran lignans have a wide variety of structures due to different substituents on the aryl groups and diverse configurations on the furofuran ring. Antioxidant, anti-inflammatory, cytotoxic and antimicrobial activities are among the main activities reported for furofuran lignans [23]. Our study contributes to further explore furofuran lignans as anticancer agents.
One example is a study by Cheng et al. [31], who isolated five new furofuran lignans, brasesquilignan A-E (1-5), from Selaginella braunii Baker. All of these compounds were evaluated for antiproliferative activities against various human cancer cells in vitro but showed weak inhibition. In addition, Vitória et al. [32] isolated two new tetrahydrofuran lignans, taungtangyiols A and B, and eight known furofuran lignans from Premna integrifolia wood. Taungtangyiols A and B were observed to inhibit melanin deposition in B16F10 mouse melanoma cells, with IC 50 values of 50.7 and 40.9 µM, respectively, without notable cytotoxicity.
Another study isolated seven dibenzocyclooctadiene lignans from the fruits of Schisandra chinensis. Cell viability assays verified the cytotoxicity of isolated dibenzocyclooctadiene lignans against AGS, HeLa and HT-29 human cancer cells [33].
Podophyllotoxin (PTOX, 1) is an aryltetralin-type lignan, first discovered in the plant Podophyllum peltatum. It has been used in biosynthesis and total synthesis as a prospective alternative due to its potent anticancer activities [34,35]. Another lignan structurally and closely related to podophyllotoxin is deoxypodophyllotoxin. This aryltetralin is a potent antitumor and anti-inflammatory agent and is especially used as a precursor for the semisynthesis of etoposide phosphate and teniposide cytostatic drugs. These analogues are also used in cancer therapy [36].
Correlation studies between the chemical structure and anticancer activities are important to reduce costs in identifying new drugs and to develop more potent drugs. An example of this is the study carried out by Scotti et al. [38], who tabulated the most important examples of determined virtual screening for anticancer flavonoids and highlighted the structural determinants. Like lignans, flavanoids have a structural variety and different described biological activities. The aforementioned study identified the action mode, the most potent anticancer flavonoids and tips for the structural design of anticancer flavonoids in a review.

Data Collection and Curation
This study selected and investigated five proteins involved in the pathogenesis of cancer with sufficient and available biological activity and involved in the development of different types of cancer in different pathways. Compounds with known biological activity for ABL, EGFR, HDAC, mTOR and PARP were obtained (https://www.ebi.ac.uk/chembl/ (accessed on 14 February 2023)) [39,40]. Details of the pools can be found in Table 6. Compounds were ranked based on pIC 50 [−log IC 50 (mol/L)]. The IC 50 value represents the concentration required for 50% inhibition. The compounds and these data were used to build predictive models with biological activity for the indicated proteins. In addition, 495 lignan subclasses were evaluated by ligand-based virtual screening to identify molecules with potential activity against the selected proteins. The lignan subclasses are formed by (35) Dibenzylbutanes, (12)

QSAR Modelling
Compounds with known biological activity towards the proteins ABL, EGFR, HDAC, mTOR and PARP were saved in special data file format (SDF) and imported into the Dragon 7.0 program [41] to generate descriptors. A predictive model was built for each protein bank. The descriptors referring to each bank and biological activity information were imported into the Knime 3.5.3 program (KNIME 3.5.3, Konstanz Information Miner Copyright, 2018, www.knime.org (accessed on 22 February 2023)) to generate the predictive models. The data were divided in a "Partitioning" tool using the "Stratified sample" option in the program, which separated the data into training and test sets representing 80% and 20% of all compounds, respectively. The sets were randomly selected, but the proportions of active and inactive substances were maintained in both databases. We used the random forest (RF) algorithm to build predictive models. Cross-validation was performed to estimate the predictive power of the developed models.
The external performances of the selected models were analyzed for sensitivity, specificity and accuracy. In addition, receiver operating character (ROC) curve sensitivity and specificity were used because they describe actual performance more clearly than accuracy. The Matthews correlation coefficient (MCC) was used to evaluate the model overall based on the results obtained in the confusion matrix [42]. The applicability domain (APD) was used to analyze the compounds in the test sets, if the predictions are reliable [43,44]. The lignans considered active against the selected proteins were submitted to the other methodologies.

Mapping SNPs
Identifying single nucleotide polymorphisms (SNPs) in important regions of proteins involved in the progression of different types of tumors can help rational drug design. In addition, it contributes to developing new therapies based on the genetic profile. Genetic variations in target proteins were identified by searching two databases. The National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/ (accessed on 8 March 2023)) and Ensembl (https://www.ensembl.org/index.html (accessed on 8 March 2023)) databases [45] were consulted to obtain essential information about genes, phenotype and SNPs in the investigated proteins. SNPs with varied allele frequencies in catalytic domain and active site regions were considered in the study. Mutations with the polymorphic variants were designed using the UCSF Chimera program (Visualization and Informatics-RBVI, San Francisco, CA, USA) [46].

Molecular Docking
Molecular docking was performed using Molegro Virtual Docker v6.0.1 (MVD) program (Molexus IVS Rørth Ellevej 3, Odder, Denmark) [47] on the five targets selected for anchoring studies ( Table 7). The MolDock score algorithm was used as a scoring function to predict the best interactions between ligand and receptor. The 3D structures of the proteins used in this study were obtained from the Protein Data Bank (PDB) [48,49] (Table 7). All water molecules were initially removed from the crystalline structure and the root-meansquare deviation (RMSD) was calculated from the poses to indicate the degree of reliability of the fit. RMSD values less than 2.0 Å were considered successful.

Conclusions
It is not easy to correlate anticancer activity with a particular subclass due to the structural diversity of lignans and the various pharmacological properties. Therefore, the present study used a set of lignans distributed in the 10 known subclasses and submitted them to in silico methodologies to verify the anticancer potential of these compounds. Five predictive models were built against important targets in cancer development. The RF model was able to select a potentially active lignan against the EFGR receptor, 86 lignans considered active against HDAC, 155 lignans active against the mTOR protein and 156 lignans active against the PARP protein. Overall, the predictions of biological activities showed inhibition values ranging from 50-63%. No compound was active against the ABL protein. The results also showed that the subclasses with the most active compounds were dibenzocyclooctadiene, furofuran and aryltetralin for all proteins. These subclasses are interesting for anticancer activity. The other subclasses are correlated with other pharmacological properties known for lignans, such as anti-inflammatory, antioxidant and trypanocidal activity. The lignans considered most active for each protein (colocasinol A for EGFR; longipedlignan H, longipedlignan I and sinolignan A for HDAC8; (1R,2R)-2-{4-[(1R,3aS,4R,6aS)-4-(4-hydroxy-3,5-dimethoxyphenyl)-hexahydrofuro [3,4-c]furan-1-yl]-2,6-dimethoxyphenoxy, bizanthplanispine B and ligraminol A for mTOR and longipedlignan H, longipedlignan I and longipedlignan J for PARP1) were subjected to molecular docking.
Genomic analysis selected 4 clinically relevant SNPs for the EGFR receptor, 11 for the HDAC8 protein, 11 for mTOR1 and 2 for PARP1. The projected mutations and molecular docking results showed differences in binding energy values between proteins and between selected lignans. mTOR presented the most discrepant energy values. From these results, we suggest that mutations in the catalytic domain of target proteins can generate strong or low inhibition depending on the type of variant present in the individual.
We also suggest with this study exploit the use of lignans analogs in order to generate preliminary data from the structure-activity relationship (SAR), which could provide more information on the relationship between structural modifications and the anticancer activity of lignans.