Thiophenes—Naturally Occurring Plant Metabolites: Biological Activities and In Silico Evaluation of Their Potential as Cathepsin D Inhibitors

Naturally occurring thiophenes are a small family of metabolites characterized by one to five thiophene rings. They are commonly produced by numerous plant species of the family Asteraceae. These metabolites possess remarkable bioactivities, including antimicrobial, antiviral, anti-inflammatory, larvicidal, antioxidant, insecticidal, cytotoxic, and nematicidal properties. This review updates the natural thiophene derivatives reported over the seven years since the last review, published in 2015, covering their sources, biosynthesis, spectral data, and bioactivities. Additionally, the SuperPred webserver, an AI (artificial intelligence) tool, was used to predict potential drug targets for these compounds. In silico studies of the thiophene derivatives against cathepsin D were then conducted, including ADMET (absorption, distribution, metabolism, excretion, and toxicity) property prediction, molecular docking to characterize binding interactions, and molecular dynamics to evaluate the stability of the ligand–target interaction under simulated physiological conditions.


Structural Characterization of Thiophenes
The structures of the reported thiophenes were elucidated by various spectroscopic tools, including 1D (one-dimensional; ¹H and ¹³C) and 2D NMR (two-dimensional nuclear magnetic resonance) techniques such as COSY (homonuclear correlation spectroscopy), HSQC (heteronuclear single quantum coherence), HMBC (heteronuclear multiple bond correlation), and NOESY (nuclear Overhauser effect spectroscopy), combined with other methods (UV (ultraviolet) and IR (infrared) spectroscopy, MS (mass spectrometry), and elemental analysis). The spectral and physical data of the newly reported thiophenes are listed in Table S1. Relative configurations were determined by NOESY and ROESY (rotating-frame Overhauser effect spectroscopy), as well as by [α]D measurement [34]. Exciton-coupled circular dichroism (ECCD) analysis and electronic circular dichroism (ECD) calculations were used to assign absolute configurations by comparing theoretical and experimental CD spectra [16,17,37,49]. Absolute configurations were also determined by Mosher's method, analyzing the chemical shift differences between the (S)- and (R)-MTPA esters [16]. X-ray crystallographic analysis of crystalline derivatives is another tool used for absolute configuration determination [49]. Some compounds had not been named; they are therefore named here using the IUPAC nomenclature system. Further, some compounds with identical molecular formulae and structures had been reported under different names, so some metabolites have more than one name.

Biosynthesis of Thiophenes
The detailed biosynthesis of thiophenes was discussed previously [8]. Here, only the recently reported biosynthetic pathways are discussed.
Wu et al. reported the biogenetic pathways of the dimeric bithiophenes 68-70 (Scheme 1). These compounds have an unprecedented dimeric skeleton in which two bithiophene units are linked by uncommon cyclic diether units. They were proposed to originate from arctinol-b (49). For 68 and 69, the 1,3-dioxolane ring may be formed through an aldol condensation: first, a key intermediate (I) is produced from 49 by dehydration and keto-enol tautomerism; then, an aldol condensation between 49 and I would give 68 and 69. In addition, an intermolecular dehydration between two molecules of 49 forms the 1,4-dioxane unit of 70 [17].

Biological Activities of Thiophenes
The reported thiophenes have been investigated for various bioactivities, including antimicrobial, antiviral, anti-inflammatory, larvicidal, antioxidant, insecticidal, cytotoxic, and nematicidal effects. The results for the most active metabolites are summarized below.

Anti-Inflammatory Activity
Inflammation is a host defense mechanism that enables the body to survive injury or infection and maintains tissue homeostasis under noxious conditions [55].
Endogenous NO (nitric oxide) plays a critical role in maintaining the homeostasis of various cellular functions. Local NO concentrations are highly dynamic because its synthesis is regulated by independent enzymatic pathways. NO has been shown to modulate inflammation, decreasing the secretion of pro-inflammatory cytokines by human alveolar macrophages challenged with bacterial lipopolysaccharide (LPS) without altering basal cytokine levels. Drugs used to manage inflammatory disorders relieve these ailments, but they may have life-threatening adverse effects [56]. Therefore, there is great interest in developing new and safe anti-inflammatory remedies from natural sources. Reported studies suggest that the anti-inflammatory potential of thiophenes could be due to inhibition of the NF-κB (nuclear factor-κB) pathway, which regulates the expression of pro-inflammatory cytokines and chemokines [57].
In an in vitro anti-inflammatory assay, compounds 23-26 obtained from Pluchea indica aerial parts significantly inhibited LPS-induced NO production in RAW 264.7 macrophages at 40 µM, with inhibition ranging from 83.4% to 90.1% compared with 62.2% for dexamethasone [35] (Figure 5).
In contrast, the two new thiophene polyacetylene glycosides atracthioenynesides A (29) and B (30), isolated from Atractylodes lancea rhizomes, showed no activity against LPS-induced NO production in BV2 cells [37].

Cytotoxic Activity
Cancer is a leading cause of death globally, accounting for ≈10 million deaths in 2020 [48,60]. Many medications are available for treating various types of cancer; however, none of them is entirely safe and effective. Many of the reported thiophenes have been assessed for cytotoxicity against various cancer cell lines.
Compounds 11, 18, and 22, isolated from Pluchea indica aerial parts, were assayed for their inhibitory potential on coumarin 7-hydroxylation catalyzed by CYP2A6 (cytochrome P450 2A6) and CYP2A13 (cytochrome P450 2A13) using an enzymatic reconstitution assay [31]. The human cytochrome P450 (CYP) enzymes 2A6 and 2A13 play crucial roles in nicotine metabolism and in the activation of tobacco-specific nitrosamine carcinogens. Their inhibition could therefore be a strategy for promoting smoking cessation and decreasing the risks of lung cancer and respiratory complaints. Compounds 18, 11, and 22 irreversibly inhibited CYP2A6- and CYP2A13-catalyzed coumarin 7-hydroxylation (IC50 values of 3.90 and 2.40 µM, respectively, for 18; 6.43 and 6.18 µM for 11; and 4.44 and 2.94 µM for 22). These metabolites could thus aid smoking cessation and lower the risks of lung cancer and respiratory illnesses [31].
Additionally, Preya et al. reported that 77, isolated from Eclipta prostrata, was a more potent cell growth inhibitor (IC50 values of 0.20-18.82 µM) than cisplatin (IC50 values of 10.80-43.05 µM) against a panel of human ovarian cancer cell lines (OVCAR3, SKOV3, A2780, and ES2) in the MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide) assay. It altered the levels of S phase-associated proteins (cyclins A and D2 and cyclin-dependent kinase 2) and increased intracellular reactive oxygen species (ROS), which raised the levels of p-H2AX (phosphorylated H2A histone family member X), a marker of DNA (deoxyribonucleic acid) damage [14,20]. Mechanistic studies indicated that 77 caused S-phase cell cycle arrest by inducing ROS stress and DNA damage. Therefore, 77 could be a potential therapeutic lead for treating ovarian cancer.
Compounds 76, 77, and 79-81, isolated from Eclipta prostrata, showed prominent cytotoxicity against Hec1A (IC50 values of 0.38-129.85 µM) and Ishikawa (IC50 values of 0.35-9.68 µM) endometrial cancer cells compared with cisplatin (IC50 120.4 and 10.11 µM, respectively). Notably, 77 was highly potent against Ishikawa and Hec1A cells (IC50 0.35 and 0.38 µM, respectively) [37,47]. Its inhibitory effect was mediated by the induction of apoptosis, involving caspase activation and the release of cytochrome c into the cytosol. It also increased intracellular ROS levels and decreased GSH (glutathione) levels. Its apoptotic effect was therefore attributed to the generation of reactive oxygen species via NADPH (nicotinamide adenine dinucleotide phosphate) oxidase in human endometrial cancer cells [47].

Antimicrobial Activity
Infectious diseases remain a serious worldwide health concern. Multidrug-resistant (MDR) pathogens have significantly increased morbidity and mortality rates [61]. The continuous emergence of MDR pathogens has drastically reduced the efficacy of current antibiotics, resulting in a growing rate of therapeutic failure [62]. Accordingly, new and effective antimicrobial agents are needed to tackle microbial infections [50].
In 2017, Postigo et al. reported the isolation and structural elucidation of 37, 43, 44, and 75 from the n-hexane extract of Porophyllum obscurum by preparative centrifugal thin-layer (CTL) and thin-layer (TL) chromatography. These compounds were assayed by broth microdilution for fungicidal activity against C. albicans ATCC-10231 and 25 clinical Candida spp. isolates causing oropharyngeal candidiasis. Under UV-A irradiation, they were fungicidal with minimum fungicidal concentrations (MFCs) ranging from 0.24 to 7.81 µg/mL; 32 (MFC 0.24 µg/mL) and 43 (MFC 3.90 µg/mL) were the most active metabolites [40]. In 2019, Postigo et al. evaluated their photoinactivation of C. albicans in parallel under dark and light conditions. These thiophenes showed the highest activity under normal light and oxygen (MFCs 0.24-7.81 µg/mL), whereas their effects decreased >200 times under low-oxygen conditions (MFCs 7.81-250 µg/mL). In darkness, none of the tested thiophenes had antifungal activity under either oxygen condition (MFC > 250 µg/mL). Compound 75 was the most active photosensitizer and the only one that generated singlet oxygen at its MFC. Furthermore, it did not increase sensitivity to oxidative or osmotic stressors and did not cause leakage or apoptosis [59]. Their antifungal mechanism was therefore proposed to be photodynamic, given that oxygen depletion reduced their antifungal photosensitizing capacity. These features encourage further assessments to confirm their potential as photosensitizers in photodynamic antimicrobial therapy against fungal infections [59].
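In broth microdilution, MFCs are read from a two-fold serial dilution. The reported values (250, 7.81, 3.90, and 0.24 µg/mL) are consistent with a series that halves from 250 µg/mL; that top concentration is an assumption inferred from the numbers, not stated in the text. The sketch below builds such a series and maps each reported value to its nearest dilution step.

```python
import math
from typing import List


def twofold_series(start: float, steps: int) -> List[float]:
    """Concentrations of a two-fold serial dilution, highest first."""
    return [start / 2 ** i for i in range(steps)]


def dilution_step(value: float, start: float) -> int:
    """Index of the dilution step closest to `value` (0 = `start`)."""
    return round(math.log2(start / value))


series = twofold_series(250.0, 11)  # assumed top concentration of 250 µg/mL
for reported in (250.0, 7.81, 3.90, 0.24):
    step = dilution_step(reported, 250.0)
    print(f"{reported} µg/mL ~ step {step}: {series[step]:.4g} µg/mL")
```

Each step is a factor of two, so the gap between two MFCs on the same series is simply 2 raised to the number of steps separating them.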

Antimalarial Activity
Malaria is a major parasitic disease worldwide, responsible for the deaths of at least half a million people every year [63]. An estimated 241 million malaria cases occurred globally in 2020 across 85 malaria-endemic countries [64]. Resistance to the available antimalarial drugs is rising sharply, which necessitates the search for new drugs to combat malaria [65].
Bitew et al. evaluated the antimalarial activity of 9 and 14, isolated from the CH2Cl2 fraction of Echinops hoehnelii roots, using the standard suppressive test in Plasmodium berghei-infected mice. At doses of 50 and 100 mg/kg, compound 9 decreased parasitemia by 43.2% and 50.2%, and compound 14 by 18.8% and 32.7%, respectively, with chloroquine as the reference drug. The roughly two-fold lower activity of 14 was attributed to its ester functional group [32].
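In suppressive tests of this kind, percent suppression is conventionally computed from the mean parasitemia of the untreated control group and of a treated group. The sketch below assumes that definition (the text does not state the formula) and uses hypothetical parasitemia values; the final loop only compares the suppression percentages reported for 9 and 14.

```python
def percent_suppression(mean_control: float, mean_treated: float) -> float:
    """Percent reduction in mean parasitemia of a treated group relative to
    the untreated control group (assumed standard definition)."""
    if mean_control <= 0:
        raise ValueError("control parasitemia must be positive")
    return 100.0 * (mean_control - mean_treated) / mean_control


# Hypothetical mean parasitemia values (% infected red blood cells).
example = percent_suppression(mean_control=20.0, mean_treated=11.36)
print(f"suppression: {example:.1f}%")

# Fold difference between the reported suppression of 9 and 14 at each dose.
reported = {50: (43.2, 18.8), 100: (50.2, 32.7)}  # mg/kg: (compound 9, compound 14)
for dose, (s9, s14) in reported.items():
    print(f"{dose} mg/kg: 9 is {s9 / s14:.2f}-fold more suppressive than 14")
```

With the reported values, the ratio is about 2.3 at 50 mg/kg and about 1.5 at 100 mg/kg.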

Larvicidal Activity
Currently used larvicides are synthetic pesticides that are highly toxic to humans and other non-target organisms. Several reports have shown that thiophenes are toxic to insects, especially mosquito larvae. Thiophenes have therefore been proposed as promising natural larvicides for mosquito control.

Nematicidal Activity
Nematodes and plant pathogenic fungi cause diseases that can reduce the yield and quality of many crops [66]. Chemical control with synthetic pesticides is a common way to manage these diseases. The potential hazards of synthetic chemicals to non-target organisms, together with pesticide resistance, justify the development of eco-friendly and safe pesticides [67]. Discovering efficient and less toxic natural pesticides has become a top priority in the contemporary pesticide industry [68].
Compound 74, previously reported from Tagetes patula aerial parts, was synthesized by Politi et al. It had a marked in vitro anthelmintic effect against Haemonchus contortus, reaching 100% efficacy in the larval development and egg hatch tests with EC50 (half-maximal effective concentration) values of 0.3243 and 0.1731 mg/mL, respectively, compared with levamisole (EC50 1.88 mg/mL) [46].

Anti-Influenza Activity
Two new thiophene derivatives, rupestriene D (95) and rupestriene E (96), along with rupestriene A (86), were isolated from the whole plants of Artemisia rupestris using silica gel column chromatography (SiO2 CC) and RP-HPLC. They inhibited neuraminidase with IC50 values ranging from 351.15 to 986.54 µM in a fluorescence-based assay, compared with oseltamivir acid (IC50 77.91 µM). Compounds 86 and 96 were more potent than 95, suggesting that a free OH group in the C-3 side chain might enhance the activity [15].

AI-Based Target Prediction, VS (Virtual Screening), and MD (Molecular Dynamics) for Thiophene Derivatives
Cathepsin D is one of the most abundant lysosomal proteases. It is involved in protein turnover and promotes apoptosis when proteostasis is disrupted [69,70]. Dysregulation of cathepsin D can lead to various health disorders: its excessive levels outside lysosomes and the cell promote tumor growth, migration, invasion, and angiogenesis [71,72]. Many available inhibitors act non-specifically, which may cause serious side effects [73]. The thiophene derivatives tested here as cathepsin D inhibitors could therefore offer diagnostic benefits and a new therapeutic approach.
To identify suitable protein targets for the thiophene derivatives, ligand-based tools were used for in silico target prediction [74]. In the current study, the SuperPred webserver was used to predict the anatomical therapeutic chemical (ATC) codes and targets of these compounds [75]. After analysis of all predicted targets, cathepsin D (PDB (Protein Data Bank) code 4OD9) was selected because it was a common target for most of the thiophene derivatives, with high probability and model accuracy (Table 3). All the listed compounds were docked using extra precision for maximum accuracy. The docking method was validated by redocking the inhibitor N-(3,4-dimethoxybenzyl)-Nα-{N-[(3,4-dimethoxyphenyl)acetyl]carbamimidoyl}-D-phenylalaninamide (2RZ), co-crystallized with 4OD9; the RMSD values were within an acceptable range, and the redocked poses reproduced the binding interactions of the original pose in the active site. Further, the ADMET properties of the investigated compounds were predicted in silico. Finally, an MD simulation was conducted for compound 30, which showed a high docking score, to assess the ligand–target interaction under simulated physiological conditions.

Table 3. The probability and model accuracy predicted for thiophene derivatives against cathepsin D using the SuperPred target prediction webserver.

Table 3 columns: Probability *; Model Accuracy **.
* Probability that the input structure binds the specific target, as determined by the respective target's machine learning model. ** Because model performance varies between targets, the 10-fold cross-validation score of the respective logistic regression model is also displayed.
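The redocking validation described in this section compares the redocked pose of 2RZ with its crystallographic pose. Because both poses share the receptor's coordinate frame, the RMSD is usually computed in place, without superposition, over matched atoms. The sketch below assumes matched atom order and ignores symmetry-equivalent atoms; the 2.0 Å acceptance cutoff is a common convention assumed here, since the text only states that the values were acceptable.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(reference: Sequence[Coord], pose: Sequence[Coord]) -> float:
    """In-place RMSD (Å) between two poses of the same ligand.

    Atoms must be listed in the same order in both poses; no superposition
    is performed because both poses share the receptor's coordinate frame.
    """
    if len(reference) != len(pose) or not reference:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(reference, pose)
    )
    return math.sqrt(squared / len(reference))


def redocking_ok(reference: Sequence[Coord], pose: Sequence[Coord], cutoff: float = 2.0) -> bool:
    """Assumed acceptance rule: RMSD at or below `cutoff` Å."""
    return pose_rmsd(reference, pose) <= cutoff


# Toy three-atom ligand shifted by 1 Å along x.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked = [(x + 1.0, y, z) for x, y, z in crystal]
print(pose_rmsd(crystal, redocked), redocking_ok(crystal, redocked))
```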

In Silico ADMET Properties of Selected Ligands
The 96 reported thiophene derivatives were processed using the LigPrep module of the Schrödinger suite [76]. 3D (three-dimensional) models were generated with the OPLS3 force field, with ionization states at pH 7.0 ± 0.2. The QikProp module of the Schrödinger suite was used to predict ADME properties [77]. The predicted ADMET properties are summarized in Table 4. ADMET analysis characterizes the biological behavior, drug-likeness, physicochemical properties, and expected toxicity of compounds, and thus helps evaluate their usefulness as drug candidates. The descriptors predicted for the reported thiophene derivatives included drug-likeness, solvent-accessible surface area, dipole moment, molecular weight, hydrogen bond acceptors and donors, aqueous solubility, octanol/water partition coefficient, number of likely metabolic reactions, brain/blood partition coefficient, human oral absorption, binding to human serum albumin, central nervous system activity, IC50 for blockage of hERG K+ (human ether-a-go-go-related gene potassium) channels, and number of reactive functional groups. Most of the predicted values are within the recommended ranges, except for a few parameters highlighted in yellow in Table 4.
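Flagging out-of-range values, as done with the yellow highlighting in Table 4, amounts to comparing each predicted descriptor with a recommended interval. The sketch below assumes a handful of QikProp descriptor names and the ranges commonly quoted for them in the QikProp documentation; both should be checked against the installed version, and the input compound is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed recommended ranges (QikProp names); verify against your QikProp manual.
RECOMMENDED: Dict[str, Tuple[float, float]] = {
    "mol_MW": (130.0, 725.0),     # molecular weight
    "donorHB": (0.0, 6.0),        # hydrogen bond donors
    "accptHB": (2.0, 20.0),       # hydrogen bond acceptors
    "QPlogPo/w": (-2.0, 6.5),     # octanol/water partition coefficient
    "QPlogS": (-6.5, 0.5),        # aqueous solubility
}


def flag_out_of_range(descriptors: Dict[str, float]) -> List[str]:
    """Return the names of descriptors whose value lies outside its range.

    Descriptors without an entry in RECOMMENDED are skipped.
    """
    flagged = []
    for name, value in descriptors.items():
        if name not in RECOMMENDED:
            continue
        low, high = RECOMMENDED[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged


# Hypothetical QikProp output for one ligand.
ligand = {"mol_MW": 450.6, "donorHB": 2.0, "accptHB": 7.5, "QPlogPo/w": 6.9, "QPlogS": -7.1}
print(flag_out_of_range(ligand))
```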

Ligands and Proteins Preparations
Conversion of the 2D structures to 3D with LigPrep, including generation of tautomers and ionization states, gave 146 minimized 3D structures, which were docked into the cathepsin D crystal structure (PDB: 4OD9). The 4OD9 structure was prepared with the Protein Preparation Wizard, which minimized its geometry and optimized the H-bond network. Correct ionization states and missing hydrogens were added to ensure proper force field treatment and formal charges (Figure 12).

Molecular Docking Studies
After the grid box was defined in the prepared protein with Glide's Receptor Grid Generation tool in Maestro [78], the 3D molecular structures were docked into the binding site of the co-crystallized cathepsin D inhibitor. Table 5 lists the docked ligands with the most negative docking scores, which reflect the highest predicted relative binding affinities and the best binding conformations. Compounds 29 and 30 had the most negative docking scores in complex with 4OD9 (−9.439 and −9.178 kcal/mol, respectively), whereas the reference inhibitor 2RZ scored −6.895 kcal/mol with the same protein.
Analysis of the docking poses of 29 and 30, compared with the redocked reference 2RZ, indicated that they interact through hydrogen bonds (Figures 13-16).
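Selecting ligands by docking score, as in Table 5, is a ranking in which more negative Glide scores indicate more favorable predicted binding. The sketch below uses only the three scores quoted in this section; the function name is illustrative, and the scores remain relative estimates rather than measured affinities.

```python
from typing import Dict, List, Tuple


def better_than_reference(scores: Dict[str, float], reference: str) -> List[Tuple[str, float]]:
    """Ligands scoring more negatively than the reference, best first."""
    ref_score = scores[reference]
    better = [(name, s) for name, s in scores.items() if name != reference and s < ref_score]
    return sorted(better, key=lambda item: item[1])  # most negative first


docking_scores = {"29": -9.439, "30": -9.178, "2RZ": -6.895}  # kcal/mol, from the text
for name, score in better_than_reference(docking_scores, "2RZ"):
    gap = score - docking_scores["2RZ"]
    print(f"{name}: {score:.3f} ({gap:+.3f} vs 2RZ)")
```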

Molecular Dynamics Simulation
Docking provides a static view of a molecule's binding in the active site of a specific protein, whereas MD simulation computes atomic motions over time. To assess the stability of the complex of compound 30 with cathepsin D (PDB: 4OD9), a 100 ns MD simulation was run using Desmond software [79][80][81]. The complex structure was optimized at pH 7.0 ± 2.0. Complex stability was examined by analyzing the interaction map and the RMSD (root mean square deviation) plots of the ligand and protein. The RMSD plot of compound 30 complexed with cathepsin D (Figure 17) indicates that the complex stabilized during the 100 ns simulation relative to the reference frame at 0 ns. The slight fluctuations observed during the simulation remained within the permitted range of 1-3 Å and can therefore be regarded as nonsignificant. Because the RMSD plots of compound 30 and the protein backbone overlap, stable complex formation can be inferred. Figure 18 shows a schematic of the detailed ligand atom interactions of compound 30 with cathepsin D. The docked pose was maintained throughout the 100 ns simulation, with molecular interactions involving residues VAL 31, SER 36, ASN 38, TRP 40, and TYR 78.
Figure 19 shows the ligand–protein interactions, categorized into four types: ionic, hydrophobic, hydrogen bonds, and water bridges. Each type includes more specific subtypes, which can be examined via the 'Simulation Interactions Diagram' panel [82][83][84][85][86]. The stacked bar charts are normalized over the trajectory: for example, a value of 0.7 indicates that the specific interaction is maintained for 70% of the simulation time. Values above 1.0 are possible because a protein residue may make multiple contacts of the same subtype with the ligand. Hydrogen bonds with residues VAL 31, TRP 40, and TYR 78 were retained for more than 80% of the simulation time.
Interactions that occur for more than 30.0% of the simulation time in the selected trajectory (0.00 through 100.00 ns) are shown. Interactions can exceed 100% because some residues may make multiple interactions of a single type with the same ligand atom.
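The normalization of the interaction bar charts described above can be reproduced by counting contacts frame by frame and dividing by the number of frames; a residue that forms two contacts of the same type in one frame counts twice, which is why values above 1.0 occur. The sketch below is not Desmond's implementation; it uses a toy ten-frame trajectory with invented contacts and the 30% reporting threshold mentioned above.

```python
from typing import Dict, List, Tuple

Contact = Tuple[str, str]  # (residue, interaction type)


def interaction_fractions(frames: List[List[Contact]]) -> Dict[Contact, float]:
    """Contacts per frame, averaged over the trajectory.

    Every listed contact is counted, so repeated contacts of one type by the
    same residue in a frame push the fraction above 1.0.
    """
    if not frames:
        raise ValueError("trajectory has no frames")
    counts: Dict[Contact, int] = {}
    for contacts in frames:
        for contact in contacts:
            counts[contact] = counts.get(contact, 0) + 1
    return {contact: n / len(frames) for contact, n in counts.items()}


def persistent(fractions: Dict[Contact, float], threshold: float = 0.30) -> Dict[Contact, float]:
    """Keep interactions present for more than `threshold` of the simulation."""
    return {c: f for c, f in fractions.items() if f > threshold}


# Toy 10-frame trajectory (invented contacts, for illustration only).
frames: List[List[Contact]] = []
for i in range(10):
    contacts = [("TRP 40", "H-bond")]
    if i < 3:
        contacts.append(("TRP 40", "H-bond"))  # a second H-bond from the same residue
    if i < 8:
        contacts.append(("TYR 78", "H-bond"))
    if i < 2:
        contacts.append(("SER 36", "water bridge"))
    frames.append(contacts)

print(persistent(interaction_fractions(frames)))
```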

Preparation of PDB Structures
The PDB structure (PDB ID: 4OD9) was downloaded from the Protein Data Bank [69] and prepared and optimized using the "Protein preparation wizard" [70] tool of the Schrödinger suite [76,87]. To this end, bond orders were assigned for known HET groups and untemplated residues, and hydrogens were added. Bonds to metals were then broken, zero-order bonds were added between metals and nearby atoms, and the formal charges of metals and neighboring atoms were corrected. Water molecules beyond 5 Å from HET groups were deleted, and disulfide bonds were created. For ligands, metal HET states and cofactors were generated at pH 7.0 ± 2.0 using LigPrep [76]. Finally, H-bonds were optimized at pH 7.0 using PROPKA [88], water molecules beyond 3 Å from HET groups were removed, and restrained minimization was performed with the OPLS4 force field.

ADME Properties Prediction
The drug-likeness and ADME (absorption, distribution, metabolism, and excretion) properties of the selected compounds were estimated with the QikProp module of Schrödinger Maestro [77].

Receptor Grids Generation and Docking
Glide [78] was used for both grid generation and ligand docking. For docking of the 96 thiophene derivatives, the receptor grid was generated from 4OD9, with the binding region defined by selecting the co-crystallized ligand 2RZ. For nonpolar atoms, the van der Waals (VdW) radius scaling factor was set to 1.0 with a partial charge cutoff of 0.25. Ligand docking was performed with the "ligand docking" tool of the Schrödinger suite [78,85] using the standard precision (SP) protocol and flexible ligand sampling, with all other settings at their defaults.
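The van der Waals scaling setting above softens the receptor only for atoms treated as nonpolar, i.e., those whose absolute partial charge does not exceed the cutoff (assumed inclusive here). The sketch below illustrates that rule on hypothetical atoms; with the factor of 1.0 used in this study the radii are left unchanged, so a factor of 0.8 is also shown purely for contrast.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float]  # (atom name, partial charge, vdW radius in Å)


def scale_nonpolar_radii(atoms: List[Atom], factor: float = 1.0, charge_cutoff: float = 0.25) -> List[Atom]:
    """Scale vdW radii of nonpolar atoms (|charge| <= cutoff, assumed inclusive).

    Polar atoms keep their original radii.
    """
    return [
        (name, charge, radius * factor if abs(charge) <= charge_cutoff else radius)
        for name, charge, radius in atoms
    ]


# Hypothetical receptor atoms.
atoms = [("CB", 0.05, 1.90), ("OG", -0.55, 1.52), ("HG", 0.41, 1.10)]
print(scale_nonpolar_radii(atoms))              # factor 1.0, as used in the study: unchanged
print(scale_nonpolar_radii(atoms, factor=0.8))  # only CB is scaled
```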

MD Simulations of Compound 30 in Complex with 4OD9
MD simulations were run using the Schrödinger suite [88]. The complex of compound 30 with 4OD9 was taken from the docking results and first prepared with the "System Builder" tool. The TIP3P water model and an orthorhombic box were chosen. The system was neutralized by adding Na+ ions, and the buffer distance to the box sides was set to 10 Å. Each trajectory was run for 100 ns with the number of atoms, pressure, and temperature kept constant (NPT ensemble); the pressure was set to 1.01325 bar, the temperature to 300.0 K, and the force field to OPLS4.
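The 10 Å buffer and Na+ neutralization described above can be illustrated with simple arithmetic: each edge of an orthorhombic box is the solute's extent along that axis plus the buffer on both sides, and a system with net charge −n needs n Na+ ions. The sketch assumes an axis-aligned solute and uses hypothetical coordinates and charge; it is not the System Builder's code, which offers further options such as adding salt.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def orthorhombic_box(coords: Sequence[Coord], buffer: float = 10.0) -> Tuple[float, float, float]:
    """Edge lengths (Å) leaving at least `buffer` Å between the solute and
    every face of an axis-aligned orthorhombic box."""
    if not coords:
        raise ValueError("no solute coordinates")
    xs, ys, zs = zip(*coords)
    return (
        max(xs) - min(xs) + 2 * buffer,
        max(ys) - min(ys) + 2 * buffer,
        max(zs) - min(zs) + 2 * buffer,
    )


def na_ions_to_neutralize(net_charge: int) -> int:
    """Na+ ions needed to neutralize a system; zero if it is not negative."""
    return max(0, -net_charge)


# Hypothetical solute extremes spanning 40 x 30 x 50 Å and a net charge of -7.
solute = [(0.0, 0.0, 0.0), (40.0, 30.0, 50.0), (12.5, 8.0, 20.0)]
print(orthorhombic_box(solute))   # (60.0, 50.0, 70.0)
print(na_ions_to_neutralize(-7))  # 7
```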

Conclusions
Natural products are characterized by enormous scaffold diversity and structural complexity, which contribute to drug discovery. Sulfur-containing natural metabolites are a large class of important functional molecules with potent biological activities and pharmacological properties; some have been developed into essential drugs. This review covers 96 naturally occurring thiophenes reported from 2015 to date. Most of them have one to three thiophene rings, whereas dimeric bithiophenes with four thiophene rings and quinquethiophenes with five thiophene rings are extremely rare. These metabolites have mainly been evaluated for their cytotoxic, antimicrobial, anti-inflammatory, and nematicidal capacities, while reports on their antimalarial, larvicidal, antioxidant, and anti-influenza activities are limited. Some of them showed remarkable cytotoxic effects and could therefore be potential therapeutic agents for treating various cancers. However, in vivo studies are still needed, and their detailed mechanisms of action remain to be determined.
Furthermore, they showed marked antifungal potential against various plant pathogenic fungi and a remarkable nematicidal effect. Hence, they could provide valuable insights for discovering effective, eco-friendly nematicides and fungicides. More studies on formulation development are needed to improve stability and efficacy and reduce costs. Additionally, field assessments and research on the effects of these compounds on non-target organisms are essential. Their antifungal mechanism was proposed to be photodynamic; however, further assessments are needed to confirm their potential application as photosensitizers in photodynamic antimicrobial chemotherapy against fungal infections. Structure–activity relationship studies revealed that an acetylene group in the side chain increased larvicidal and antimalarial potential, whereas attaching the acetylenic side chain to ester or acyl functional groups lowered the activity [19,32,34,41]. Additionally, increasing the number of acetylene groups in the side chain increased anti-inflammatory, nematicidal, and antifungal capacities [19,32,34,41], and attaching a chlorine atom to this chain enhanced the activity [19] (Figure 20). Future research should target the evaluation of other potential bioactivities and the derivatization of these metabolites, as well as mechanistic and in vivo studies. Precursor-based combinatorial biosynthesis (PCB) could be used to enhance the production and structural modification of these metabolites. In this technique, a precursor analog is fed to the producing microorganism, which can yield novel thiophenes of potential pharmaceutical significance in an effective and ecologically friendly way [89].
Based on the in silico studies, including ADMET property prediction, molecular docking of the protein–ligand binding interactions, and molecular dynamics, these metabolites were identified as potential cathepsin D inhibitors, which may serve as leads for treating diseases associated with dysregulation of this enzyme. The results of the studies described in this review are significant because they constitute the first stage in the search for new drug candidates and allow the potency of the tested metabolites to be compared with that of currently used drugs. Furthermore, this review provides an overview of research progress on naturally isolated thiophenes, with special emphasis on their bioactivities, which could attract natural product researchers to further investigate their mechanisms, efficacy, and safety through in vivo studies. The structural diversity of thiophenes could also be of considerable synthetic interest as a source of novel chemical entities for drug discovery. This work may encourage further isolation, characterization, and bioevaluation of thiophenes, which may provide more candidates for the pharmaceutical industry. It also provides a reference that researchers can use to rapidly identify isolated thiophenes by comparing their physical and spectral data. Future approaches such as combinatorial chemistry and drug design are expected to open new avenues for drug discovery.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants11040539/s1, Table S1: Physical and spectral data of newly reported naturally occurring thiophenes from 2015 to 2021.

Conflicts of Interest:
The authors declare no conflict of interest.