Structure-Based Discovery of ABCG2 Inhibitors: A Homology Protein-Based Pharmacophore Modeling and Molecular Docking Approach

ABCG2 is an ABC membrane protein reverse transport pump, which removes toxic substances such as medicines out of cells. As a result, drug bioavailability is an unexpected change and negatively influences the ADMET (absorption, distribution, metabolism, excretion, and toxicity), leading to multi-drug resistance (MDR). Currently, in spite of promising studies, screening for ABCG2 inhibitors showed modest results. The aim of this study was to search for small molecules that could inhibit the ABCG2 pump. We first used the WISS MODEL automatic server to build up ABCG2 homology protein from 655 amino acids. Pharmacophore models, which were con-structed based on strong ABCG2 inhibitors (IC50 < 1 μM), consist of two hydrophobic (Hyd) groups, two hydrogen bonding acceptors (Acc2), and an aromatic or conjugated ring (Aro|PiR). Using molecular docking method, 714 substances from the DrugBank and 837 substances from the TCM with potential to inhibit the ABCG2 were obtained. These chemicals maybe favor synthesized or extracted and bioactivity testing.


Introduction
During the last decades, multi-drug resistance (MDR) is one of the main problems that challenge clinical treatment, especially chemotherapy for various cancers. One of the most crucial agents of drug resistance is related to the ABC membrane protein reverse transport pumps. ABCG2 was described by Dean M. et al. as the second member of the G subfamily, a subfamily in the ABC family of proteins [1]. The protein, also known as BCRP, plays a role as a toxic removal channel, in which stranger substances, including medicines, are metabolized and excreted out of cells. It is presented throughout the body, in different tissues such as intestines, liver, kidneys, and cancerous tissue. Consequently, drug bioavailability is an unexpected change and negatively influences the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of many medications as well as drug-like molecules. In cancer cells, an increase in ABCG2 expression was associated with therapy failure. Therefore, the discovery of ABCG2 inhibitors could improve the bioavailability of the desired drug, minimize multidrug resistance, and result in more effective therapy.
There are many studies regarding the structure and effects of ABCG2 inhibitors. According to some studies, the hydrophobic properties in flavonoids and the derivatives of fumitremorgin C are supposed to be important agents for ABCG2 inhibitory activity [2][3][4]. In addition, a study on tariquidar and propafenon derivatives showed that a flat ring structure enhances ABCG2's inhibitory activity, which can be seen in two strong ABCG2 inhibitors namely purvalanol A and WHI-P180 (3-[(6,7-dimethoxyquinazolin-4-yl) amino] phenol) [5]. Despite of these promising results, screening for ABCG2 inhibitors is still limited because of the low resolution 3D-structure of this protein. Regarding some studies on ABCG2 inhibitors, the potential candidates were Sitravatinib, BMS-599626 and PZ-39 with various mechanism [6][7][8]. While virtual screening from DrugBank databases revealed cisapride and roflumilas could well inhibit BCRP on PLB985 expressed cell [9], SAR and QSAR studies showed dissimilar outcomes [2,10,11].
In the study, in silico models was built for screening ABCG2 pump inhibitors. The pharmacophore model was created based on ligands which are substances with a strong inhibitory activity (IC 50 < 1 µM), while molecular docking model was contributed on ABCG2 homology protein model, which was built up from 655 amino acids using WISS MODEL automatic server. We use two large structure libraries for in silico screening, which are 15,464 substances from the DrugBank database [12] and 57,424 substances from the Traditional Chinese Medicine Collection (TCM) [13]. A pharmacophore was firstly used to find out the lead compounds, and then these ligands were docked in an ABCG2 homology protein model to examine protein-ligand interaction. Substances with good results after screening throughout this process may favor synthesis or extraction and bioactivity testing. Table S1), the study has built two four-point pharmacophore models, P1 and P2, with accuracies of 15/15 and 14/15, respectively, including 2 hydrophobic centers, 1 aromatic ring, and 1 hydrogen bonding acceptor (Table 1). The spatial position and the distance of the models P1 and P2 are described in Figure 1, and these two models were preliminarily estimated with a decoy set ( Table 2).     Table 2. Evaluation results of pharmacophore models P1 and P2 by decoy set. The results showed that both models were not selective even though their sensitivities were over 90 percent, but their low specificity and index E together with GH score <0. 5 indicated that the screening models were not good. This issue was solved by the Pharmacophore Query Editor tool in MOE 2015. 10. Before overcoming this limitation, the importance is seeking the relatively selective query. The activity value of overlap parameter is 0.9 while building automatic pharmacophore, all queries are generated with the accuracy of nearly 10,000. With this accuracy, the queries meet 100% both weak-medium binding set and non-binding set. Implementing to overlap 118 substances with the lowest IC 50 value from the strong inhibitory set. As the number of substances in the training set is large, with the diversity of structures (118 substances), finding a query that satisfies over 90 percent of the constructive set, but does not satisfy the non-bonding set is very complicated. Therefore, building the activity of the overlap parameter decreasing to 0.8 for the purpose of generating the model that was more selective for weak-binding and non-binding set in spite of not achieving the high accuracy. The two obtained models aligning with 118 strong inhibitors ABCG2 were called P3 and P4 (Table 3). The obtained model P3 had the high repeatability, 107 over 118 molecules constituting 90.68%. In addition, after alignment, it remained one more position for adding the fifth pharmacophore factor as described in Figure 2. This model became a five-point pharmacophore model (P5), including 2 hydrophobic groups (Hyd, H), an aromatic ring or π conjugated ring (Aro/PiR), and 2 hydrogen bonding acceptor (Acc2). Thus, the models would comprise of 3 hydro-phobic features, 2H and 1 Aro/PiR. Evaluation results of 5-point pharmacophore model P5 are shown in Table 4. The results show that this model is relatively selective on strong inhibitors.

Model Total Molecules (N) TP/A TN/(N-A) Se (%) Sp (%) Ya (%) Index E GH Score
Conducting a test of pharmacophore model RHHaa_1 on the known biological activity set. The trial set had 375 substances, including 50 inactive substances and 325 active substances. There were 213 over 325 substances satisfying the pharmacophore model (Supplementary Table S2), occupying 65.54%. Distribution of substances depended on biological activity (strong, weak, inactive) in the satisfied or unsatisfied pharmacophore model set ( Figure 3). Substances with strong biological activity (pIC 50 ≥ 6) satisfying the pharmacophore model accounted for 62%, which is almost 3 times higher than in the set of unsatisfied pharmacophore model substances (62/21). Meanwhile, the ratio of substances with biological activity below 6 was 2 times lower (38/79), proving that the pharmacophore model was selective on strong inhibitors of ABCG2. The factors of the pharmacophore model were described in Figure 4. Therefore, the P5 model will be used to screen for strong ABCG2 inhibitors.   of unsatisfied pharmacophore model substances (62/21). Meanwhile, the ratio of substances with biological activity below 6 was 2 times lower (38/79), proving that the pharmacophore model was selective on strong inhibitors of ABCG2. The factors of the pharmacophore model were described in Figure 4. Therefore, the P5 model will be used to screen for strong ABCG2 inhibitors.

Homology Model
Homology model was created from 655 amino acids of ABCG2 in www.uniprot. (accessed on 17 Sep. 2019) by SWISS-MODEL server [14] ( Figure 5). This model was b from the model 6hbu.1.A, which was an electron microscopy model [15]. The model str ture consisted of two protein chains, each of which forms half of the ABCG2 transp pump. Each half-channel included 6 integral membrane fragment. A short fragment ATP-binding was located inside the cell, while the area outside the cell obtained disulf bonds. In general, this structure was similar with others that had been built by elect microscope photography [16]. The model's evaluation is shown in Figure 6. The GM

Homology Model
Homology model was created from 655 amino acids of ABCG2 in www.uniprot.org (accessed on 17 September 2019) by SWISS-MODEL server [14] ( Figure 5). This model was built from the model 6hbu.1.A, which was an electron microscopy model [15]. The model structure consisted of two protein chains, each of which forms half of the ABCG2 transport pump. Each half-channel included 6 integral membrane fragment. A short fragment of ATP-binding was located inside the cell, while the area outside the cell obtained disulfide bonds. In general, this structure was similar with others that had been built by electron microscope photography [16]. The model's evaluation is shown in Figure 6. The GMQE (Global Model Quality Estimation) value of the homology model was 0.85, while the QMEAN (Qualitative Model Energy Analysis) value was −2.25 (above −4). Therefore, this model could be used to construct molecular docking.     In a Ramachandran map, the most favorable region was illustrated in the red area, whilst the less favorable regions were showed in progressively lighter tones ( Figure 7). Among 1104 non-glycine and non-proline residues of the homology protein, 91.1% (1006 In a Ramachandran map, the most favorable region was illustrated in the red area, whilst the less favorable regions were showed in progressively lighter tones ( Figure 7). Among 1104 non-glycine and non-proline residues of the homology protein, 91.1% (1006 residues), 7.3% (81 residues), 1.4% (16 residues), and 0.1% (1 residue) were, respectively, found in the most favored regions [A,B,L]; the additional allowed regions [a, b, l, p]; the generously allowed regions [~a,~b,~l,~p]; and the disallowed regions [XX]. These figures were also similar in the Ramachandran plot of the ATP-binding cassette sub-family G member-2 sample model (6hbu.1.A). Moreover, the statistics indicated that the selected homology model had more than 90% of residues in the core regions [A, B, L], meaning this high quality model was appropriate for further in silico screening research.   The green area represents the hydrophobic zone, the blue area represents the slightly polarized zone, the pink area represents the hydrophilic zone.

Docking Models
The docking scores of 15 strong inhibitors ranged from −5.15 kJ.mol −1 to −22.74 kJ.mol −1 . In the binding pocket, these substances could bind well and interact with important amino acids (Table 5). Besides, other 325 substances, including 155 strong inhibitors and 170 weak ones, were also docked into the binding site. Figure 9 shows that all 325 substances were successfully docked into ABCG2 protein with docking score varied from −4.1 to −28.2 KJ.mol −1 (Supplementary Table S2).  Figure 8. The binding site of the inhibitor on ABCG2. The green area represents the hydrophobic zone, the blue area represents the slightly polarized zone, the pink area represents the hydrophilic zone.

Docking Models
The docking scores of 15 strong inhibitors ranged from −5.15 kJ·mol −1 to −22.74 kJ·mol −1 . In the binding pocket, these substances could bind well and interact with important amino acids (Table 5). Besides, other 325 substances, including 155 strong inhibitors and 170 weak ones, were also docked into the binding site. Figure 9 shows that all 325 substances were successfully docked into ABCG2 protein with docking score varied from −4.1 to −28.2 kJ·mol −1 (Supplementary Table S2).
By using PLIF tool in MOE, the frequency of amino acids interacting with amino acids results could be seen in Table 6 and Figure 10, in which Phe 182, Asn 391, Gln 393, Glu 446, Ser 443, Val 533, Val 534, Val536, and Leu 539 were determined as important amino acids with a high frequency of interaction. As can been seen from docking models, a good bonding ligand requires the following: The hydrogen-bond donors and acceptors that can combine with the polar amino acids (Asn 391, Gln 393, Glu 446); the hydrophobic group that can go deep into the binding site to interact with hydrophobic amino acids (Ser 443, Val 533, Val 534, Val536, Leu 539) and the interaction between aromatic ring and Phe182 in the binding site.  By using PLIF tool in MOE, the frequency of amino acids interacting with amino acids results could be seen in Table 6 and Figure 10, in which Phe 182, Asn 391, Gln 393, Glu 446, Ser 443, Val 533, Val 534, Val536, and Leu 539 were determined as important amino acids with a high frequency of interaction. As can been seen from docking models, a good bonding ligand requires the following: The hydrogen-bond donors and acceptors that can combine with the polar amino acids (Asn 391, Gln 393, Glu 446); the hydrophobic group that can go deep into the binding site to interact with hydrophobic amino acids (Ser 443, Val 533, Val 534, Val536, Leu 539) and the interaction between aromatic ring and Phe182 in the binding site.    The docking result of a derivative of Fumitremorgin C (Ko143) in the binding site shows that this structure does not go deep into the binding site. The tertbutyl group was a donor in a hydrogen bond with Gln 393 and hydrophobic interacted with Ala 397, Val 536, Leu 539 in 2 regions. Isopropyl group that interacted with Phe 182 played an important role in this group of compounds. As the case of tetrahydroxy-β-carbolin derivative, JMC_2016_59_6121_51 (Figure 11), the -NH-group on the carbolin skeleton interacted with Gln 393, and the C=O group interacted with Asn 391, which were two important functional groups in this group of compounds. This result was consistent with the biological activity test of tetrahydro-β-carbolin group when ethylated -NH-group on this molecule, the biological activity decreased significantly as shown in Figure 12. The docking result of a derivative of Fumitremorgin C (Ko143) in the binding site shows that this structure does not go deep into the binding site. The tertbutyl group was a donor in a hydrogen bond with Gln 393 and hydrophobic interacted with Ala 397, Val 536, Leu 539 in 2 regions. Isopropyl group that interacted with Phe 182 played an important role in this group of compounds. As the case of tetrahydroxy-β-carbolin derivative, JMC_2016_59_6121_51 (Figure 11), the -NH-group on the carbolin skeleton interacted with Gln 393, and the C=O group interacted with Asn 391, which were two important functional groups in this group of compounds. This result was consistent with the biological activity test of tetrahydro-βcarbolin group when ethylated -NH-group on this molecule, the biological activity decreased significantly as shown in Figure 12. The docking result of a derivative of Fumitremorgin C (Ko143) in the binding site shows that this structure does not go deep into the binding site. The tertbutyl group was a donor in a hydrogen bond with Gln 393 and hydrophobic interacted with Ala 397, Val 536, Leu 539 in 2 regions. Isopropyl group that interacted with Phe 182 played an important role in this group of compounds. As the case of tetrahydroxy-β-carbolin derivative, JMC_2016_59_6121_51 (Figure 11), the -NH-group on the carbolin skeleton interacted with Gln 393, and the C=O group interacted with Asn 391, which were two important functional groups in this group of compounds. This result was consistent with the biological activity test of tetrahydro-β-carbolin group when ethylated -NH-group on this molecule, the biological activity decreased significantly as shown in Figure 12.  In the Chalcon derivatives, the hydrogen bonds (donor or acceptor) of Asn 391 and Gln 393 with 2 methoxy groups on ring A played a vital role. The hydrophobic groups of ring B interacted with the hydrophobic amino acids (Leu 539, Ala 397, Val 536). This was consistent with the bioactive test. The molecule JMC_2013_67_115_17 belongs to the group of Flavone and benzoflavone derivatives, the oxygen atoms of the C ring acted as a hydrogen bond acceptor with Glu 393. The hydrophobic groups such as prenyl, methoxyl on the A ring of the chromone could have good activity by interacting with the hydrophobic amino acids of the binding site such as Leu539, Ala397, and Val536. In addition, the hydrogen bond acceptor of the ring B (such as the methoxyl group) could be a way to increase the interaction. The docking score of Quinazoline derivatives group was quite low, showing good interaction with the binding site. In the structure BMC_2013_21_7858_20 ( Figure 13), quinazoline had 2 nitrogen atoms as hydrogen bond acceptors with Gln181, Asn391, and Gln 393; the secondary amine of quinazoline with phenyl group may donate hydrogen to have hydrogen bond with Glu 446 and Gln 393. The nitrite group, which was involved in the hydrogen bond acceptor with Glu446, can be replaced by the hydrogen bond donor or acceptor. The phenyl group was required to create hydrophobic interactions with amino acids such as Phe181. The result was better when added to the methoxy group. In the Chalcon derivatives, the hydrogen bonds (donor or acceptor) of Asn 391 and Gln 393 with 2 methoxy groups on ring A played a vital role. The hydrophobic groups of ring B interacted with the hydrophobic amino acids (Leu 539, Ala 397, Val 536). This was consistent with the bioactive test. The molecule JMC_2013_67_115_17 belongs to the group of Flavone and benzoflavone derivatives, the oxygen atoms of the C ring acted as a hydrogen bond acceptor with Glu 393. The hydrophobic groups such as prenyl, methoxyl on the A ring of the chromone could have good activity by interacting with the hydrophobic amino acids of the binding site such as Leu539, Ala397, and Val536. In addition, the hydrogen bond acceptor of the ring B (such as the methoxyl group) could be a way to increase the interaction. The docking score of Quinazoline derivatives group was quite low, showing good interaction with the binding site. In the structure BMC_2013_21_7858_20 (Figure 13), quinazoline had 2 nitrogen atoms as hydrogen bond acceptors with Gln181, Asn391, and Gln 393; the secondary amine of quinazoline with phenyl group may donate hydrogen to have hydrogen bond with Glu 446 and Gln 393. The nitrite group, which was involved in the hydrogen bond acceptor with Glu446, can be replaced by the hydrogen bond donor or acceptor. The phenyl group was required to create hydrophobic interactions with amino acids such as Phe181. The result was better when added to the methoxy group. In the Tariquidar derivatives, the smallest docking score substances in the groups, the inhibitor molecule went deep into the binding site. CMC_2012_7_650_nilotinib had a docking score of −26.408 kJ.mol −1 and hydrogen-acceptor interactions of the amino acids Asn 391 and Gln 393 still played an important role. In the Tariquidar derivatives, the smallest docking score substances in the groups, the inhibitor molecule went deep into the binding site. CMC_2012_7_650_nilotinib had a docking score of −26.408 kJ·mol −1 and hydrogen-acceptor interactions of the amino acids Asn 391 and Gln 393 still played an important role.
There were 927/950 substances of the decoy set that were successfully docked into the protein, with the docking scores ranging from −23.37 kJ·mol −1 to +15.6 kJ·mol −1 . Meanwhile, all 325 active substances were docked to the binding site with the docking scores ranged between −28.34 kJ·mol −1 and −4.12 kJ·mol −1 . As a potential inhibitor should have docking score lower than −20 kJ·mol −1 , the AUC were 0.92 and the EF value were 3.62 (Table 7) (Figure 14).

In Silico Screening
The docking and pharmacophore models were correlated. Hydrogen-bond acceptor factors F2: Acc2 and F3: Acc2 on the pharmacophore model correspond to the amino acid regions Asn 391, Gln 393, and Glu 446, which were polar amino acids with the ability to

In Silico Screening
The docking and pharmacophore models were correlated. Hydrogen-bond acceptor factors F2: Acc2 and F3: Acc2 on the pharmacophore model correspond to the amino acid regions Asn 391, Gln 393, and Glu 446, which were polar amino acids with the ability to form hydrogen bonds. The 2 hydrophobic elements F5: Hyd and F1: Aro|PiR correspond to the hydrophobic region created by 3 amino acids: Ala 397, Val 536, and Leu 539. The hydrophobic region F2: Hyd corresponds to the interaction area of the Phe 181.
The databases used for screening included 57,724 compounds from the TCM (Traditional Chinese Medicine) [13] and 15,464 compounds from the DrugBank database [12]. These compounds were formulated by MOE 2015.10 software, then screened with pharmacophore model RHHaa_1 with a "druglike" filter to eliminate substances not meeting the Lipinski's "Rule of five". The results obtained 742 compounds belonging to the DrugBank database and 2743 com-pounds belonging to the TCM database. As the number of the TCM database's compounds was very large, the filtration of compounds with an 0.7 of RMSD compared to the pharmacophore model was conducted to yield 950 compounds. These compounds were docked into the binding site as described in the research method to evaluate the results. The number of compounds from the DrugBank and TCM database successfully docked into the binding pocket were 714 and 837, respectively. The screening steps are described in Figure 15. Based on docking score values, selected compounds with inhibitory potential for ABCG2 are listed in Table 8. Among the compounds selected from DrugBank, there were chemicals containing pyrimidine derivative such as 1, 3, 5, tetrahydro-β-carbolin frame derivative as number 10 or bisphenyl frame as substance 9. Among the compounds screened from the TCM set, most of the chemicals belonged to the flavonoid structure or the purine backbones. These compounds were presented in many plants such as: compound 11 is Rosmarinic acid, which is abundant in the Lamiaceae family, while compound 19 is 6-Prenyleriodictyol, which is abundant in the Glycyrrhiza uralensis.  Based on docking score values, selected compounds with inhibitory potential for ABCG2 are listed in Table 8. Among the compounds selected from DrugBank, there were chemicals containing pyrimidine derivative such as 1, 3, 5, tetrahydro-β-carbolin frame derivative as number 10 or bisphenyl frame as substance 9. Among the compounds screened from the TCM set, most of the chemicals belonged to the flavonoid structure or the purine backbones. These compounds were presented in many plants such as: compound 11 is Rosmarinic acid, which is abundant in the Lamiaceae family, while compound 19 is 6-Prenyleriodictyol, which is abundant in the Glycyrrhiza uralensis. frame derivative as number 10 or bisphenyl frame as substance 9. Among th compounds screened from the TCM set, most of the chemicals belonged to the flavono structure or the purine backbones. These compounds were presented in many plan such as: compound 11 is Rosmarinic acid, which is abundant in the Lamiaceae famil while compound 19 is 6-Prenyleriodictyol, which is abundant in the Glycyrrhiza uralensi frame derivative as number 10 or bisphenyl frame as substance 9. Among th compounds screened from the TCM set, most of the chemicals belonged to the flavono structure or the purine backbones. These compounds were presented in many plan such as: compound 11 is Rosmarinic acid, which is abundant in the Lamiaceae famil while compound 19 is 6-Prenyleriodictyol, which is abundant in the Glycyrrhiza uralensi

Discussion
In this research, pharmacophore models were constructed based on substances wit strong ABCG2 inhibitory activity (IC50 ≤ 1 μM) with a sensitivity of 73.33% and a specifi ity of 92.95%. From these models, it is possible to indirectly infer molecular properties o ABCG2 inhibitors and the interaction characteristics of compounds with the target.

Discussion
In this research, pharmacophore models were constructed based on substances wit strong ABCG2 inhibitory activity (IC50 ≤ 1 μM) with a sensitivity of 73.33% and a specifi ity of 92.95%. From these models, it is possible to indirectly infer molecular properties ABCG2 inhibitors and the interaction characteristics of compounds with the target.

Discussion
In this research, pharmacophore models were constructed based on substances wit strong ABCG2 inhibitory activity (IC50 ≤ 1 μM)

Discussion
In this research, pharmacophore models were constructed based on substances wit strong ABCG2 inhibitory activity (IC50 ≤ 1 μM) with a sensitivity of 73.33% and a specific ity of 92.95%. From these models, it is possible to indirectly infer molecular properties o

Discussion
In this research, pharmacophore models were constructed based on substances with strong ABCG2 inhibitory activity (IC 50 ≤ 1 µM) with a sensitivity of 73.33% and a specificity of 92.95%. From these models, it is possible to indirectly infer molecular properties of ABCG2 inhibitors and the interaction characteristics of compounds with the target.
The potent pharmacophore models consist of 5 features including 2 hydrophobic groups (Hyd), 1 aromatic ring (Aro|PiR) and 2 hydrogen bonding acceptors (Acc2). The proportion of hydrophobic groups is 3/5, consistent with studies on hydrophobicity of the binding site. The hydrophobic feature (including aromatic ring and hydrophobic groups), the hydrogen bonding acceptors and the distances between features have similarities with many previous studies.
The protein homology model of ABCG2 was built from 655 amino acids by server SWISS MODEL with GMQE value = 0.85 (the closer to 1, the more similar to the original model), the QMEAN value is −2. The molecular docking model of ABCG2 can be used to analyze the interaction between inhibitors and target. However, the model has some limitations when applied to screening. The model was applied to screen 57,724 substances from the TCM database and 15,464 substances from the DrugBank database. As a result, 714 substances from the DrugBank and 837 substances from the TCM with potential to inhibit the ABCG2 were obtained.

Databases for Pharmacophore Modelling of ABCG2 Inhibitors
In the study, databases from different studies were collected in order to build ABCG2 inhibitors pharmacophore models (Table 9). After eliminating duplication, there were 375 obtained substances, which were then divided into three groups based on IC 50 value (pIC 50 ) ( Table 10). The training set including 15 substances was selected from substances with strong biological activity (IC 50 value ≤ 1 µM or pIC 50 ≥ 6), belonging to different skeletal structures and could interact with substrate by a concurrency mechanism (Table S1). The decoy set with 950 substances was generated from 15 strong inhibitors, using the DUDE database (http://dude.docking.org/ accessed date: 17 September 2019). When testing, the mean value would be used in case the biological activity of a substance had many values with the same method. The data from Hoechst 33,342 quantitative method on MDCK II cell line increased ABCG2 expression was chosen because various bioactivity testing on this model had shown an increase in the ABCG2 expression.  In this work, we utilized 15,464 substances from the DrugBank [12] for in silico screening to seek novel substances might help inhibit ABCG2 pump. Furthermore, 57,424 compounds were also downloaded from Traditional Chinese Medicine (TCM) [13] for the same purpose.

Building Pharmacophore Models
In this study, the conformation of substances were constructed by using Conformation Import tool in MOE 2015. 10. The program used algorithms to find the low-energy conformation of substances and save them in a data set. Substances were washed and removed (salts and solvents) by using Wash tool in MOE 2015.10. Input molecules were classified according to the capability to overlap of the fragments. The conformation of fragments was searched randomly and closely. The search results would be saved in a fragmented database to be used for the next time. Conformation of complete molecules were formed by grouping conformations of fragments. The conformation would be removed if the van der Waals interaction is not good or the configuration groups are not suitable. The energy of the complete conformation is the sum of the energy of the fragments created by superposition of 3 atoms or more. Finally, conformation and energy are recorded in the output data. Pharmacophore model was constructed by Pharmacophore Elucidation in MOE 2015.10, which generate pharmacophore queries overlapping well with most of the molecules in the construction set.

Model Evaluation
The Pharmacophore Search tool was used to re-evaluate the pharmacophore model constructed or to apply pharmacophore for searching for active substances in the application set. Figure 16 describes the parameters used for evaluating the pharmacophore model.

Model Evaluation
The Pharmacophore Search tool was used to re-evaluate the pharmacophore model constructed or to apply pharmacophore for searching for active substances in the application set. Figure 16 describes the parameters used for evaluating the pharmacophore model. Enrichment factor: This index evaluates the ability of the model to give a correct prediction compared to random selection. For example, if a data set has 30% active substances, the model predicts that 30% of hits are active so this is just a random prediction.
= / / The Gunner-Henry "Goodness of hit list" score: Gunner-Henry score is used for evaluating the model. The better the model is, the higher the Gunner-Henry score is.
The most selective pharmacophore model was conducted a test on the known biological activity set with 375 substances, including 50 inactive and 325 active substances.

Building Homology Models
To construct a homology model of ABCC2, it is necessary to find the amino acid sequence of this protein. The amino acid sequence of the ABCG2 protein was obtained from Y a = TP n = TP TP + FP Enrichment factor: This index evaluates the ability of the model to give a correct prediction compared to random selection. For example, if a data set has 30% active substances, the model predicts that 30% of hits are active so this is just a random prediction.
The Gunner-Henry "Goodness of hit list" score: Gunner-Henry score is used for evaluating the model. The better the model is, the higher the Gunner-Henry score is.
The most selective pharmacophore model was conducted a test on the known biological activity set with 375 substances, including 50 inactive and 325 active substances.

Building Homology Models
To construct a homology model of ABCC2, it is necessary to find the amino acid sequence of this protein. The amino acid sequence of the ABCG2 protein was obtained from the website www.uniprot.org (accessed on 17 September 2019) (ID: Q9UNQ0), consisting of 655 amino acids (Supplementary Figure S1). It was then uploaded to the SWISS MODEL server at website https://swissmodel.expasy.org (accessed date: 17 September 2019) [14] to form a homology model [33]. After uploading amino acid sequence to the server, it would automatically find protein samples close to the amino acid sequence of ABCG2 and generate the homology model.

Model Evaluation
The homology model generated by SWISS MODEL server was evaluated through 2 parameters GMQE and QMEAN. QMQE (Global Model Quality Estimation) reflects the quality of the combination of the sample and the generated model, the sample search method. The value of QMQE is from 0 to 1. The higher the QMQE value, the higher the reliability of the model. QMEAN parameter estimates based on different geometries for the entire structure or a particular position of the model and pattern. The closer the QMEAN value to 0, the more reliable the model. The failed model is those with a QMEAN value below −4.
The selected homology model was also stereo-chemical quality tested using PROCHECK Tool [34], in which the structure in PDB format was uploaded into PDBsum to generate its Ramachandran plot. The phi-psi torsion angles of all protein residues were illustrated in the Ramachandran map, except for the chain termini position, where glycine residues were not restricted to any particular region of the plot and were separately identified by triangles. A high quality model would be expected to have over 90% in the core regions, also called the most favored regions [A, B, L].

Docking
Prepared proteins were upload to Lead IT 2.3.2 software [36]. Binding pocket were selected with radius of 20 Å for fully overlapping all protein surface. Docking process was then per-formed with followed parameters: Number of poses retained = 10, the maximum number of repetitions = 1000, the number of defragments = 200. The docking results were recorded as *.sdf file and read by MOE 2015. 10.
Docking scores (KJ.mol −1 ) were evaluated based on interaction between ligands and protein including ionic bonds, hydrogen bonds, van der Waals, π-π, . . . The docking results showed the binding affinity of ligands to protein and interaction of ligands with surrounding amino acids, which illustrated inhibitory potential of these substances.

Receiver Operating Characteristic (ROC) Analysis
To validate the docking models, receiver operating curve (ROC) value was analyzed with active and inactive compounds [39]. The decoy set of 950 structures and the active set of 325 structures were docked to ABCG2 protein. As a result, Se and Sp values were calculated based on docking scores, which were then used for plotting ROC curve (Equation (1)). The area under ROC curve (AUC) value was crucial for evaluating the docking models, in which a good model would show ROC nearly to 1. (1)

Virtual Screening
The docking model and pharmacophore model built in this research were used for screening ABCG2 inhibitors from the DrugBank database with 15,464 substances and the Traditional Chinese Medicine (TCM) with 57,424 substances. The query matching compounds was screened by pharmacophore model, while docking process investigated the interactions of these compounds with the target. Potential substances would be priorities to synthesize and test in vitro. The whole virtual screening process for ABCG2 inhibitors was described in Figure 18.

Conclusions
This research developed two screening models for ABCG2 potent inhibitors, namely pharmacophore models (on ABCG2 potent inhibitors) and molecular docking models (based on the homology model of ABCG2). Pharmacophore models were constructed based on substances with strong ABCG2 inhibitory activity (IC50 < 1 μM) with a sensitivity of 73.33% and a specificity of 92.95%. The potent inhibitor models consist of two hydrophobic (Hyd) groups, two hydrogen bonding acceptors (Acc2), and an aromatic or conjugated ring (Aro|PiR). In the study, a homology model of ABCG2 was built from 655 amino acids by server SWISS MODEL with GMQE value = 0.85, the QMEAN value is −2.25 > −4. The model was eligible to conduct molecular docking model. The research also identified the binding regions of ABCG2 inhibitors with important amino acids namely Gln 181, Phe 182, Asn 391, Gln 393, Glu 446, Ser 443, Val 533, Val 534, Val 536, and Leu 539. Three amino acids Asn 391, Gln 393, and Glu 446 formed hydrogen bonds with inhibitors, which correspond to hydrogen bonding acceptors in the pharmacophore model. The remaining amino acids form a binding site with hydrophobic regions. The newly constructed model was applied to dock substances that could inhibit ABCG2 pump. The docking results showed that these substances have a high binding affinity to the binding site (93.54% of the substances have docking score ≤−10 KJ.mol −1 ). Before a better resolution structure of ABCG2 presents, the docking model of this homology protein can be used to predict potential ABCG2 inhibitors.
Regarding in silico testing, the study also used these models for screening ABCG2 inhibitors from DrugBank and TCM databases. The abovementioned models was applied to screen 57,724 substances from the TCM database and 15,464 substances from the Drug-Bank database. As a result, 714 substances from the DrugBank and 837 substances from the TCM with potential to inhibit the ABCG2 were obtained. Substances with promised results in the study can be priority synthesized or extracted and conducted further tested for bioactivity.

Conclusions
This research developed two screening models for ABCG2 potent inhibitors, namely pharmacophore models (on ABCG2 potent inhibitors) and molecular docking models (based on the homology model of ABCG2). Pharmacophore models were constructed based on substances with strong ABCG2 inhibitory activity (IC 50 < 1 µM) with a sensitivity of 73.33% and a specificity of 92.95%. The potent inhibitor models consist of two hydrophobic (Hyd) groups, two hydrogen bonding acceptors (Acc2), and an aromatic or conjugated ring (Aro|PiR). In the study, a homology model of ABCG2 was built from 655 amino acids by server SWISS MODEL with GMQE value = 0.85, the QMEAN value is −2.25 > −4. The model was eligible to conduct molecular docking model. The research also identified the binding regions of ABCG2 inhibitors with important amino acids namely Gln 181, Phe 182, Asn 391, Gln 393, Glu 446, Ser 443, Val 533, Val 534, Val 536, and Leu 539. Three amino acids Asn 391, Gln 393, and Glu 446 formed hydrogen bonds with inhibitors, which correspond to hydrogen bonding acceptors in the pharmacophore model. The remaining amino acids form a binding site with hydrophobic regions. The newly constructed model was applied to dock substances that could inhibit ABCG2 pump. The docking results showed that these substances have a high binding affinity to the binding site (93.54% of the substances have docking score ≤−10 KJ.mol −1 ). Before a better resolution structure of ABCG2 presents, the docking model of this homology protein can be used to predict potential ABCG2 inhibitors.
Regarding in silico testing, the study also used these models for screening ABCG2 inhibitors from DrugBank and TCM databases. The abovementioned models was applied to screen 57,724 substances from the TCM database and 15,464 substances from the DrugBank database. As a result, 714 substances from the DrugBank and 837 substances from the TCM with potential to inhibit the ABCG2 were obtained. Substances with promised results in the study can be priority synthesized or extracted and conducted further tested for bioactivity. The following are available online, Table S1. ABCG2 strong inhibitors in the training set, Table S2. Docking scores of ABCG2 inhibitors in test set of the pharmacophore model, Figure S1. The amino acid sequence of the ABCG2 protein.