Identification of Influenza PAN Endonuclease Inhibitors via 3D-QSAR Modeling and Docking-Based Virtual Screening

Structural and biochemical studies elucidate that PAN may contribute to the host protein shutdown observed during influenza A infection. Thus, inhibition of the endonuclease activity of viral RdRP is an attractive approach for novel antiviral therapy. In order to envisage structurally diverse novel compounds with better efficacy as PAN endonuclease inhibitors, a ligand-based-pharmacophore model was developed using 3D-QSAR pharmacophore generation (HypoGen algorithm) methodology in Discovery Studio. As the training set, 25 compounds were taken to generate a significant pharmacophore model. The selected pharmacophore Hypo1 was further validated by 12 compounds in the test set and was used as a query model for further screening of 1916 compounds containing 71 HIV-1 integrase inhibitors, 37 antibacterial inhibitors, 131 antiviral inhibitors and other 1677 approved drugs by the FDA. Then, six compounds (Hit01–Hit06) with estimated activity values less than 10 μM were subjected to ADMET study and toxicity assessment. Only one potential inhibitory ‘hit’ molecule (Hit01, raltegravir’s derivative) was further scrutinized by molecular docking analysis on the active site of PAN endonuclease (PDB ID: 6E6W). Hit01 was utilized for designing novel potential PAN endonuclease inhibitors through lead optimization, and then compounds were screened by pharmacophore Hypo1 and docking studies. Six raltegravir’s derivatives with significant estimated activity values and docking scores were obtained. Further, these results certainly do not confirm or indicate the seven compounds (Hit01, Hit07, Hit08, Hit09, Hit10, Hit11 and Hit12) have antiviral activity, and extensive wet-laboratory experimentation is needed to transmute these compounds into clinical drugs.


Introduction
Influenza is an infectious disease caused by the influenza virus. It usually affects the upper respiratory tract and lungs. Seasonal and pandemic influenza viruses can be transmitted from animals to humans, which makes them particularly dangerous and leads to global health problems [1][2][3]. Though vaccines as a prophylactic have been recommended to reduce infection rate and epidemic possibility, it is not effective for everyone [4]. This may be due to the designing approaches of vaccines (such as liveattenuated or inactivated influenza vaccines) or the continual antigenic drift variation of influenza viruses. Moreover, it is not possible to generate prophylactic options against potential pandemic virus strains [5]. Three classes of clinical agents approved by the U.S. Food and Drug Administration (FDA), including agents that targeted the matrix 2 (M2) ion-channel, neuraminidase (NA), and the cap-snatching endonuclease activity of the polymerase acidic protein (PA), have been developed for the prophylaxis and treatment of influenza infection. However, the resistance to the M2 ion-channel inhibiting drugs makes amantadine and rimantadine invalidated their clinical utility [6,7]. It was also observed resistance to NA inhibitors (oseltamivir) in some seasonal influenza A strains [8,9]. Inhibitors of the cap-snatching endonuclease activity of PA have been made intensive research as a new series of anti-influenza inhibitors. Baloxavir marboxil is the first drug approved to target the cap-snatching endonuclease with a significant decrease in viral fitness [10,11]. Therefore, new drugs are essential for the treatment of drug-resistant and future pandemic flu strains.
Influenza A consisted of eight negative-stranded RNA genomic segments. The viral RNA-dependent RNA polymerase (RdRP) proteins were encoded by the three largest genomic RNA segments, the polymerase acidic protein (PA), polymerase basic protein 1 (PB1) and polymerase basic protein 2 (PB2) subunits. The influenza RdRP is essential for viral transcription and replication. It is also highly conserved among all influenza strains and subtypes. For the PA subunit, it has three main functions, which include endonuclease activity, involving in viral RNA (vRNA)/complementary RNA (cRNA) promoter binding and interacting with the PB1 subunit [12]. N-terminal fragments of PA (PA N , ≈25 kDa N-terminal domain; residues 1-197) and C-terminal fragments of PA (PA C , ≈55 kDa Cterminal domain; residues 239-716) are two domains of PA. The crystal structure of PA C is solved in complexes with N-terminal fragments of PB1 (PB1 N ) [13]. The crystal structures of PA N with both various ligands and unliganded have also been elucidated [14][15][16][17]. Biochemical and structural studies elucidated that PA N might contribute to the host protein shutdown observed during influenza A infection [14,18,19]. Thus, inhibition of the endonuclease activity of influenza RdRP is an attractive target for novel antiviral therapy.
Computer aided drug design (CADD) is an important method of drug discovery, including virtual screening and pharmacophore design [20]. CADD can overcome certain difficulties from experiments in the laboratory via a virtual approach with a relatively low cost [21]. For example, Sourav et al. had built up a typical pharmacophore in the discovery of potential topoisomerase I inhibitors by 'Common Features Pharmacophore' techniques where the common features were only presented in the active compounds [22]. To generate the available pharmacophore, mostly active and moderately active compounds are considered as the training set molecules. In this manuscript, we collected 37 known PA N endonuclease inhibitors [23][24][25][26] with diverse molecular structural patterns and created the training set and test set. We also constructed 10 pharmacophore models with activity prediction ability by utilizing the three-dimensional Quantitative Structure-Activity Relationship (3D-QSAR) Pharmacophore (HypoGen algorithm) technique. Then we screened a chemical database containing 1916 compounds (71 human immunodeficiency virus type 1 (HIV-1) integrase inhibitors [27], 37 antibacterial inhibitor, 131 antiviral inhibitors and 1677 approved drugs by FDA) based on the best pharmacophore Hypo1. Among the 1916 small-molecule inhibitors, part of them could form a chelate with two divalent metal ions, such as the HIV-1 integrase inhibitors. We assumed that they could also interact with the metal-chelating active site of influenza PA N endonuclease well. Subsequently, we docked 6 selected compounds (Hit01-Hit06) with RNA endonuclease protein. At last, Hit01 (raltegravir's derivative) with significant screened results was selected as the 'hit' compound to target PA N endonuclease for molecular optimization and transformation to obtain 197 novel pyrimidinone candidate molecules. According to the screening results based on the pharmacophore model Hypo1, six compounds with better estimated activity values than Hit01 were selected. Our study may provide profound theoretical guidance and practical significance for the design and experimental synthesis of influenza virus inhibitors in the near future.

Pharmacophore Model Generation
Chemically diverse 25 training set compounds containing active and moderately active compounds with corresponding IC 50 values ranging from 0.011 µM to 37 µM, were selected to generate a pharmacophore model (Table 1) by Hypogen algorithm 3D-QSAR Pharmacophore protocol. The features of training set compounds, such as HBA, HBD, RA, HYD, PI and NI, were well distributed on the selected molecules ( Figure 1). The significant statistical parameters such as cost, correlation coefficient, and RMSD of generated pharmacophore have been enlisted in Table 1. 10 pharmacophore models containing HBA, HBD, HYD and RA features were generated. The total cost of the generated pharmacophore models ranged from 130.623 to 172.486, with a null cost of 403.577 and a fixed cost of 71.464 bits. The difference between the null cost and total cost was used to describe the cost difference. The best hypothesis usually had the highest cost difference, a good correlation coefficient, the least RMSD, and a significant total cost value. Thus, the best pharmacophore Hypo1 enlisted in Table 1 was characterized by the lowest total cost value (130.623), the highest cost difference (272.95), the lowest RMSD (2.16926), and the best correlation coefficient (0.910618). The low RMSD and large correlation coefficient signified that Hypo1 had a better ability to predict the experimental activity of training set compounds.

Pharmacophore Model Generation
Chemically diverse 25 training set compounds containing active and moderately active compounds with corresponding IC50 values ranging from 0.011 μM to 37 μM, were selected to generate a pharmacophore model (Table 1) by Hypogen algorithm 3D-QSAR Pharmacophore protocol. The features of training set compounds, such as HBA, HBD, RA, HYD, PI and NI, were well distributed on the selected molecules ( Figure 1). The significant statistical parameters such as cost, correlation coefficient, and RMSD of generated pharmacophore have been enlisted in Table 1. 10 pharmacophore models containing HBA, HBD, HYD and RA features were generated. The total cost of the generated pharmacophore models ranged from 130.623 to 172.486, with a null cost of 403.577 and a fixed cost of 71.464 bits. The difference between the null cost and total cost was used to describe the cost difference. The best hypothesis usually had the highest cost difference, a good correlation coefficient, the least RMSD, and a significant total cost value. Thus, the best pharmacophore Hypo1 enlisted in Table 1 was characterized by the lowest total cost value (130.623), the highest cost difference (272.95), the lowest RMSD (2.16926), and the best correlation coefficient (0.910618). The low RMSD and large correlation coefficient signified that Hypo1 had a better ability to predict the experimental activity of training set compounds.  Furthermore, all the molecules selected from previously published research articles were divided into four groups of magnitude according to their experimental activity value (IC50) which were categorized in most active (≤0.1 μM, ++++), active (0.1 to 1.0 μM, +++),  Table 2. In the training set, compound T25 (Figure 2A) was estimated as the most active molecules as they were nicely mapped with all the essential features of the pharmacophore, whereas compound T05 ( Figure 2B) was considered as least active molecules due to the essential features were not mapped with Hypo1.

Validation of the Pharmacophore Models
The best pharmacophore model (Hypo1) was then validated by three distinct methods: (a) cost analysis, (b) Fischer's randomization test and (c) test set analysis.

Validation of the Pharmacophore Models
The best pharmacophore model (Hypo1) was then validated by three distinct methods: (a) cost analysis, (b) Fischer's randomization test and (c) test set analysis.

Cost Analysis
The cost values of the total cost, fixed cost, and null cost were produced by the HypoGen algorithm in Discovery Studio and were utilized to analyze the ability to predict the experimental activity of training set compounds. If the cost difference was between 40 and 60 bits, the predicted correlation probability was assumed to be 75-90%. If the difference was greater than 60 bits, the predicted correlation probability was assumed to be more than 90%. The highest cost difference value of Hypo1 suggested that it could predict the experimental IC 50 values of training set compounds with >90% statistical significance. The fixed cost displayed a model that fit all data perfectly. However, the null cost presumed that there was no relationship between the data and that the experimental activities were normally distributed around their average value. Thus, the significance of Hypo1 also depended on the total cost, fixed cost and null cost. In this study, the best hypothesis Hypo1 showed the fixed, total and null cost scores to be 71.464, 130.623 and 403.577, respectively.

Fischer's Randomization Test
The experimental activity values of training set compounds were scrambled randomly. With a 95% confidence level, these values were used in pharmacophore generation and put forth 19 random spreadsheets. Then we compared these results with the originally generated pharmacophore (Hypo1). Figure 3 revealed the differences of correlations ( Figure 3A) and costs values ( Figure 3B) between the HypoGen and Fischer's randomizations. None of the randomly generated pharmacophores obtained a better statistical value than Hypo1.

Test Set Analysis
The reliability of the selected pharmacophore model was determined by its ability to predict the biological activities of test set compounds. 12 chemically diverse compounds were selected as test set, and their corresponding IC 50 values ranged from 0.047 µM to 22 µM. To test the predictability ability of the pharmacophore model, we used the protocol in Discovery Studio to map the test set molecules. The estimated activity values were calculated for individual test set compounds and enlisted in Table 3. The correlation between experimental and estimated activity values was analyzed by simple regression. The strongest correlation coefficient between experimental and estimated PA N endonuclease inhibitory activity values for the training set (r 2 = 0.910618) and the test set (r 2 = 0.844000) were shown in Figure 4. It was noticeable that the test set of 12 compounds was mapped properly with the generated pharmacophore. Among the 12 compounds, the best active compound T32 was greatly mapped on the four essential features ( Figure 5). The least active molecule T29 did not map with the essential HBD feature, which signified the robustness of the pharmacophore model ( Figure 5).   The employed three validation strategies exhibited that the Hypo1 model could be chosen as a significant pharmacophore model for further screening chemical databases with diverse structural entities. The employed three validation strategies exhibited that the Hypo1 model could be chosen as a significant pharmacophore model for further screening chemical databases with diverse structural entities. The employed three validation strategies exhibited that the Hypo1 model could be chosen as a significant pharmacophore model for further screening chemical databases with diverse structural entities.

Database Screening
Chemically novel and potential lead compounds could be possibly identified from the database containing 1916 compounds (71 HIV-1 integrase inhibitors [27], 37 antibacterial inhibitors, 131 antiviral inhibitors and 1677 other approved drugs by FDA) by virtual screening based on the generated pharmacophore [10]. Pharmacophore-based virtual screening was used to initiate the identification of novel scaffolds as PA N endonuclease inhibitors and the validated pharmacophore Hypo1 was utilized as a 3D query for screening the database [18]. The diverse chemical databases were selected with the BEST search option to identify the prospective lead molecules. The potential lead compounds should fit with all the possible features of the validated pharmacophore Hypo1. Therefore, we obtained 16 molecules mapped on all features present in the Hypo1 model. We assumed that the most active compounds were estimated by activity value less than 10 µM. Subsequently, six compounds (Hit01-Hit06) were obtained with estimated activity of less than 10 µM (Supplementary Materials Table S1). Then we further evaluated them through ADMET and toxicity prediction.

ADMET and Toxicity Prediction
It would be beneficial to solve the problem which could cause a lead compound loss in preclinical and clinical trials, such as poor pharmacokinetic profile and toxic complications. The application of in-silico methodology for the prediction of the possible pharmacokinetic parameters and toxicity of the hit compounds would be advisable from an economic point of view. Then, the six compounds obtained after virtual screening were subjected to ADMET and various toxicity modules.
The toxicity results of compounds (Hit01-Hit06) were enlisted in Table 5. NTP carcinogenicity prediction had been carried out on both female and male rats, and no compounds were found to be carcinogenic in nature on both male and female rats. Toxicity risk assessment results showed that compounds Hit02, Hit03, Hit04 and Hit05 might have car-cinogenic properties against the male mouse while Hit01 and Hit06 are non-carcinogenic in nature. Non-carcinogen was found on the female mouse. Furthermore, the Ames mutagenicity and skin irritation tests had been performed against all the six potential hits and none of these potential hits showed any mutagenicity or skin irritation. Except for Hit02 and Hit06, all the hit compounds did not exhibit developmental reproductive toxicity. Rat oral maximum lethal dose was also calculated for individual hit compounds and was enlisted in Table 5. Based on the ADME and toxicity profiling, compound Hit01 was regarded as the potential hit compound and was retained for docking study.

Molecular Docking Study
The molecular docking calculations were performed using Glide program in 2015 Schrödinger software package. The influenza A virus RNA polymerase complex (PDB ID: 6E6W) with 3-hydroxy-6-(2-methyl-4-(1H-tetrazol-5-yl)phenyl)-4-oxo-1,4-dihydropyridine-2-carboxylic acid (T38) [28] was chosen as the target protein for molecular docking. It was obtained from RCSB Protein Date Bank (RCSB PDB, http://www.rcsb.org (accessed on 7 November 2021).) and was prepared by Protein preparation wizard. The default values of the docking parameters were accepted. The active site was defined based on the co-crystal structure of T38 bound to PA N endonuclease complex. The docking results in our study were visualized using the PyMOL Molecular Graphics System (version 1.3).
Some important components, such as the binding modes, molecular interactions with the active site, binding energy and docking scores, were considered as the criterion in selecting the best poses of the docked compounds. In this study, docking compounds were ranked according to their docking scores, hydrogen bond interactions, and estimated activity. T38 was chosen as the control compound to analyze the docking results and docked on the same active site of the PA N endonuclease protein (PDB ID: 6E6W).
The specific docking interactions of the two compounds (T38 and Hit01) with 6E6W were enlisted in Table 6 and Figure 6. The filtered molecule bonded in the binding sites of PA N endonuclease protein (PDB ID 6E6W) with docking scores of −6.622 kcal/mol. However, the docking score of T38 with 6E6W was −8.227 kcal/mol. Previous studies demonstrated that the metal-chelating active site of PA N was a negatively charged pocket, which consisted of a histidine (His41), a conserved lysine (Lys134), and a cluster of three acidic residues (Glu80, Asp108, and Glu119), and it could bind two divalent metal ions (Mn 2+ ) [14,15,[29][30][31][32][33][34]. It was noticeable that compounds T38 and Hit01 were located in the same pocket and generated hydrogen bonding with the same key amino acids Lys134 (1.9 Å and 2.8 Å). The N atom on tetrazole and oxygen atom on hydroxyl of carboxylic acid generated salt bridge with amino acids Arg124 and Mn 2+ , respectively. In addition, tetrazole ring generated Pi-cation interaction with Lys34. The analysis of the docking results and binding pattern implied that compound Hit01 formed the same H-bond interaction with 6E6W compared with T38. The divalent metal ions (Mn 2+ cations) present at the catalytic site were critical for the endonuclease activity. Interestingly, the interactions between divalent metal ions and two compounds (T38 and Hit01) were similar to the interaction between diketo acid derivatives and divalent metal ions. The carboxylate of diketo acid (DKA) moiety chelates the first Mn 2+ ion, while the second Mn 2+ ion of PA N interacts with the oxygen atom of its carboxylate moiety and the α-hydroxyl group, or the two Mn 2+ ions simply form a chelate with the β-diketone group. Functional groups that chelate the Mn 2+ ion of PA N were crucial for inhibitors targeting the metal-chelating active site of PA N . This implied that pyrimidinone was an important moiety for interaction with divalent metal ions.    As the pyrimidinone derivative, raltegravir was initially approved by FDA in late 2007 as the first available agent targeted HIV integrase [35]. We considered Hit01 (raltegravir's derivative) as the hit compound and structured a compound database to detect more effective PA N endonuclease inhibitors.

New Designed Compounds
We selected Hit01 for Lead optimization, and 197 molecules were produced as derivatives. Subsequently, these 197 molecules were docked to the influenza PA N endonuclease protein again to calculate the docking scores, estimate activity values, and fit values. The pathway to elicit compounds (Hit07-Hit12) from template molecule (Hit01) using sidechain hopping were shown in Figure 7. The top six PA N inhibitors (Hit07-Hit09 and Hit10-Hit12) were presented in Table 7 based on the molecular docking results and the predictable activity results.  The docking scores of selected candidate compounds (Hit07-Hit09 and Hit10-Hit12) were lower than Hit01, except Hit07 and Hit11. Furthermore, the estimated activity values of all compounds were more declined than Hit01 (estimated activity value was 1.16424 Figure 7. The pathway to elicit compounds (Hit07-Hit12) from template molecule (Hit01) using side-chain hopping. As the pyrimidinone derivative, raltegravir was initially approved by FDA in late 2007 as the first available agent targeted HIV integrase [35]. We considered Hit01 (raltegravir's derivative) as the hit compound and structured a compound database to detect more effective PAN endonuclease inhibitors.

New Designed Compounds
We selected Hit01 for Lead optimization, and 197 molecules were produced as derivatives. Subsequently, these 197 molecules were docked to the influenza PAN endonuclease protein again to calculate the docking scores, estimate activity values, and fit values. The pathway to elicit compounds (Hit07-Hit12) from template molecule (Hit01) using side-chain hopping were shown in Figure 7. The top six PAN inhibitors (Hit07-Hit09 and Hit10-Hit12) were presented in Table 7 based on the molecular docking results and the predictable activity results. The docking scores of selected candidate compounds (Hit07-Hit09 and Hit10-Hit12) were lower than Hit01, except Hit07 and Hit11. Furthermore, the estimated activity values of all compounds were more declined than Hit01 (estimated activity value was 1.16424 µM) based on the prediction results. Table 7 enlisted the structures of the six raltegravir's derivatives selected after Lead optimization and molecular docking. The functional groups were circled in blue and pink after Lead optimization and the control compound remained unchanged. It was noticeable that the six potent compounds which had good results in estimated activity could fit the values and docking scores after Lead optimization. The specific pharmacophore mapping and docking poses of the six compounds were shown in Supplementary Materials Figures S1 and S2.
Hence, Hit01 and the novel six compounds (Hit07-Hit09 and Hit10-Hit12) could be further developed and forwarded to be effective drugs for the treatment of influenza (Figure 7).

Compound Preparations
The two-dimensional (2D) structures of the 37 known PA N endonuclease inhibitors were drawn with BIOVIA Draw 2016 [23][24][25][26]. Then, the 2D structures were converted into their corresponding 3D form using Discovery Studio [36]. We used the Generate Training and Test Data algorithm in Discovery Studio to split 37 objects into a training set and test set. The split method was set as random. Subsequently, 37 known PA N endonuclease inhibitors with diverse molecular structural patterns were input into the "Input ligands" item. The training set percentage was set as 70 [37,38]. Subsequently, we obtained 25 training set compounds and 12 test set compounds. The structures of these training set compounds and test set compounds were given in Figures 8 and 9, respectively. The training set was used to generate the pharmacophore model and the test set was used to evaluate the predictive ability of the generated pharmacophore model. The activity of training set (0.011 µM to 37 µM) and test set (0.047 µM to 22 µM) spanned over 5 orders.
μM) based on the prediction results. Table 7 enlisted the structures of the six raltegravir's derivatives selected after Lead optimization and molecular docking. The functional groups were circled in blue and pink after Lead optimization and the control compound remained unchanged. It was noticeable that the six potent compounds which had good results in estimated activity could fit the values and docking scores after Lead optimization. The specific pharmacophore mapping and docking poses of the six compounds were shown in Supplementary Materials Figures S1 and S2.
Hence, Hit01 and the novel six compounds (Hit07-Hit09 and Hit10-Hit12) could be further developed and forwarded to be effective drugs for the treatment of influenza (Figure 7).

Compound Preparations
The two-dimensional (2D) structures of the 37 known PAN endonuclease inhibitors were drawn with BIOVIA Draw 2016 [23][24][25][26]. Then, the 2D structures were converted into their corresponding 3D form using Discovery Studio [36]. We used the Generate Training and Test Data algorithm in Discovery Studio to split 37 objects into a training set and test set. The split method was set as random. Subsequently, 37 known PAN endonuclease inhibitors with diverse molecular structural patterns were input into the "Input ligands" item. The training set percentage was set as 70 [37,38]. Subsequently, we obtained 25 training set compounds and 12 test set compounds. The structures of these training set compounds and test set compounds were given in Figures 8 and 9, respectively. The training set was used to generate the pharmacophore model and the test set was used to evaluate the predictive ability of the generated pharmacophore model. The activity of training set (0.011 μM to 37 μM) and test set (0.047 μM to 22 μM) spanned over 5 orders.
We used Smart Minimizer algorithm for further minimization of each compound based on CHARMM force field method. In addition, each compound formed up to 255 different conformations in order to generate pharmacophore hypothesis or predict the activity of the newly found compounds [39][40][41]. We used Smart Minimizer algorithm for further minimization of each compound based on CHARMM force field method. In addition, each compound formed up to 255 different conformations in order to generate pharmacophore hypothesis or predict the activity of the newly found compounds [39][40][41].

Generation of Pharmacophore Models
The pharmacophore models which can be generated using 3D-QSAR Pharmacophore Generation protocol are correlated with the specific chemical features that are necessary for the biological activity of the molecules. We used the Feature Mapping protocol in Discovery Studio to seek the different chemical features presented on the training set molecules. These features, including hydrogen bond acceptor (HBA), hydrogen bond donor (HBD), hydrophobic (HYD), positive ionizable (PI), ring aromatic (RA) and negative ionizable (NI), were selected for the 3D-QSAR Pharmacophore Generation protocol. FAST algorithm method was applied to generate acceptable conformations for each compound with 10 kcal/mol as the energy threshold, and maximum generated conformations were set to 255. The uncertainty values of the training set and test set were set to 1.5 and the IC 50 values of individual training set compounds were selected as an active property and the energy threshold was maintained at 20 kcal/mol during the pharmacophore generation. The minimum interfeature distance was changed from 2.97 to 2.0 and the maximum excluded volume was set to zero [2,42].
The pharmacophore models were generated according to significant statistical parameters such as total cost value, cost difference, error, root-mean-square deviation (RMSD), correlation coefficient (r 2 ) and pharmacophore features.

Pharmacophore Validation
Cost analysis, Fischer's randomization test and test set analysis are the three methods to validate the pharmacophore models.
There are three kinds of costs reported in Hypogen, including fixed cost, null cost and total cost. ∆Cost (Null cost-Total cost) is considered as a key parameter for the quality of the pharmacophore model. It infers a significant correlation if the cost difference is more than 60 bits. The model should fall in a prediction range of 70-90% if their cost difference is in the range of 40-60 bits regions. If the cost difference is less than 40 bits, it will be problematic during predicting correlation probability [43].
Fischer's randomization technique acts as a fundamental role in making a correlation between the structural and biological activity in training set compounds. The validation of the selected pharmacophore hypothesis in this randomization technique produced 19 random spreadsheets in 95% confidence levels by shuffling the activity values of the training set compounds [44,45]. The processes of building and minimizing all test set compounds are similar to that of all training set molecules. The test set with 12 molecules was selected to validate the pharmacophore model. Ligand Pharmacophore Mapping protocol in Discovery Studio was utilized to overlap the validated pharmacophore with the active molecules [46].

Database Screening
Well validated pharmacophore Hypo1 was used as a 3D query for identifying potential leading molecules from a chemical database which containing 71 HIV-1 integrase inhibitors [27] as previously reported and additional 1845 drugs (37 antibacterial inhibitors, 131 antiviral inhibitors and 1677 other approved drugs by FDA) that obtained from ZINC database (ZINC, http://zinc.docking.org (accessed on 7 November 2021).). The fast search method was carried out to obtain the 'hit' compounds matching with the features in the best pharmacophore Hypo1. Then, the filtered molecules with estimated activity values <10.0 µM were implied for further screening of absorption, distribution, metabolism, excretion and toxicity (ADMET) properties.

Determination of In-Silico Pharmacokinetic Properties
Screened and selected compounds were subjected to analyze the ADMET properties and the drug-likeness. Nowadays, a popular method to predict ADMET properties is the in silico methodology, such as Discover Studio 2016 ADMET PREDICT [47]. Although there are certain limitations, it is a vital method to cut down the actual costs for in vitro analysis. In this study, we used Discover Studio 2016 ADMET PREDICT to analyze the ADMET properties and the drug-likeness. Then, the compounds with better ADME properties and lower toxicity were subjected to the following molecular docking study.

Molecular Docking
The Gild-XP (extra precision) was applied to perform virtual screening and study the interactions between the selected molecular candidates and the protein structure [48,49]. We selected the influenza PA N endonuclease protein (PDB ID: 6E6W) based on the previous literature sources [22][23][24][25][26] and retrieved protein crystal structure (PDB ID: 6E6W) from the RCSB Protein Data Bank (RCSB PDB, http://www.rcsb.org/ (accessed on 7 November 2021)). Schrödinger's protein preparation wizard was used to remove the cofactors, co-crystallized ligand and water molecules. We also used it to add missing residues and hydrogens, generate Het states and optimize the selected protein [39]. The prepared protein structure was further processed for grid generation [50]. The active site was decided by default parameters like centroid of inbound co-crystal ligand and looking at key residues of in-bound ligand. The box size was set to 20 Å. The selected ligands from the results of 3D-QSAR models were implied to the molecular docking by Gild-XP (extra precision). The docking results were ranked on the basis of the Gild score, and the top-ranked molecules were visually analyzed.

New Designed Compounds
In the protocol of Replace Fragment, Hit01 was selected as the 'hit' compound, and we hop its side-chain by fusing the same fragments onto the molecules to improve the activity. Keeping the original molecular skeleton (Pyrimidinone), 4-fluorobenzyl and 1,3,4-oxadiazole-2-carboxamide were named as Group and Group 1 to imply side-chain hopping. The fragment library was set as the default parameter. The Generate fragment conformation and Parallel processing were applied. The designed compounds, which fit all the possible features of the validated pharmacophore Hypo1, were docked back to the active site of PA N endonuclease again for evaluating binding poses and binding affinity.

Conclusions
This study provided the development of ligand-based pharmacophore model by 3D-QSAR Pharmacophore Generation protocol using Discovery Studio.
25 diverse compounds were considered as a training set for the development of the new pharmacophores model. We selected the best quantitative pharmacophore (Hypo1) from 10 other pharmacophores based on the highest cost difference (272.95), lowest total cost value (130.623) and best correlation coefficient (0.910618). The selected Hypo1 model consisted of four features (HBA, HBD, HYD and RA) and had been cross-validated by cost analysis, test set predictions, and Fischer's randomization test. The test set utilized for evaluating the predictive ability of Hypo1 model consisted of 12 compounds. Then, we obtained the resulting correlation coefficient between the estimated activity and experimental activity for the 12 test set compounds, which was 0.844000. The Hypo1 model was used as a 3D query for the virtual screening of 1916 compounds (71 HIV-1 integrase inhibitors, 37 antibacterial inhibitors, 131 antiviral inhibitors and 1677 other approved drugs by FDA). The 6 compounds with estimated activity less than 10 µM were hit. The 6 compounds were subjected to further ADMET studies and were carried out toxicity assessment studies under TOPKAT program to obtain candidate compounds. Subsequently, Hit01 was selected based on the filtration and retained for docking study, which exhibited better docking scores than the reported control compound (T38) according to the docking studies. Then we compared the docking scores and interaction with the active site residues of Hit01 (−6.622 kcal/mol) with the standard compound (T38). Based on our findings, the hit compound (Hit01) was utilized for designing a future class of potential PA N endonuclease inhibitors. Hit01 was gone for lead optimization, and then 197 molecules were produced as derivatives. After lead optimization, 6 potent compounds obtained good results in estimate activity, fit values and docking scores. Therefore, we speculate that these 7 compounds (Hit01, Hit07, Hit08,  Hit09, Hit10, Hit11 and Hit12) will target PA N endonuclease to exhibit good anti-influenza virus activity. Extensive wet-laboratory experimentation is needed to transmute these seven pyrimidinone derivatives (Hit01, Hit07, Hit08, Hit09, Hit10, Hit11 and Hit12) into clinical drugs.

Supplementary Materials:
The following are available online. The structures, estimated activity, specific pharmacophore mapping and docking poses of the hit compounds were supported in Table S1, Figures S1 and S2.
Author Contributions: C.Z., J.X., Q.X., J.Z., H.Z. and E.H. contributed to the data collection and investigation. C.Z. and X.L. contributed to the original draft preparation. C.Z., H.Z., P.S. and C.H. proposed the studies and contributed to the writing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
The data presented in this study are available on request from the corresponding authors.