A Workflow Combining Machine Learning with Molecular Simulations Uncovers Potential Dual-Target Inhibitors against BTK and JAK3

The drug development process suffers from low success rates and requires expensive and time-consuming procedures. The traditional one drug–one target paradigm is often inadequate to treat multifactorial diseases. Multitarget drugs may potentially address problems such as adverse reactions to drugs. With the aim to discover a multitarget potential inhibitor for B-cell lymphoma treatment, herein, we developed a general pipeline combining machine learning, the interpretable model SHapley Additive exPlanation (SHAP), and molecular dynamics simulations to predict active compounds and fragments. Bruton’s tyrosine kinase (BTK) and Janus kinase 3 (JAK3) are popular synergistic targets for B-cell lymphoma. We used this pipeline approach to identify prospective potential dual inhibitors from a natural product database and screened three candidate inhibitors with acceptable drug absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. Ultimately, the compound CNP0266747 with specialized binding conformations that exhibited potential binding free energy against BTK and JAK3 was selected as the optimum choice. Furthermore, we also identified key residues and fingerprint features of this dual-target inhibitor of BTK and JAK3.


Introduction
In the 20th century, the doctrine "one molecule, one target, one disease" served as a guiding principle for the pharmaceutical industry.However, this paradigm was recognized to be unsatisfactory for therapeutic effects of multifactorial diseases such as tumors and immune system diseases [1,2].Therefore, it is crucial to discover drugs that simultaneously manipulate multiple targets and interrupt the pathogenesis process of multifactorial diseases [3].Studies have highlighted the overexpression of kinases in many cancers [4], and different kinase inhibitors have gained popularity as potential antitumor agents [5].However, concerns of drug resistance and off-targeting toxicity are yet unaddressed [6,7], and multitarget drugs that can overcome these limitations are warranted.For instance, Bruton's tyrosine kinase (BTK) and Janus kinase 3 (JAK3) are two validated and therapeutically amenable targets to effectively treat B-cell lymphomas and can be used to develop a dual-target inhibitor [8].As with most kinases, BTK and JAK3 share similar structures in the binding pocket, including the hinge (connecting C-terminal and N-terminal), glycine-rich loop (GRL), αC helix, and highly conserved DEG motif (Figure A1a) [9,10].
BTK belongs to the nonreceptor Tec tyrosine kinase family, widely expressed in hematopoietic cells.BTK plays an extremely important role in signaling through the Fcγ receptor (FcγR) and B-cell antigen receptor (BCR) [11,12], and its deregulation has been associated with many B cell-related malignancies such as multiple myeloma (MM) and lymphocytic leukemia (CLL) [13,14].A few BTK inhibitors, including ibrutinib [15,16], orelabrutinib [17], and pirtobrutinib [18,19], have been approved by the US Food and Drug Administration (FDA), and several new ones are at different stages of trials (Figure A1b).Pirtobrutinib is a third-generation noncovalent inhibitor with better safety and improved selectivity for many B cell-derived diseases [18].JAKs are members of the nonreceptor tyrosine kinase family that mediate growth factor production and cytokine and play crucial roles in immune signaling [20].JAKs comprise tyrosine kinase TYK2, JAK1, JAK2, and JAK3.JAK3 exhibits a binding pocket region that is highly conserved with other JAK family kinases except at residues CYS909 and ALA966 [21,22].JAK3 is mainly expressed in hematopoietic cells, and was proved to play a crucial role in the mediation of the antiapoptotic phosphoinositide 3-kinase (PI3K)-protein kinase B (AKT) pathway and survival of leukemic B-cell precursors [23][24][25].Therefore, JAK3 is an appealing target for lymphoid malignancies and a potential target for autoimmune diseases, given its important functions in the immune system.The FDA has approved several drugs such as tofacitinib [26,27] and peficitinib [28] (Figure A1b).Given their synergistic effects, the simultaneous inhibition of the BTK/JAK3 signaling pathways can be an optimal therapy as compared with drugs against single targets [29,30].A common issue with kinase inhibitors is toxic side effects.Due to their low toxicity and wide availability, natural products are a meaningful source for the exploration of BTK and JAK3 dual-target inhibitors.Since natural product inhibitors against BTK and JAK3 have been largely unreported, the research on active natural product inhibitors is promising and valuable.However, the restrictions of traditional screening methods have made it hard to keep up with the rapid pace of drug development.
Computer-aided drug design (CADD) has evolved into a necessary tool for drug discovery [31] and has remarkable potential for single-target discovery [32,33].However, the selection of target combinations to achieve the desired efficacy is still challenging.Many computational derivative tools developed over the last few decades have successfully transformed molecular structural information from experimental data into a molecular characterization.The application of these algorithms to the drug discovery pipeline may reduce resource expenditures and provide a direction for the design of drugs specific for dual targets.Examples of BTK and JAK3 dual targets have been previously described where simultaneous inhibition of BTK and JAK3 not only effectively inhibited the signaling pathway of malignancy growth but also addressed the concern of drug resistance [30].
Herein, we introduce a general pipeline that integrates machine learning methods, the interpretable model SHapley Additive exPlanation (SHAP) [34], and molecular dynamics simulations for discovering potential dual inhibitors (Figure 1).We applied this pipeline to the discovery of potential dual inhibitors of BTK and JAK3 from a natural product database (the Coconut database is a generalized natural product database which has been consolidated from 53 different databases and the literature).The machine learning models were built using random forests (RFs), extra trees (ETs), and extreme gradient boosting (XGB) and validated for prediction of inhibitor activities.Later, we applied these three models to predict active compounds from the natural products to discover potential dual inhibitors of BTK and JAK3.We used the interpretable model SHAP to interpret the effect of individual active molecular fingerprint features on outcome prediction.Next, three compounds against BTK and JAK3 were identified via molecular dynamics and binding free energy.These compounds have potential to serve as dual-target inhibitors against BTK and JAK3.Finally, we selected the optimal compound CNP0266747 among the screened compounds as the most promising potential inhibitor.
as dual-target inhibitors against BTK and JAK3.Finally, we selected the optimal compound CNP0266747 among the screened compounds as the most promising potential inhibitor.

Spatial Diversity
The chemical spatial distributions reflect the reasonability of the data, which influences the model construction process.We evaluated the rationality of the data by calculating 2048 extended connectivity fingerprints (ECFPs) for the training and test sets.We applied the downscaling method of uniform manifold approximation and projection (UMAP) to visually represent their chemical spatial distribution [35].As displayed in Figure 2, the UMAP diagram clearly demonstrates the wide chemical spatial distribution of the training and test data.The quality of the data is an important issue to consider before constructing machine learning models.Therefore, the high diversity in the training and test sets proves that our data have excellent robustness.Our analysis indicates that the compounds used to build this model were reasonable and differed in their chemical structures.

Establishment and Validation of Models
To obtain the optimal combination of super parameters, we used the Bayesian optimization method.We applied them to the classification models, employed three machine learning models (RF, ET, and XGB) for model construction, and performed tenfold cross-validation.The scores for each model are displayed in Table 1 and Table 2.The three classification models had high values (more than 0.9) of recall, precision, F1 score, and accuracy for the test set.In addition, we also plotted the receiver operating

Spatial Diversity
The chemical spatial distributions reflect the reasonability of the data, which influences the model construction process.We evaluated the rationality of the data by calculating 2048 extended connectivity fingerprints (ECFPs) for the training and test sets.We applied the downscaling method of uniform manifold approximation and projection (UMAP) to visually represent their chemical spatial distribution [35].As displayed in Figure 2, the UMAP diagram clearly demonstrates the wide chemical spatial distribution of the training and test data.The quality of the data is an important issue to consider before constructing machine learning models.Therefore, the high diversity in the training and test sets proves that our data have excellent robustness.Our analysis indicates that the compounds used to build this model were reasonable and differed in their chemical structures.against BTK and JAK3.Finally, we selected the optimal compound CNP0266747 among the screened compounds as the most promising potential inhibitor.

Spatial Diversity
The chemical spatial distributions reflect the reasonability of the data, which influences the model construction process.We evaluated the rationality of the data by calculating 2048 extended connectivity fingerprints (ECFPs) for the training and test sets.We applied the downscaling method of uniform manifold approximation and projection (UMAP) to visually represent their chemical spatial distribution [35].As displayed in Figure 2, the UMAP diagram clearly demonstrates the wide chemical spatial distribution of the training and test data.The quality of the data is an important issue to consider before constructing machine learning models.Therefore, the high diversity in the training and test sets proves that our data have excellent robustness.Our analysis indicates that the compounds used to build this model were reasonable and differed in their chemical structures.

Establishment and Validation of Models
To obtain the optimal combination of super parameters, we used the Bayesian optimization method.We applied them to the classification models, employed three machine learning models (RF, ET, and XGB) for model construction, and performed tenfold crossvalidation.The scores for each model are displayed in Tables 1 and 2. The three classification models had high values (more than 0.9) of recall, precision, F1 score, and accuracy for the test set.In addition, we also plotted the receiver operating characteristic (ROC) curves of the three models for two targets and provided the corresponding area under the curve

Establishment and Validation of Models
To obtain the optimal combination of super parameters, we used the Bayesian optimization method.We applied them to the classification models, employed three machine learning models (RF, ET, and XGB) for model construction, and performed tenfold crossvalidation.The scores for each model are displayed in Tables 1 and 2. The three classification models had high values (more than 0.9) of recall, precision, F1 score, and accuracy for the test set.In addition, we also plotted the receiver operating characteristic (ROC) curves of the three models for two targets and provided the corresponding area under the curve (AUC) values (Figure 3).The ROC curve is a significant indicator for evaluation of the prediction and classification capability of the models.The closer to 1 the AUC value is, the better its capacity for classification.As seen in Figure 3, the AUC values of the test set for RFs, ETs, and XGB were more than 0.95.Thus, the evaluation indicators of the three models were on the same level.Together, these results indicate that all of our models exhibit high predictive power and robust classification capabilities for the compounds.Therefore, we could apply these models to predict active molecules from the database.(AUC) values (Figure 3).The ROC curve is a significant indicator for evaluation prediction and classification capability of the models.The closer to 1 the AUC va the better its capacity for classification.As seen in Figure 3, the AUC values of the t for RFs, ETs, and XGB were more than 0.95.Thus, the evaluation indicators of the models were on the same level.Together, these results indicate that all of our mod hibit high predictive power and robust classification capabilities for the compo Therefore, we could apply these models to predict active molecules from the datab

Explanatory Analysis of Models
As fingerprint space was used to construct machine learning models, it is poss determine the biological activity of compounds from different chemical groups or m ular skeletons.To gain a better understanding of the importance of fingerprint frag on the specific direction of decisions of the models, we employed a feature density s plot in SHAP as a holistic approach to interpretation.This scatter plot sorted the Sh value of each feature into the corresponding position coordinates.

Explanatory Analysis of Models
As fingerprint space was used to construct machine learning models, it is possible to determine the biological activity of compounds from different chemical groups or molecular skeletons.To gain a better understanding of the importance of fingerprint fragments on the specific direction of decisions of the models, we employed a feature density scatter plot in SHAP as a holistic approach to interpretation.This scatter plot sorted the Shapley value of each feature into the corresponding position coordinates.As shown in Figure 4, the y-axis indicates the importance of the model's predictive features and the x-axis shows the effect on model predictions (red color indicates the sample point has a large Shapley value and blue color indicates a small Shapley value).We combined the Shapley values with sample point colors to investigate the relationship between feature variation and decision direction.Shapley values greater than 0 indicate a positive impact, and vice versa for negative impact.Based on this description, we found fingerprint fragments with large Shapley values using the scatter plot (Figure 4).
with sample point colors to investigate the relationship between feature variation and cision direction.Shapley values greater than 0 indicate a positive impact, and vice v for negative impact.Based on this description, we found fingerprint fragments with l Shapley values using the scatter plot (Figure 4).3).A larger Shapley valu creases the probability of the molecule being active.Therefore, the Shapley value of gerprint fragment 339 was located at the top of the three models, which gathered ou tention.
JAK3: The top 10 Shapley values of the fingerprint fragments are shown in Figu Only the fingerprint fragments 1589, 1535, and 1114 appeared in the top 10 Shapley va of all three models (Table 3) and were the focus objects, especially 1589, which had highest value of the three fingerprints.These findings not only provide an insight into impact of fingerprint features on the model but also serve as the foundation for fragm based drug design.3) and were the focus objects, especially 1589, which had the highest value of the three fingerprints.These findings not only provide an insight into the impact of fingerprint features on the model but also serve as the foundation for fragmentbased drug design.with sample point colors to investigate the relationship between feature variation and decision direction.Shapley values greater than 0 indicate a positive impact, and vice versa for negative impact.Based on this description, we found fingerprint fragments with large Shapley values using the scatter plot (Figure 4).3) and were the focus objects, especially 1589, which had the highest value of the three fingerprints.These findings not only provide an insight into the impact of fingerprint features on the model but also serve as the foundation for fragmentbased drug design.with sample point colors to investigate the relationship between feature variation and decision direction.Shapley values greater than 0 indicate a positive impact, and vice versa for negative impact.Based on this description, we found fingerprint fragments with large Shapley values using the scatter plot (Figure 4).3) and were the focus objects, especially 1589, which had the highest value of the three fingerprints.These findings not only provide an insight into the impact of fingerprint features on the model but also serve as the foundation for fragmentbased drug design.

Virtual Screening
Given the same level of the predictive categorization capacity of the three machine learning models, we employed the equal weights of the RF, ET, and XGB models to screen the natural compounds from the Coconut database.To further increase the accuracy of classification and exclude redundant and incompatible molecules, we calculated fingerprints for the molecules from the database and eliminated those with correlation coefficients < 0.1.Finally, we obtained a dataset containing 123,398 molecules.The three models were used to virtually screen Coconut, which yielded 14,465 candidate inhibitors for further molecular docking experiments.
Molecular docking describes the pose and location of the ligand at the binding pocket of the protein.To validate the accuracy of the docking results, AutoDock Vina was used to redock the co-crystallized compounds of BTK (PDB ID: 8FLL) and JAK3 (PDB ID: 6AAK).In Figure A2, the two co-crystallized proteins reproduced the original docking's consistent spatial orientation with affinities of −11.2 and −10.5 kcal/mol for BTK and JAK3, respectively.Thus, our docking results are reliable and can be applied to determine potential inhibitors of BTK and JAK3.
Next, we used AutoDock Vina to complete the docking operations of kinases (BTK and JAK3) and molecules from the database.We positioned the docking box at the site of the eutectic small molecule, and the size of the two docking boxes was set to 20 × 20 × 20 Å.Low affinity indicates the likeliness of the molecule to bind better to the target protein.Therefore, the value of docking affinity can allow us to further eliminate inactive compounds.According to the docking binding energy, we individually ranked the compounds of BTK and JAK3 and removed those with positive and high affinities.
We speculated that selection of kinase inhibitor candidates from those with high ranks in binding energy might lead to redundant analogs.Therefore, we performed clustering [36] to obtain low-affinity representative molecules to increase molecular structural diversity.The downscaled molecular fingerprints were used as inputs for clustering analysis (Figure 5).To discover the dual-target inhibitors of BTK and JAK3, we identified molecules with low docking binding energy toward both BTK and JAK3.Finally, three natural product molecules from three different clusters and with low binding energy were selected.

Virtual Screening
Given the same level of the predictive categorization capacity of the three machine learning models, we employed the equal weights of the RF, ET, and XGB models to screen the natural compounds from the Coconut database.To further increase the accuracy of classification and exclude redundant and incompatible molecules, we calculated fingerprints for the molecules from the database and eliminated those with correlation coefficients < 0.1.Finally, we obtained a dataset containing 123,398 molecules.The three models were used to virtually screen Coconut, which yielded 14,465 candidate inhibitors for further molecular docking experiments.
Molecular docking describes the pose and location of the ligand at the binding pocket of the protein.To validate the accuracy of the docking results, AutoDock Vina was used to redock the co-crystallized compounds of BTK (PDB ID: 8FLL) and JAK3 (PDB ID: 6AAK).In Figure A2, the two co-crystallized proteins reproduced the original docking's consistent spatial orientation with affinities of −11.2 and −10.5 kcal/mol for BTK and JAK3, respectively.Thus, our docking results are reliable and can be applied to determine potential inhibitors of BTK and JAK3.
Next, we used AutoDock Vina to complete the docking operations of kinases (BTK and JAK3) and molecules from the database.We positioned the docking box at the site of the eutectic small molecule, and the size of the two docking boxes was set to 20 × 20 × 20 Å.Low affinity indicates the likeliness of the molecule to bind better to the target protein.Therefore, the value of docking affinity can allow us to further eliminate inactive compounds.According to the docking binding energy, we individually ranked the compounds of BTK and JAK3 and removed those with positive and high affinities.
We speculated that selection of kinase inhibitor candidates from those with high ranks in binding energy might lead to redundant analogs.Therefore, we performed clustering [36] to obtain low-affinity representative molecules to increase molecular structural diversity.The downscaled molecular fingerprints were used as inputs for clustering analysis (Figure 5).To discover the dual-target inhibitors of BTK and JAK3, we identified molecules with low docking binding energy toward both BTK and JAK3.Finally, three natural product molecules from three different clusters and with low binding energy were selected.

Virtual Screening
Given the same level of the predictive categorization capacity of the three machine learning models, we employed the equal weights of the RF, ET, and XGB models to screen the natural compounds from the Coconut database.To further increase the accuracy of classification and exclude redundant and incompatible molecules, we calculated fingerprints for the molecules from the database and eliminated those with correlation coefficients < 0.1.Finally, we obtained a dataset containing 123,398 molecules.The three models were used to virtually screen Coconut, which yielded 14,465 candidate inhibitors for further molecular docking experiments.
Molecular docking describes the pose and location of the ligand at the binding pocket of the protein.To validate the accuracy of the docking results, AutoDock Vina was used to redock the co-crystallized compounds of BTK (PDB ID: 8FLL) and JAK3 (PDB ID: 6AAK).In Figure A2, the two co-crystallized proteins reproduced the original docking's consistent spatial orientation with affinities of −11.2 and −10.5 kcal/mol for BTK and JAK3, respectively.Thus, our docking results are reliable and can be applied to determine potential inhibitors of BTK and JAK3.
Next, we used AutoDock Vina to complete the docking operations of kinases (BTK and JAK3) and molecules from the database.We positioned the docking box at the site of the eutectic small molecule, and the size of the two docking boxes was set to 20 × 20 × 20 Å.Low affinity indicates the likeliness of the molecule to bind better to the target protein.Therefore, the value of docking affinity can allow us to further eliminate inactive compounds.According to the docking binding energy, we individually ranked the compounds of BTK and JAK3 and removed those with positive and high affinities.
We speculated that selection of kinase inhibitor candidates from those with high ranks in binding energy might lead to redundant analogs.Therefore, we performed clustering [36] to obtain low-affinity representative molecules to increase molecular structural diversity.The downscaled molecular fingerprints were used as inputs for clustering analysis (Figure 5).To discover the dual-target inhibitors of BTK and JAK3, we identified molecules with low docking binding energy toward both BTK and JAK3.Finally, three natural product molecules from three different clusters and with low binding energy were selected.

Virtual Screening
Given the same level of the predictive categorization capacity of the three machine learning models, we employed the equal weights of the RF, ET, and XGB models to screen the natural compounds from the Coconut database.To further increase the accuracy of classification and exclude redundant and incompatible molecules, we calculated fingerprints for the molecules from the database and eliminated those with correlation coefficients < 0.1.Finally, we obtained a dataset containing 123,398 molecules.The three models were used to virtually screen Coconut, which yielded 14,465 candidate inhibitors for further molecular docking experiments.
Molecular docking describes the pose and location of the ligand at the binding pocket of the protein.To validate the accuracy of the docking results, AutoDock Vina was used to redock the co-crystallized compounds of BTK (PDB ID: 8FLL) and JAK3 (PDB ID: 6AAK).In Figure A2, the two co-crystallized proteins reproduced the original docking's consistent spatial orientation with affinities of −11.2 and −10.5 kcal/mol for BTK and JAK3, respectively.Thus, our docking results are reliable and can be applied to determine potential inhibitors of BTK and JAK3.
Next, we used AutoDock Vina to complete the docking operations of kinases (BTK and JAK3) and molecules from the database.We positioned the docking box at the site of the eutectic small molecule, and the size of the two docking boxes was set to 20 × 20 × 20 Å.Low affinity indicates the likeliness of the molecule to bind better to the target protein.Therefore, the value of docking affinity can allow us to further eliminate inactive compounds.According to the docking binding energy, we individually ranked the compounds of BTK and JAK3 and removed those with positive and high affinities.
We speculated that selection of kinase inhibitor candidates from those with high ranks in binding energy might lead to redundant analogs.Therefore, we performed clustering [36] to obtain low-affinity representative molecules to increase molecular structural diversity.The downscaled molecular fingerprints were used as inputs for clustering analysis (Figure 5).To discover the dual-target inhibitors of BTK and JAK3, we identified molecules with low docking binding energy toward both BTK and JAK3.Finally, three natural product molecules from three different clusters and with low binding energy were selected.
As shown in Table 4, all selected candidates followed the Lipinski rule and exhibited good synthesizability (SA score < 6; the smaller the value, the easier it is to synthesize compounds).In terms of the physicochemical properties of the screened molecules, the LogP value was in the range of 0.7-6.0,indicating their hydrophobic nature and the easy accessibility of the hydrophobic pocket of the proteins.The LogS value was around −6 (insoluble < −10 < not easily soluble < −6 < soluble), which indicated the solubility of the molecules in water.Considering the pharmacokinetic properties of HIA, all molecules showed a high probability of being absorbed in the intestine while not being easily permeable through the skin (the more negative the LogKp value, the lower the skin permeability).These prediction results indicate the acceptable ADMET properties of these selected potential inhibitors.

Molecular Dynamics Simulations
To explore the stability of the docking complex and its specific interaction, we conducted molecular dynamics simulations using Gromacs 2020.First, we performed molecular dynamics simulations of BTK and JAK3 with the inhibitors pirtobrutinib and pefi-
As shown in Table 4, all selected candidates followed the Lipinski rule and exhibited good synthesizability (SA score < 6; the smaller the value, the easier it is to synthesize compounds).In terms of the physicochemical properties of the screened molecules, the LogP value was in the range of 0.7-6.0,indicating their hydrophobic nature and the easy accessibility of the hydrophobic pocket of the proteins.The LogS value was around −6 (insoluble < −10 < not easily soluble < −6 < soluble), which indicated the solubility of the molecules in water.Considering the pharmacokinetic properties of HIA, all molecules showed a high probability of being absorbed in the intestine while not being easily permeable through the skin (the more negative the LogKp value, the lower the skin permeability).These prediction results indicate the acceptable ADMET properties of these selected potential inhibitors.

Molecular Dynamics Simulations
To explore the stability of the docking complex and its specific interaction, we conducted molecular dynamics simulations using Gromacs 2020.First, we performed molecular dynamics simulations of BTK and JAK3 with the inhibitors pirtobrutinib and peficitinib, respectively, for 100 ns, which served as a reference for subsequent complex system simula-tions.Next, using the docked conformation as the initial structure, we simulated the three selected molecules for 100 ns in BTK and JAK3 complexes.Finally, a total of eight sets of simulation results were used for subsequent analysis.
Root Mean Square Deviation (RMSD) RMSD measures the stability of a protein-ligand bond.In general, high values of RMSD are indicative of more dramatic alterations during the simulation process.We analyzed changes in complexes from the start conformation to the end location using RMSD.All complexes exhibited low RMSD values (less than 0.3 nm) throughout the process of simulation.We observed smooth RSMD curves over a long time period in the entire complex system (Figure 6), which implied an equilibrium.These eight RMSD curves remained largely consistent, except for some slight fluctuations and immediate rebalancing.In summary, overall RMSD analyses demonstrated that the complexes were at equilibrium after 10 ns of simulation.
Molecules 2023, 28, x FOR PEER REVIEW 8 of 21 citinib, respectively, for 100 ns, which served as a reference for subsequent complex system simulations.Next, using the docked conformation as the initial structure, we simulated the three selected molecules for 100 ns in BTK and JAK3 complexes.Finally, a total of eight sets of simulation results were used for subsequent analysis.
Root Mean Square Deviation (RMSD) RMSD measures the stability of a protein-ligand bond.In general, high values of RMSD are indicative of more dramatic alterations during the simulation process.We analyzed changes in complexes from the start conformation to the end location using RMSD.All complexes exhibited low RMSD values (less than 0.3 nm) throughout the process of simulation.We observed smooth RSMD curves over a long time period in the entire complex system (Figure 6), which implied an equilibrium.These eight RMSD curves remained largely consistent, except for some slight fluctuations and immediate rebalancing.In summary, overall RMSD analyses demonstrated that the complexes were at equilibrium after 10 ns of simulation.

MM/PBSA Binding Free Energy
We compared the binding abilities of the compounds and obtained the binding free energy values using the MM/PBSA method.As listed in Table 5, the binding free energy values of the two co-crystal complex systems BTK and JAK3 were −30.021 and −34.152 kcal/mol, respectively.The binding free energies of the three screened compounds toward their respective target protein were less than or close to the values of the binding free energies of the corresponding co-crystal complex systems.Thus, the screened compounds were all stabilized in the corresponding complex systems.The total binding free energy comprises van der Waals energy (Evdw), electrostatic energy (Eele), polar solvation (GPB), and nonpolar solvation (GNP).Herein, van der Waals energy was the largest component for the binding of compounds with BTK and JAK3.

MM/PBSA Binding Free Energy
We compared the binding abilities of the compounds and obtained the binding free energy values using the MM/PBSA method.As listed in Table 5, the binding free energy values of the two co-crystal complex systems BTK and JAK3 were −30.021 and −34.152 kcal/mol, respectively.The binding free energies of the three screened compounds toward their respective target protein were less than or close to the values of the binding free energies of the corresponding co-crystal complex systems.Thus, the screened compounds were all stabilized in the corresponding complex systems.The total binding free energy comprises van der Waals energy (E vdw ), electrostatic energy (E ele ), polar solvation (G PB ), and nonpolar solvation (G NP ).Herein, van der Waals energy was the largest component for the binding of compounds with BTK and JAK3.We investigated the residue contribution of the binding free energies by performing binding free energy decomposition analysis and explored the interaction between the ligand and protein.For BTK, most residues except Met477, Asp539, and Phe540 showed a consistent trend in their contribution to the binding energy (Figure 7).The residue Met477 positively contributed to the CNP0266747, CNP0332171, and pirtobrutinib complexes.Only the residues Asp539 and Phe540 presented favorable contributions for binding to CNP0266747 and pirtobrutinib.For JAK3, we observed a consistent trend in the contribution of most residues to binding energy, with the exception of residues Cys909, Arg953, and Asp967 (Figure 7).As shown in Figure 7, residues Cys909 and Asp953 displayed positive contributions for binding to CNP0266747 and CNP0332171, and the residue Ala966 displayed a large energy gap in the contribution to the binding of CNP0266747.In conclusion, the binding free energy findings prove that all the compounds bound tightly to the target proteins in the simulation process.The analysis of residue contributions exposed the differences in contributions of key active site residues in the respective targets.We investigated the residue contribution of the binding free energies by performing binding free energy decomposition analysis and explored the interaction between the ligand and protein.For BTK, most residues except Met477, Asp539, and Phe540 showed a consistent trend in their contribution to the binding energy (Figure 7).The residue Met477 positively contributed to the CNP0266747, CNP0332171, and pirtobrutinib complexes.Only the residues Asp539 and Phe540 presented favorable contributions for binding to CNP0266747 and pirtobrutinib.For JAK3, we observed a consistent trend in the contribution of most residues to binding energy, with the exception of residues Cys909, Arg953, and Asp967 (Figure 7).As shown in Figure 7, residues Cys909 and Asp953 displayed positive contributions for binding to CNP0266747 and CNP0332171, and the residue Ala966 displayed a large energy gap in the contribution to the binding of CNP0266747.In conclusion, the binding free energy findings prove that all the compounds bound tightly to the target proteins in the simulation process.The analysis of residue contributions exposed the differences in contributions of key active site residues in the respective targets.

Interactions of the Screened Compounds with Their Protein Targets
To study the reason underlying the differences in the binding affinities of compounds to their protein targets, we used the structure with the lowest free energy to analyze the interaction [37] (Figure A3).For BTK, the active sites included Val416, Ala428, Lys430, Phe442, Thr474, Glu475, Met477, Leu528, Asp539, and Phe540 residues, as detected from the interaction between BTK and pirtobrutinib.The key residues in the active sites included Met477, Lys430, Asp539, and Phe540 [38] (Figure 8a,b).The residue Met477 was located in the hinge region, while Asp539 and Phe540 were situated at the DFG motif.The hinge region forms important hydrogen bonds with ATP and ATP-competitive inhibitors, and the DFG motif domain comprises three conserved residues where D is involved in binding with activated-state Mg ions and F participates in the formation of activated-state R-spines.These findings further illustrate the importance of key amino acids in terms of structure.

Interactions of the Screened Compounds with Their Protein Targets
To study the reason underlying the differences in the binding affinities of compounds to their protein targets, we used the structure with the lowest free energy to analyze the interaction [37] (Figure A3).For BTK, the active sites included Val416, Ala428, Lys430, Phe442, Thr474, Glu475, Met477, Leu528, Asp539, and Phe540 residues, as detected from the interaction between BTK and pirtobrutinib.The key residues in the active sites included Met477, Lys430, Asp539, and Phe540 [38] (Figure 8a,b).The residue Met477 was located in the hinge region, while Asp539 and Phe540 were situated at the DFG motif.The hinge region forms important hydrogen bonds with ATP and ATP-competitive inhibitors, and the DFG motif domain comprises three conserved residues where D is involved in binding with activated-state Mg ions and F participates in the formation of activated-state R-spines.These findings further illustrate the importance of key amino acids in terms of structure.Considering BTK, all interactions of the screened compound complexes at the active sites were consistent with those observed for the BTK-pirtobrutinib complex.H-bonds or hydrophobic interactions were found between the compounds and the hinge region, which provided them with stability in the active pocket (Figure 8a,b).The compound CNP0266747 formed one H-bond with Asp539 and six hydrophobic interactions with residues Leu408, Val416, Val458, Met477, Leu460, and Phe540 (Figure 8c).The residue Met477 located in the hinge region formed a hydrophobic interaction with the compound, while residues Asp539 and Phe540 from the DFG motif domain formed an H-bond and a hydrophobic interaction, respectively.MD simulation revealed the persistent presence of an H-bond with Asp539 (Table A1).The compound CNP0332171 formed three H-bonds with residues Gln412, Met477, and Cys481 and six hydrophobic interactions with residues Leu408, Val416, Leu483, Arg525, and Leu528 (Figure 8d).The compound CNP04151447 formed hydrophobic interactions with residues Leu408, Phe413, dal416, Ala428, Lys430, Met477, and Leu528.One pi-pi stacking interaction was formed between the benzene group of the compound and the residue Phe413 (Figure 8e).All these compounds can form sustained hydrogen bonds or hydrophobic interactions with the key residue Met477.Noteworthy, only the compound CNP0266747 formed interactions with the residues Asp539 and Phe540.This difference may be responsible for the different trends in the contribution of the residues to the binding free energy.
For the target JAK3, the active sites comprised the residues Leu828, Val836, Ala853, Lys855, Met902, Glu903, Leu905, Cys909, Arg953, Leu956, and Asp967 [39].The binding sites on JAK3 for all compounds lay in the active sites.The JAK3-peficitinib complex mainly showed three persistent H-bonds with residues Glu903 and Leu905 from the hinge region that maintained stability throughout the simulation (Figure 9a,b).The importance of the formation of interactions between compounds and active sites from the hinge domain was demonstrated.As shown in Figure 9, all screened compounds formed stable hydrogen bonds with Glu903, Leu905, or Cys909 in the hinge region and stabilized their own structures in the binding pocket (Table A1).JAK3 is highly conserved in the binding pocket with other JAK family members, except at residues Cys909 and Ala966.Given the Considering BTK, all interactions of the screened compound complexes at the active sites were consistent with those observed for the BTK-pirtobrutinib complex.H-bonds or hydrophobic interactions were found between the compounds and the hinge region, which provided them with stability in the active pocket (Figure 8a,b).The compound CNP0266747 formed one H-bond with Asp539 and six hydrophobic interactions with residues Leu408, Val416, Val458, Met477, Leu460, and Phe540 (Figure 8c).The residue Met477 located in the hinge region formed a hydrophobic interaction with the compound, while residues Asp539 and Phe540 from the DFG motif domain formed an H-bond and a hydrophobic interaction, respectively.MD simulation revealed the persistent presence of an H-bond with Asp539 (Table A1).The compound CNP0332171 formed three H-bonds with residues Gln412, Met477, and Cys481 and six hydrophobic interactions with residues Leu408, Val416, Leu483, Arg525, and Leu528 (Figure 8d).The compound CNP04151447 formed hydrophobic interactions with residues Leu408, Phe413, dal416, Ala428, Lys430, Met477, and Leu528.One pi-pi stacking interaction was formed between the benzene group of the compound and the residue Phe413 (Figure 8e).All these compounds can form sustained hydrogen bonds or hydrophobic interactions with the key residue Met477.Noteworthy, only the compound CNP0266747 formed interactions with the residues Asp539 and Phe540.This difference may be responsible for the different trends in the contribution of the residues to the binding free energy.
For the target JAK3, the active sites comprised the residues Leu828, Val836, Ala853, Lys855, Met902, Glu903, Leu905, Cys909, Arg953, Leu956, and Asp967 [39].The binding sites on JAK3 for all compounds lay in the active sites.The JAK3-peficitinib complex mainly showed three persistent H-bonds with residues Glu903 and Leu905 from the hinge region that maintained stability throughout the simulation (Figure 9a,b).The importance of the formation of interactions between compounds and active sites from the hinge domain was demonstrated.As shown in Figure 9, all screened compounds formed stable hydrogen bonds with Glu903, Leu905, or Cys909 in the hinge region and stabilized their own structures in the binding pocket (Table A1).JAK3 is highly conserved in the binding pocket with other JAK family members, except at residues Cys909 and Ala966.Given the very highly conserved active pocket residues of the JAK family, the only differences included the residue Cys909 from the hinge region and Ala966 close to the DFG motif, which gathered our interest (Figure 9c-e).The compounds CNP0266747 and CNP0332171 could form H-bonds with Cys909 from the hinge region, Arg953 from the loop domain, and Asp967 from the DFG motif (Figure 9c,d).The compound CNP0266747 formed a hydrophobic interaction with the residue Ala966, which explains the difference in the contributions of the binding free energies of residues Cys909, Arg953, and Asp967 in residue decomposition and the large energy gap in the value of the residue Ala966 from the DFG motif region.This was the reason for differences in the contributions of residue free energy in MM/PBSA analysis.
Molecules 2023, 28, x FOR PEER REVIEW 11 of 21 very highly conserved active pocket residues of the JAK family, the only differences included the residue Cys909 from the hinge region and Ala966 close to the DFG motif, which gathered our interest (Figure 9c-e).The compounds CNP0266747 and CNP0332171 could form H-bonds with Cys909 from the hinge region, Arg953 from the loop domain, and Asp967 from the DFG motif (Figure 9c,d).The compound CNP0266747 formed a hydrophobic interaction with the residue Ala966, which explains the difference in the contributions of the binding free energies of residues Cys909, Arg953, and Asp967 in residue decomposition and the large energy gap in the value of the residue Ala966 from the DFG motif region.This was the reason for differences in the contributions of residue free energy in MM/PBSA analysis.The remaining BTK-and JAK3-compound interactions were all hydrophobic.Based on the analysis of the above interactions, in our MD results, the interactions formed by the compounds were mainly hydrophobic.This observation explains why the contribution of van der Waals energy was the largest among the components of the total binding free energy in all complex systems.In conclusion, the screened compounds could form Hbonds with the hinge region and stabilize their own structures in the binding pocket.Although the interacting residues were slightly different, they were located in the active pocket.
We observed that the two-dimensional structure of the screened compounds comprised active fingerprint fragments of BTK and JAK3, which were mentioned in the context of the explanatory analysis of the models.The compound CNP0266747 included the 339 fingerprint fragment, displayed in cyan, and the 1589 fingerprint fragment, shown in green, against BTK and JAK3, respectively (Figure A4).It was interesting to observe that these fingerprint fragments could separately form both H-bonds and hydrophobic interactions with BTK and JAK3.Although CNP0332171 and CNP0415155 also contained fingerprint fragments 694 and 1984 for BTK and fingerprint fragments 1535 and 1114 for JAK3, respectively, they could only form hydrophobic interactions with BTK and JAK3.Only the active fingerprint fragments of CNP0266747 formed H-bonds and hydrophobic interactions with BTK and JAK3.Therefore, we hypothesized that this causes differences in the combining method with the protein-ligand complex.To sum up, each of the The remaining BTK-and JAK3-compound interactions were all hydrophobic.Based on the analysis of the above interactions, in our MD results, the interactions formed by the compounds were mainly hydrophobic.This observation explains why the contribution of van der Waals energy was the largest among the components of the total binding free energy in all complex systems.In conclusion, the screened compounds could form H-bonds with the hinge region and stabilize their own structures in the binding pocket.Although the interacting residues were slightly different, they were located in the active pocket.
We observed that the two-dimensional structure of the screened compounds comprised active fingerprint fragments of BTK and JAK3, which were mentioned in the context of the explanatory analysis of the models.The compound CNP0266747 included the 339 fingerprint fragment, displayed in cyan, and the 1589 fingerprint fragment, shown in green, against BTK and JAK3, respectively (Figure A4).It was interesting to observe that these fingerprint fragments could separately form both H-bonds and hydrophobic interactions with BTK and JAK3.Although CNP0332171 and CNP0415155 also contained fingerprint fragments 694 and 1984 for BTK and fingerprint fragments 1535 and 1114 for JAK3, respectively, they could only form hydrophobic interactions with BTK and JAK3.Only the active fingerprint fragments of CNP0266747 formed H-bonds and hydrophobic interactions with BTK and JAK3.Therefore, we hypothesized that this causes differences in the combining method with the protein-ligand complex.To sum up, each of the screened molecules contained active fingerprint fragments against both BTK and JAK3.This result further demonstrates that our models were reasonable and that the active fingerprint fragments we obtained via Shapley values were meaningful and accurate.

Active Fingerprint Fragments in CNP0266747 for Dual Targets
The compound CNP0266747 is a derivative of rutecarpine and exhibits anticancer and analgesic effects.Therefore, its binding to the two targets warrants further investigation.CNP0266747, with the lowest binding free energy against BTK and JAK3, had a special binding mode as compared with the other screened compounds (Figure 10a).Therefore, we thought it was worthwhile to explore the relationship of CNP0266747 with BTK and JAK3.For BTK, pirtobrutinib is a third-generation inhibitor that exhibits acceptable safety and selectivity profiles.The compound CNP0266747 had a consistent conformation in terms of spatial orientation with pirtobrutinib (Figure 10b,c).As shown in Figure 10c, we inserted the structure into the back pocket by bonding it with the surrounding residues.As we inserted the compound into the back pocket, its elution rate decreased and selectivity increased.Therefore, we suggest that the compound CNP0266747 showed selectivity for the BTK target, as observed with pirtobrutinib.Further analysis of the binding model of the compound CNP0266747 revealed that its head anchored to the molecule by forming a hydrophobic interaction with the hinge region residues Met477 and Leu408 (Figure 10d).The tail of CNP0266747 is an indole ring, which formed an H-bond with the residue Asp539 and hydrophobic interactions with residues Phe540 and Leu460.Interestingly, we found that the indole ring contained the fingerprint fragment 339 (cyan highlight) with top Shapley values.The fingerprint fragment 339 could bind to residues Asp539, Phe540, and Leu460, which facilitated the insertion of the molecular tail into the back pocket and consequently increased the selectivity of CNP0266747 (Figure 10d).Therefore, this fingerprint fragment has a decisive role for the BTK target in our opinion.
Molecules 2023, 28, x FOR PEER REVIEW 12 of 21 screened molecules contained active fingerprint fragments against both BTK and JAK3.This result further demonstrates that our models were reasonable and that the active fingerprint fragments we obtained via Shapley values were meaningful and accurate.

Active Fingerprint Fragments in CNP0266747 for Dual Targets
The compound CNP0266747 is a derivative of rutecarpine and exhibits anticancer and analgesic effects.Therefore, its binding to the two targets warrants further investigation.CNP0266747, with the lowest binding free energy against BTK and JAK3, had a special binding mode as compared with the other screened compounds (Figure 10a).Therefore, we thought it was worthwhile to explore the relationship of CNP0266747 with BTK and JAK3.For BTK, pirtobrutinib is a third-generation inhibitor that exhibits acceptable safety and selectivity profiles.The compound CNP0266747 had a consistent conformation in terms of spatial orientation with pirtobrutinib (Figure 10b,c).As shown in Figure 10c, we inserted the structure into the back pocket by bonding it with the surrounding residues.As we inserted the compound into the back pocket, its elution rate decreased and selectivity increased.Therefore, we suggest that the compound CNP0266747 showed selectivity for the BTK target, as observed with pirtobrutinib.Further analysis of the binding model of the compound CNP0266747 revealed that its head anchored to the molecule by forming a hydrophobic interaction with the hinge region residues Met477 and Leu408 (Figure 10d).The tail of CNP0266747 is an indole ring, which formed an H-bond with the residue Asp539 and hydrophobic interactions with residues Phe540 and Leu460.Interestingly, we found that the indole ring contained the fingerprint fragment 339 (cyan highlight) with top Shapley values.The fingerprint fragment 339 could bind to residues Asp539, Phe540, and Leu460, which facilitated the insertion of the molecular tail into the back pocket and consequently increased the selectivity of CNP0266747 (Figure 10d).Therefore, this fingerprint fragment has a decisive role for the BTK target in our opinion.For JAK3, the rutecarpine fragment of CNP0266747 interacted with residues Leu828, Leu905 (the hinge region residue to stabilize the molecule), and Leu956 as observed with peficitinib, which was not a selective inhibitor (Figure 11a).As CNP0266747 exhibits a long aliphatic chain connecting to the indole ring, it may interact with additional residues in the active pocket, including Cys909, Arg953, Ala966, and Asp967.The residue Asp967 caused the compound CNP0266747 to form an O-shaped spatial conformation through For JAK3, the rutecarpine fragment of CNP0266747 interacted with residues Leu828, Leu905 (the hinge region residue to stabilize the molecule), and Leu956 as observed with peficitinib, which was not a selective inhibitor (Figure 11a).As CNP0266747 exhibits a long aliphatic chain connecting to the indole ring, it may interact with additional residues in the active pocket, including Cys909, Arg953, Ala966, and Asp967.The residue Asp967 caused the compound CNP0266747 to form an O-shaped spatial conformation through hydrogen bonding and hydrophobic interaction with the head and tail of the compound (Figure 11b,c).This conformation allowed the compound to form a hydrophobic interaction with Ala966.Meanwhile, residues Cys909 and Arg953 further supported the O-conformation by forming hydrogen bonds with the long chain.As shown in Figure 11d, the fingerprint fragment 1589 (green highlight) with a high Shapley value played an important role in the formation of O-conformation, as it formed lasting H-bonds with Cys909, Arg953, and Ala966 and stabilized the special conformation.In summary, the compound CNP0266747 can form interactions with JAK3 in an O-conformation with the unique residues Cys909 and Ala966 via fingerprint fragmentation 1589.Although there was no co-crystallized structure for selective JAK inhibitors, CNP0266747 could interact with residues that were unique to JAK3.Therefore, we thought that CNP0266747 could possibly exhibit selectivity.The fingerprint fragmentation 1589 has an important role in the JAK3 structure.The compound CNP0266747 binds not only to BTK via fingerprint fragment 339 but also to JAK3 via fingerprint fragment 1589, and exhibits selectivity for both targets at the same time (Figure 12).In summary, the special binding mode of the compound CNP0266747 to BTK and JAK3 led to a free energy gap with other screened compounds, and its active fingerprint fragments 339 and 1589 played important roles in the formation of the special binding pattern.
Molecules 2023, 28, x FOR PEER REVIEW 13 of 21 hydrogen bonding and hydrophobic interaction with the head and tail of the compound (Figure 11b,c).This conformation allowed the compound to form a hydrophobic interaction with Ala966.Meanwhile, residues Cys909 and Arg953 further supported the O-conformation by forming hydrogen bonds with the long chain.As shown in Figure 11d, the fingerprint fragment 1589 (green highlight) with a high Shapley value played an important role in the formation of O-conformation, as it formed lasting H-bonds with Cys909, Arg953, and Ala966 and stabilized the special conformation.In summary, the compound CNP0266747 can form interactions with JAK3 in an O-conformation with the unique residues Cys909 and Ala966 via fingerprint fragmentation 1589.Although there was no co-crystallized structure for selective JAK inhibitors, CNP0266747 could interact with residues that were unique to JAK3.Therefore, we thought that CNP0266747 could possibly exhibit selectivity.The fingerprint fragmentation 1589 has an important role in the JAK3 structure.The compound CNP0266747 binds not only to BTK via fingerprint fragment 339 but also to JAK3 via fingerprint fragment 1589, and exhibits selectivity for both targets at the same time (Figure 12).In summary, the special binding mode of the compound CNP0266747 to BTK and JAK3 led to a free energy gap with other screened compounds, and its active fingerprint fragments 339 and 1589 played important roles in the formation of the special binding pattern.hydrogen bonding and hydrophobic interaction with the head and tail of the compound (Figure 11b,c).This conformation allowed the compound to form a hydrophobic interaction with Ala966.Meanwhile, residues Cys909 and Arg953 further supported the O-conformation by forming hydrogen bonds with the long chain.As shown in Figure 11d, the fingerprint fragment 1589 (green highlight) with a high Shapley value played an important role in the formation of O-conformation, as it formed lasting H-bonds with Cys909, Arg953, and Ala966 and stabilized the special conformation.In summary, the compound CNP0266747 can form interactions with JAK3 in an O-conformation with the unique residues Cys909 and Ala966 via fingerprint fragmentation 1589.Although there was no co-crystallized structure for selective JAK inhibitors, CNP0266747 could interact with residues that were unique to JAK3.Therefore, we thought that CNP0266747 could possibly exhibit selectivity.The fingerprint fragmentation 1589 has an important role in the JAK3 structure.The compound CNP0266747 binds not only to BTK via fingerprint fragment 339 but also to JAK3 via fingerprint fragment 1589, and exhibits selectivity for both targets at the same time (Figure 12).In summary, the special binding mode of the compound CNP0266747 to BTK and JAK3 led to a free energy gap with other screened compounds, and its active fingerprint fragments 339 and 1589 played important roles in the formation of the special binding pattern.

Discussion
In the field of kinase drug discovery, researchers are actively seeking new methodologies to minimize wastage and reduce drug expenses.Multitargeted therapies have been the focus of research in this direction.However, this increases the complexity of this study.Herein, we introduce a general pipeline that integrates machine learning, the interpretable SHAP model, and molecular dynamics simulations to discover a dual-target drug candidate.BTK and JAK3 are two important enzymes that can be potentially targeted to inhibit downstream signaling pathways related to cancer cell growth.This study aimed to discover dual-target potential inhibitors of BTK and JAK3 from a natural product database.We established three machine learning models (RF, ET, and XGB) and validated their excellent activity prediction abilities.The three most important fingerprint features were listed as per the Shapley values.A 1:1:1 weighting strategy was used to classify the activity or inactivity of compounds from the natural product database using the three models.Three compounds (CNP0266747, CNP0332171, and CNP0415155) were selected as candidate BTK-JAK3 dual-target inhibitors via molecular docking, clustering, and ADMET analyses.Finally, we performed molecular dynamics simulations and MM/PBSA to calculate their binding free energies.These compounds could stably bind to both targets by forming H-bonds and hydrophobic interactions.All of the screened compounds contained active fingerprint fragments against BTK and JAK3.The compound CNP0266747 was chosen as the focus of this work, as it exhibited potent binding free energy and different residue combinations against BTK and JAK3.For BTK, Met477, Asp539, and Phe540 were key amino acids that facilitated the insertion of the compound into the back pocket.For JAK3, the residues Cys909, Arg953, Ala966, and Asp967 were involved in the stabilization of the O-conformation of the protein−inhibitor complex.CNP0266747 has unique binding modes for two targets.Moreover, its fingerprint features 339 and 1589 played crucial roles during the process of binding with the key residues.In conclusion, this work is an attempt to develop a general pipeline that predicts the candidate dual inhibitors of BTK and JAK3 and provides helpful guidance for drug design.CNP0266747 displays huge potential as a dual-target inhibitor of BTK and JAK3 and is expected to undergo follow-up research.

Collection and Preparation of Data
We collected the IC 50 values of active compounds against BTK and JAK3 from the BindingDB database (https://bindingdb.org/bind/index.jsp,accessed on 5 October 2022).Duplicates and inactive compounds were excluded to obtain 15,438 BTK-active compounds and 8846 JAK3-active compounds.As activating molecules, compounds with IC 50 values < 100 nM were flagged as active molecules and those with IC 50 values > 100 nM were flagged as inactive molecules.Data were relatively balanced and then standardized with RDKit.The Coconut natural product database (https://coconut.naturalproducts.net/,accessed on 10 January 2023) was used for virtual screening of the compounds.

Model Construction
The molecular descriptor is the result of a process that converts chemical information into mathematical numbers.We used RDKit to calculate ECFPs [40], which are represented by a set of integer identifiers of indefinite length and act as the most primitive and accurate representation.Each identifier represents a specific substructure.ECFPs extract the features of the current layer by stitching the features in the neighborhood of the previous layer and then using a fixed hash function.The result is considered as an integer index, and then the vertex feature vector is filled in at the position corresponding to index 1.In total, 2048 molecular fingerprints were computed to construct machine learning models.
After preliminary exploration, the IC 50 value distribution was discrete.Thus, we converted IC 50 values to pIC 50 values.After this transformation, the data were split into training and test sets at a 4:1 ratio.The same random seed was used on the data to assure consistent data segmentation.Random forests (RFs) [41], extra trees (ETs) [42], and XGBoost

Figure 1 .
Figure 1.The pipeline proposed in this work for virtual screening of potential BTK and JAK3 dual inhibitors.

Figure 2 .
Figure 2. Diversity distribution of the modeling data.(a,b) Two-dimensional spatial distribution maps of BTK and JAK3, respectively.

Figure 1 .
Figure 1.The pipeline proposed in this work for virtual screening of potential BTK and JAK3 dual inhibitors.

Figure 1 .
Figure 1.The pipeline proposed in this work for virtual screening of potential BTK and JAK3 dual inhibitors.

Figure 2 .
Figure 2. Diversity distribution of the modeling data.(a,b) Two-dimensional spatial distribution maps of BTK and JAK3, respectively.

Figure 2 .
Figure 2. Diversity distribution of the modeling data.(a,b) Two-dimensional spatial distribution maps of BTK and JAK3, respectively.

Figure 3 .
Figure 3. (a,b) Plots of ROC curves for the BTK and JAK3 test sets.
As shown in Fig the y-axis indicates the importance of the model's predictive features and the x-axis the effect on model predictions (red color indicates the sample point has a large Sh value and blue color indicates a small Shapley value).We combined the Shapley v

Figure 3 .
Figure 3. (a,b) Plots of ROC curves for the BTK and JAK3 test sets.

Figure 4 .
Figure 4.The scatter plot of feature density of the top 10 molecular fingerprint fragments.Red means a sample point has a large Shapley value and blue color means a sample point has a Shapley value.BTK: Figure 4 shows the top 10 Shapley values of the fingerprint fragments.O three fingerprint fragments, 339, 694, and 1984, appeared in the top 10 Shapley valu all three models, especially fingerprint fragment 339 (Table 3).A larger Shapley valu creases the probability of the molecule being active.Therefore, the Shapley value of gerprint fragment 339 was located at the top of the three models, which gathered ou tention.JAK3: The top 10 Shapley values of the fingerprint fragments are shown in Figu Only the fingerprint fragments 1589, 1535, and 1114 appeared in the top 10 Shapley va of all three models (Table3) and were the focus objects, especially 1589, which had highest value of the three fingerprints.These findings not only provide an insight into impact of fingerprint features on the model but also serve as the foundation for fragm based drug design.

Figure 4 .
Figure 4.The scatter plot of feature density of the top 10 molecular fingerprint fragments.Red color means a sample point has a large Shapley value and blue color means a sample point has a small Shapley value.BTK: Figure 4 shows the top 10 Shapley values of the fingerprint fragments.Only three fingerprint fragments, 339, 694, and 1984, appeared in the top 10 Shapley values of all three models, especially fingerprint fragment 339 (Table 3).A larger Shapley value increases the probability of the molecule being active.Therefore, the Shapley value of fingerprint fragment 339 was located at the top of the three models, which gathered our attention.JAK3: The top 10 Shapley values of the fingerprint fragments are shown in Figure 4.Only the fingerprint fragments 1589, 1535, and 1114 appeared in the top 10 Shapley values of all three models (Table3) and were the focus objects, especially 1589, which had the highest value of the three fingerprints.These findings not only provide an insight into the impact of fingerprint features on the model but also serve as the foundation for fragmentbased drug design.

Figure 4 .
Figure 4.The scatter plot of feature density of the top 10 molecular fingerprint fragments.Red color means a sample point has a large Shapley value and blue color means a sample point has a small Shapley value.BTK: Figure 4 shows the top 10 Shapley values of the fingerprint fragments.Only three fingerprint fragments, 339, 694, and 1984, appeared in the top 10 Shapley values of all three models, especially fingerprint fragment 339 (Table 3).A larger Shapley value increases the probability of the molecule being active.Therefore, the Shapley value of fingerprint fragment 339 was located at the top of the three models, which gathered our attention.JAK3: The top 10 Shapley values of the fingerprint fragments are shown in Figure 4.Only the fingerprint fragments 1589, 1535, and 1114 appeared in the top 10 Shapley values of all three models (Table3) and were the focus objects, especially 1589, which had the highest value of the three fingerprints.These findings not only provide an insight into the impact of fingerprint features on the model but also serve as the foundation for fragmentbased drug design.

Figure 4 .
Figure 4.The scatter plot of feature density of the top 10 molecular fingerprint fragments.Red color means a sample point has a large Shapley value and blue color means a sample point has a small Shapley value.BTK: Figure 4 shows the top 10 Shapley values of the fingerprint fragments.Only three fingerprint fragments, 339, 694, and 1984, appeared in the top 10 Shapley values of all three models, especially fingerprint fragment 339 (Table 3).A larger Shapley value increases the probability of the molecule being active.Therefore, the Shapley value of fingerprint fragment 339 was located at the top of the three models, which gathered our attention.JAK3: The top 10 Shapley values of the fingerprint fragments are shown in Figure 4.Only the fingerprint fragments 1589, 1535, and 1114 appeared in the top 10 Shapley values of all three models (Table3) and were the focus objects, especially 1589, which had the highest value of the three fingerprints.These findings not only provide an insight into the impact of fingerprint features on the model but also serve as the foundation for fragmentbased drug design.

Figure 5 .
Figure 5.The three clusters are colored orange, green, and violet, respectively, for each target.

Figure 5 .
Figure 5.The three clusters are colored orange, green, and violet, respectively, for each target.

Figure 6 .
Figure 6.The changes in RMSD with time in the simulation.(a) RMSD values of BTK complex systems.(b) RMSD values of JAK3 complex systems.

Figure 6 .
Figure 6.The changes in RMSD with time in the simulation.(a) RMSD values of BTK complex systems.(b) RMSD values of JAK3 complex systems.

Figure 7 .
Figure 7. (a) Decomposition of the active site residues for binding free energy BTK complex systems.(b) Decomposition of the active site residues for binding free energy JAK3 complex systems.

Figure 7 .
Figure 7. (a) Decomposition of the active site residues for binding free energy BTK complex systems.(b) Decomposition of the active site residues for binding free energy JAK3 complex systems.

Figure 8 .
Figure 8.(a) The conformation of pirtobrutinib (shown in violet color and stick model) inside the active pocket of BTK (shown in green color and cartoon model).(b) Pirtobtutinib, (c) CNP0266747, (d) CNP0332171, (e) CNP0415155 inside in the active pocket of target protein.The compounds are shown as violet-colored sticks and the active residues are shown as cyan sticks.Hydrogen and hydrophobic bonds are formed between proteins and the compounds are shown as red and gray dotted lines, respectively.The pi-pi perpendicular and parallel stacking interactions are shown as yellow and green dotted lines, respectively.

Figure 8 .
Figure 8.(a) The conformation of pirtobrutinib (shown in violet color and stick model) inside the active pocket of BTK (shown in green color and cartoon model).(b) Pirtobtutinib, (c) CNP0266747, (d) CNP0332171, (e) CNP0415155 inside in the active pocket of target protein.The compounds are shown as violet-colored sticks and the active residues are shown as cyan sticks.Hydrogen and hydrophobic bonds are formed between proteins and the compounds are shown as red and gray dotted lines, respectively.The pi-pi perpendicular and parallel stacking interactions are shown as yellow and green dotted lines, respectively.

Figure 9 .
Figure 9. (a) The conformation of peficitinib (shown in violet color and stick model) inside the active pocket of JAK3 (shown in cyan color and cartoon model).(b) Peficitinib, (c) CNP0266747, (d) CNP0332171, (e) CNP0415155 inside the active site of target protein.The compounds are shown as violet-colored sticks and the active residues are shown as green sticks.Hydrogen and hydrophobic bonds are formed between proteins and the compounds are shown as red and gray dotted lines, respectively.

Figure 9 .
Figure 9. (a) The conformation of peficitinib (shown in violet color and stick model) inside the active pocket of JAK3 (shown in cyan color and cartoon model).(b) Peficitinib, (c) CNP0266747, (d) CNP0332171, (e) CNP0415155 inside the active site of target protein.The compounds are shown as violet-colored sticks and the active residues are shown as green sticks.Hydrogen and hydrophobic bonds are formed between proteins and the compounds are shown as red and gray dotted lines, respectively.

Figure 10 .
Figure 10.(a-c) The superimposition of the BTK-pirtobrutinib and BTK-CNP0226747 complexes.The protein is shown as gray cartoon.Cyan-colored sticks indicate pirtobrutinib and violet-colored sticks indicate compound CNP0226747.(d) Hydrogen and hydrophilic bonds that are formed between protein and the two-dimensional structure of the compound CNP0226747.Hydrogen and hydrophilic bonds are shown as red and gray dotted lines, respectively.The key molecular fingerprint fragment is shown as cyan highlight.

Figure 10 .
Figure 10.(a-c) The superimposition of the BTK-pirtobrutinib and BTK-CNP0226747 complexes.The protein is shown as gray cartoon.Cyan-colored sticks indicate pirtobrutinib and violet-colored sticks indicate compound CNP0226747.(d) Hydrogen and hydrophilic bonds that are formed between protein and the two-dimensional structure of the compound CNP0226747.Hydrogen and hydrophilic bonds are shown as red and gray dotted lines, respectively.The key molecular fingerprint fragment is shown as cyan highlight.

Figure 11 .
Figure 11.(a-c) The superimposition of the JAK3-peficitinib and JAK3-CNP0226747 complexes.The protein is shown as gray cartoon.Green-colored sticks represent pirtobrutinib and violet-colored sticks represent compound CNP0226747.(d) Hydrogen and hydrophilic bonds are formed between protein and the two-dimensional structure of the compound CNP0226747.Hydrogen and hydrophilic bonds are shown as red and gray dotted lines, respectively.The key molecular fingerprint fragment is shown as green highlight.

Figure 12 .
Figure 12.The two-dimensional structure of compound CNP0226747.The key molecular fingerprint fragments are shown as green and cyan highlights.

Figure 11 .
Figure 11.(a-c) The superimposition of the JAK3-peficitinib and JAK3-CNP0226747 complexes.The protein is shown as gray cartoon.Green-colored sticks represent pirtobrutinib and violet-colored sticks represent compound CNP0226747.(d) Hydrogen and hydrophilic bonds are formed between protein and the two-dimensional structure of the compound CNP0226747.Hydrogen and hydrophilic bonds are shown as red and gray dotted lines, respectively.The key molecular fingerprint fragment is shown as green highlight.

Figure 11 .
Figure 11.(a-c) The superimposition of the JAK3-peficitinib and JAK3-CNP0226747 complexes.The protein is shown as gray cartoon.Green-colored sticks represent pirtobrutinib and violet-colored sticks represent compound CNP0226747.(d) Hydrogen and hydrophilic bonds are formed between protein and the two-dimensional structure of the compound CNP0226747.Hydrogen and hydrophilic bonds are shown as red and gray dotted lines, respectively.The key molecular fingerprint fragment is shown as green highlight.

Figure 12 .
Figure 12.The two-dimensional structure of compound CNP0226747.The key molecular fingerprint fragments are shown as green and cyan highlights.

Figure 12 .
Figure 12.The two-dimensional structure of compound CNP0226747.The key molecular fingerprint fragments are shown as green and cyan highlights.

Figure A3 .
Figure A3.(a)The free energy landscape map of BTK-CNP0266747 complex system.(b) The free energy landscape map of BTK-CNP0332171 complex system.(c) The free energy landscape map of BTK-CNP0415155 complex system.(d) The free energy landscape map of JAK3-CNP0266747 complex system.(e) The free energy landscape map of JAK3-CNP0332171 complex system.(f) The free energy landscape map of JAK3-CNP0332171 complex system.

Figure A4 .
Figure A4.The active fingerprint fragments of the screen compounds are shown as cyan (BTK) highlights and green (JAK3) highlights.

Figure A3 .
Figure A3.(a)The free energy landscape map of BTK-CNP0266747 complex system.(b) The free energy landscape map of BTK-CNP0332171 complex system.(c) The free energy landscape map of BTK-CNP0415155 complex system.(d) The free energy landscape map of JAK3-CNP0266747 complex system.(e) The free energy landscape map of JAK3-CNP0332171 complex system.(f) The free energy landscape map of JAK3-CNP0332171 complex system.

Figure A3 .
Figure A3.(a)The free energy landscape map of BTK-CNP0266747 complex system.(b) The free energy landscape map of BTK-CNP0332171 complex system.(c) The free energy landscape map of BTK-CNP0415155 complex system.(d) The free energy landscape map of JAK3-CNP0266747 complex system.(e) The free energy landscape map of JAK3-CNP0332171 complex system.(f) The free energy landscape map of JAK3-CNP0332171 complex system.

Figure A4 .
Figure A4.The active fingerprint fragments of the screen compounds are shown as cyan (BTK) highlights and green (JAK3) highlights.

Figure A4 .
Figure A4.The active fingerprint fragments of the screen compounds are shown as cyan (BTK) highlights and green (JAK3) highlights.

Figure A5 .
Figure A5.Confusion matrix for the test sets of RF, ET, and XGB models against BTK and JAK3, respectively.

Table 1 .
Statistical results of BTK and JAK3 classification models on training sets (tenfold cross-validation).

Table 2 .
Statistical results of BTK and JAK3 classification models on test sets.

Table 1 .
Statistical results of BTK and JAK3 classification models on training sets (tenfold validation).

Table 2 .
Statistical results of BTK and JAK3 classification models on test sets.

Table 3 .
Molecular fingerprint fragments.The symbol * represents that other groups can be attached here.

Table 3 .
Molecular fingerprint fragments.The symbol * represents that other groups can be attached here.
C 1

Table A1 .
The key hydrogen bond occupancies at the protein-ligand binding sites.

Table A1 .
The key hydrogen bond occupancies at the protein-ligand binding sites.

Table A1 .
The key hydrogen bond occupancies at the protein-ligand binding sites.