Author Contributions
Conceptualization, H.H. and N.S.; methodology, H.H., F.S. and N.S.; software, H.H.; validation, B.B., I.N., A.T. and M.N.; formal analysis, H.H., B.B., I.N., A.T. and M.N.; investigation, H.H., B.B., I.N., A.T. and M.N.; resources, H.H., F.S. and M.N.; data curation, H.H., B.B. and A.T.; writing—original draft preparation, H.H. and N.S.; writing—review and editing, H.H., F.S., N.S., I.N. and M.N.; visualization, H.H.; supervision, N.S. and F.S.; project administration, N.S. and F.S.; funding acquisition, F.S., N.S. and I.N. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Two examples showing the generation of a molecular fingerprint: (a) dictionary-based fingerprint and (b) hash-based fingerprint.
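To make the distinction in Figure 1 concrete, the sketch below builds both kinds of fingerprint from a toy list of fragment strings. The fragment names, the three-key dictionary, the 16-bit length, and the use of CRC32 as the hash function are illustrative assumptions, not the fingerprints evaluated in this work.

```python
import zlib

# Hypothetical substructure dictionary: one fixed bit position per known key.
DICTIONARY_KEYS = ["C=O", "c1ccccc1", "N-H"]


def dictionary_fingerprint(fragments):
    """Bit i is set when the i-th dictionary key occurs in the molecule."""
    present = set(fragments)
    return [1 if key in present else 0 for key in DICTIONARY_KEYS]


def hashed_fingerprint(fragments, n_bits=16):
    """Each fragment is hashed to a bit position; collisions are possible."""
    bits = [0] * n_bits
    for fragment in fragments:
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        position = zlib.crc32(fragment.encode("utf-8")) % n_bits
        bits[position] = 1
    return bits


# Toy molecule described by the fragments found in it (illustrative only).
molecule_fragments = ["C=O", "N-H", "C-C-O"]
dict_fp = dictionary_fingerprint(molecule_fragments)
hash_fp = hashed_fingerprint(molecule_fragments)
```

In this toy case the fragment "C-C-O" is absent from the dictionary and therefore leaves no trace in the dictionary-based vector, whereas the hashed vector still assigns it a bit.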
Figure 2.
A summary of the proposed CNN configuration that uses the Mol2mat representation.
Figure 3.
Evaluation of eight fingerprints based on their (a) accuracy and (b) MSE performance.
Figure 4.
Violin plots of the prediction accuracy of the CNN model for the eight fingerprint representations.
Figure 5.
A summary of the CNN configuration for a combination case named “K” using a Mol2mat representation.
Figure 6.
A CNN Model configuration for a combination case named “K” using the Mol2mat representation.
Figure 7.
Violin plots of the prediction accuracy of the CNN model for the 26 combination cases of the five best fingerprints.
Figure 8.
A comparison of the prediction accuracies for the D, O, R, and T combination cases, shown as boxplots.
Figure 9.
Boxplots comparing the sensitivity values of the different algorithms: CNNfp, NaiveB, RBFN, and LSVM.
Figure 10.
Boxplots comparing the specificity values of the different algorithms: CNNfp, NaiveB, RBFN, and LSVM.
Figure 11.
Boxplots comparing the AUC values of the different algorithms: CNNfp, NaiveB, RBFN, and LSVM.
Figure 12.
Examples of low-diversity molecules in the MDDR dataset.
Figure 13.
Examples of high-diversity molecules in the MDDR dataset.
Figure 14.
The mean pairwise similarity (MPS) across each set of active molecules.
Figure 15.
Comparison of the MPS values of the three databases using boxplots.
Figure 16.
3D scatter plots based on seven fingerprint and descriptor representations: (a) ALogP, (b) CDKFp, (c) ECFP4, (d) EPFP4, (e) GraphOnly, (f) MDL, and (g) PubchemFp, for 5083 different molecules selected from the 10 biological activity classes of the MDDR dataset.
Figure 17.
A summary of the newly proposed Mol2mat representation process.
Figure 18.
The general CNN configuration.
Figure 19.
Different approaches used for fusing the information present in the CNN layers.
Figure 20.
The configuration of the combined CNN that was used for 3 fingerprints.
Table 1.
Possible combination cases for the five best fingerprints.
Label | Number of Fingerprints | CDK | ECFP4 | EPFP4 | Graph | ECFC4 |
---|---|---|---|---|---|---|
A | 2 | √ | √ | | | |
B | 2 | √ | | √ | | |
C | 2 | √ | | | √ | |
D | 2 | √ | | | | √ |
E | 2 | | √ | √ | | |
F | 2 | | √ | | √ | |
G | 2 | | √ | | | √ |
H | 2 | | | √ | √ | |
I | 2 | | | √ | | √ |
J | 2 | | | | √ | √ |
K | 3 | √ | √ | √ | | |
L | 3 | √ | √ | | √ | |
M | 3 | √ | √ | | | √ |
N | 3 | √ | | √ | √ | |
O | 3 | √ | | √ | | √ |
P | 3 | √ | | | √ | √ |
Q | 3 | | √ | √ | √ | |
R | 3 | | √ | √ | | √ |
S | 3 | | √ | | √ | √ |
T | 3 | | | √ | √ | √ |
U | 4 | √ | √ | √ | √ | |
V | 4 | √ | √ | √ | | √ |
W | 4 | √ | √ | | √ | √ |
X | 4 | √ | | √ | √ | √ |
Y | 4 | | √ | √ | √ | √ |
Z | 5 | √ | √ | √ | √ | √ |
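The 26 cases in Table 1 are all subsets of size two or more drawn from the five fingerprints (10 + 10 + 5 + 1 = 26). The following minimal sketch enumerates them in lexicographic order using the column order of Table 1, which reproduces the labels A to Z. Whether the labels were originally assigned this way is an assumption; the helper name `combination_cases` is hypothetical.

```python
from itertools import combinations
from string import ascii_uppercase

# Column order as in Table 1.
FINGERPRINTS = ["CDK", "ECFP4", "EPFP4", "Graph", "ECFC4"]


def combination_cases(names):
    """Return {label: tuple of fingerprints} for all subsets of size >= 2."""
    cases = [
        combo
        for size in range(2, len(names) + 1)
        for combo in combinations(names, size)
    ]
    # Five fingerprints give exactly 26 subsets, one per uppercase letter.
    return dict(zip(ascii_uppercase, cases))


cases = combination_cases(FINGERPRINTS)
```

For example, `cases["K"]` is the triple (CDK, ECFP4, EPFP4), matching row K of the table.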
Table 2.
Sensitivity, specificity, and AUC values for all the prediction models on the MDDR1 dataset.
Activity Index | CNNfp Sens | CNNfp Spec | CNNfp AUC | NaïveB Sens | NaïveB Spec | NaïveB AUC | RBFN Sens | RBFN Spec | RBFN AUC | LSVM Sens | LSVM Spec | LSVM AUC |
---|---|---|---|---|---|---|---|---|---|---|---|---|
7707 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 0.99 | 0.63 | 1.00 | 0.82 | 0.93 | 0.95 | 0.94 |
7708 | 1.00 | 1.00 | 1.00 | 0.97 | 1.00 | 0.99 | 0.51 | 1.00 | 0.75 | 0.96 | 0.96 | 0.96 |
31420 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.96 | 0.96 | 0.96 | 0.92 | 0.99 | 0.96 |
42710 | 0.99 | 0.99 | 0.99 | 0.94 | 1.00 | 0.97 | 0.43 | 1.00 | 0.72 | 0.95 | 0.99 | 0.97 |
64100 | 0.97 | 0.99 | 0.98 | 0.95 | 1.00 | 0.97 | 0.97 | 0.90 | 0.94 | 0.96 | 0.99 | 0.98 |
64200 | 0.96 | 0.99 | 0.98 | 0.87 | 0.95 | 0.91 | 0.43 | 1.00 | 0.71 | 0.94 | 1.00 | 0.97 |
64220 | 1.00 | 1.00 | 1.00 | 0.97 | 0.99 | 0.96 | 0.95 | 0.97 | 0.96 | 0.92 | 1.00 | 0.96 |
64500 | 1.00 | 1.00 | 1.00 | 0.91 | 0.93 | 0.92 | 0.44 | 1.00 | 0.72 | 0.84 | 0.95 | 0.90 |
64350 | 1.00 | 1.00 | 1.00 | 0.94 | 0.96 | 0.95 | 0.80 | 1.00 | 0.90 | 0.90 | 0.94 | 0.92 |
75755 | 1.00 | 1.00 | 1.00 | 0.94 | 0.98 | 0.96 | 0.76 | 1.00 | 0.88 | 0.94 | 0.97 | 0.96 |
mean | 0.98 | 0.99 | 0.99 | 0.94 | 0.98 | 0.96 | 0.69 | 0.98 | 0.84 | 0.93 | 0.97 | 0.95 |
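For reference, sensitivity and specificity follow from the confusion-matrix counts as TP / (TP + FN) and TN / (TN + FP). The sketch below computes both for a toy label vector. It also includes the single-operating-point AUC, (sensitivity + specificity) / 2, which is one common convention for classifiers that output hard labels; how the AUC values in the table were actually obtained is not stated here, so that function is an assumption. The toy labels are illustrative, and both classes are assumed to be present.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def single_point_auc(sens, spec):
    """Trapezoidal AUC of a ROC curve with one operating point (assumed convention)."""
    return (sens + spec) / 2


# Toy example: 4 actives, 6 inactives (illustrative data only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
auc = single_point_auc(sens, spec)
```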
Table 3.
Sensitivity, specificity, and AUC values for the prediction models on the MDDR2 dataset.
Activity Index | CNNfp Sens | CNNfp Spec | CNNfp AUC | NaïveB Sens | NaïveB Spec | NaïveB AUC | RBFN Sens | RBFN Spec | RBFN AUC | LSVM Sens | LSVM Spec | LSVM AUC |
---|---|---|---|---|---|---|---|---|---|---|---|---|
9249 | 1.00 | 1.00 | 1.00 | 0.91 | 0.99 | 0.95 | 0.82 | 0.98 | 0.90 | 0.95 | 0.97 | 0.96 |
12455 | 1.00 | 1.00 | 1.00 | 0.88 | 0.97 | 0.92 | 0.66 | 0.98 | 0.82 | 0.93 | 0.96 | 0.94 |
12464 | 1.00 | 1.00 | 1.00 | 0.85 | 0.99 | 0.92 | 0.75 | 0.95 | 0.85 | 0.89 | 0.97 | 0.93 |
31281 | 1.00 | 1.00 | 1.00 | 0.94 | 1.00 | 0.97 | 0.53 | 1.00 | 0.76 | 0.95 | 0.97 | 0.96 |
43210 | 0.99 | 0.99 | 0.99 | 0.84 | 0.99 | 0.91 | 0.78 | 0.97 | 0.87 | 0.93 | 0.96 | 0.94 |
71522 | 1.00 | 1.00 | 1.00 | 0.82 | 0.99 | 0.91 | 0.75 | 0.97 | 0.86 | 0.91 | 0.97 | 0.94 |
75721 | 1.00 | 1.00 | 1.00 | 0.91 | 0.99 | 0.95 | 0.86 | 0.98 | 0.92 | 0.96 | 0.97 | 0.96 |
78331 | 0.98 | 0.99 | 0.99 | 0.81 | 0.96 | 0.89 | 0.79 | 0.93 | 0.86 | 0.81 | 0.96 | 0.88 |
78348 | 0.99 | 0.99 | 0.99 | 0.65 | 0.99 | 0.82 | 0.74 | 0.96 | 0.85 | 0.88 | 0.97 | 0.92 |
78351 | 0.99 | 0.99 | 0.99 | 0.82 | 0.94 | 0.88 | 0.59 | 0.96 | 0.78 | 0.91 | 0.95 | 0.93 |
mean | 0.99 | 0.99 | 0.99 | 0.84 | 0.98 | 0.91 | 0.73 | 0.97 | 0.85 | 0.91 | 0.97 | 0.94 |
Table 4.
Sensitivity, specificity, and AUC values for the prediction models on the Sutherland dataset.
Activity Class | CNNfp Sens | CNNfp Spec | CNNfp AUC | NaïveB Sens | NaïveB Spec | NaïveB AUC | RBFN Sens | RBFN Spec | RBFN AUC | LSVM Sens | LSVM Spec | LSVM AUC |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Estrogen receptor | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.62 | 0.70 | 0.64 | 0.98 | 1.00 | 0.99 |
Dihydrofolate reductase | 0.99 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 | 0.86 | 0.80 | 0.84 | 0.90 | 0.98 | 0.94 |
Cyclooxygenase-2 inhibitors | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 0.93 | 0.76 | 0.84 | 1.00 | 0.99 | 0.99 |
Benzodiazepine receptor | 1.00 | 1.00 | 1.00 | 0.94 | 0.61 | 0.78 | 0.99 | 0.65 | 0.82 | 0.95 | 0.92 | 0.93 |
mean | 0.99 | 0.99 | 0.99 | 0.98 | 0.90 | 0.94 | 0.85 | 0.73 | 0.79 | 0.95 | 0.97 | 0.96 |
Table 5.
MDDR activity classes for the DS1 dataset.
Activity Index | Activity Class | Active Molecules | Pairwise Similarity |
---|---|---|---|
07707 | Adenosine agonists A1 | 207 | 0.229 |
07708 | Adenosine agonists A2 | 156 | 0.305 |
31420 | Renin inhibitors | 1130 | 0.290 |
42710 | CCK agonists | 111 | 0.361 |
64100 | Monocyclic β-lactams | 1346 | 0.336 |
64200 | Cephalosporins | 113 | 0.322 |
64220 | Carbacephems | 1051 | 0.269 |
64500 | Carbapenems | 126 | 0.260 |
64350 | Tribactams | 388 | 0.305 |
75755 | Vitamin D analogues | 455 | 0.386 |
Table 6.
MDDR activity classes for the DS2 dataset.
Activity Index | Activity Class | Active Molecules | Pairwise Similarity |
---|---|---|---|
09249 | Muscarinic (M1) agonists | 900 | 0.111 |
12455 | NMDA receptor antagonists | 1400 | 0.098 |
12464 | Nitric oxide synthase inhibitor | 505 | 0.102 |
31281 | Dopamine hydroxylase inhibitors | 106 | 0.125 |
43210 | Aldose reductase inhibitors | 957 | 0.119 |
71522 | Reverse transcriptase inhibitors | 700 | 0.103 |
75721 | Aromatase inhibitors | 636 | 0.110 |
78331 | Cyclooxygenase inhibitors | 636 | 0.108 |
78348 | Phospholipase A2 inhibitors | 617 | 0.123 |
78351 | Lipoxygenase inhibitors | 2111 | 0.113 |
Table 7.
Sutherland activity classes.
Activity Class | Active Molecules | Pairwise Similarity |
---|---|---|
Estrogen receptor | 141 | 0.468 |
Dihydrofolate reductase | 393 | 0.502 |
Cyclooxygenase-2 inhibitors | 303 | 0.687 |
Benzodiazepine receptor | 306 | 0.536 |
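The pairwise similarity column in Tables 5 to 7 summarises how alike the active molecules of a class are. A minimal sketch of such a calculation is given below, assuming the Tanimoto coefficient on binary fingerprints stored as sets of on-bit positions; the similarity coefficient and fingerprint behind the tabulated values are not specified in this section, and the toy class is illustrative only.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_pairwise_similarity(fingerprints):
    """Average Tanimoto similarity over all unordered pairs of molecules."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two molecules")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy activity class: each molecule is the set of its on-bit positions.
toy_class = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
mps = mean_pairwise_similarity(toy_class)
```

Under this definition, a low value (as in the DS2 classes) indicates a structurally diverse class, and a high value (as in the Sutherland classes) indicates a homogeneous one.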
Table 8.
Mol2mat matrix size for each fingerprint.
Fingerprint | Features Size | √(Features Size) | Mol2mat Size (n × n) |
---|---|---|---|
ALOGP | 120 | 10.95 | 11 × 11 |
CDK | 1024 | 32 | 32 × 32 |
ECFC4 | 1024 | 32 | 32 × 32 |
ECFP4 | 1024 | 32 | 32 × 32 |
EPFP4 | 1024 | 32 | 32 × 32 |
GOFP | 1024 | 32 | 32 × 32 |
PCFP | 881 | 29.68 | 30 × 30 |
MDL | 166 | 12.88 | 13 × 13 |
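The sizes in Table 8 correspond to n being the square root of the feature count rounded up to the next integer. The sketch below reshapes a one-dimensional fingerprint into such an n × n matrix. Zero-padding the unused trailing cells and filling the matrix row by row are assumptions made for illustration, and the function names are hypothetical.

```python
import math

import numpy as np


def mol2mat_size(length):
    """Smallest n such that n * n >= length."""
    n = math.isqrt(length)
    return n if n * n == length else n + 1


def mol2mat(fingerprint):
    """Reshape a 1-D fingerprint into an n x n matrix (row-major order)."""
    vector = np.asarray(fingerprint, dtype=np.float32)
    n = mol2mat_size(vector.size)
    padded = np.zeros(n * n, dtype=np.float32)
    padded[: vector.size] = vector  # assumption: unused cells are zero-padded
    return padded.reshape(n, n)


# Feature counts taken from Table 8.
sizes = {name: mol2mat_size(k) for name, k in
         {"ALOGP": 120, "CDK": 1024, "PCFP": 881, "MDL": 166}.items()}
matrix = mol2mat(np.ones(881))
```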
Table 9.
Details of the first and second fully connected layers for every combination.
Combined Case | Combined Layer Size | Number of Nodes in 1st Fully Connected Layer | Number of Nodes in 2nd Fully Connected Layer |
---|---|---|---|
2 Fingerprints | 6272 | 128 | 64 |
3 Fingerprints | 9408 | 256 | 128 |
4 Fingerprints | 12,544 | 512 | 256 |
5 Fingerprints | 15,680 | 1024 | 512 |
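The combined layer sizes in Table 9 are multiples of 3136, which suggests that each fingerprint branch contributes 3136 flattened features to the combined layer. The sketch below reproduces the table under that reading. The per-branch constant is inferred from the table rather than stated, and the doubling rule for the fully connected layers is a pattern fitted to the four listed rows, so it should not be extrapolated beyond them.

```python
# Inferred from Table 9: 6272 / 2 = 9408 / 3 = 12544 / 4 = 15680 / 5 = 3136.
FEATURES_PER_BRANCH = 3136


def combined_layer_size(n_fingerprints):
    """Size of the combined layer formed by joining all branch outputs."""
    return n_fingerprints * FEATURES_PER_BRANCH


def dense_layer_sizes(n_fingerprints):
    """(first, second) fully connected layer widths matching Table 9."""
    first = 64 * 2 ** (n_fingerprints - 1)
    return first, first // 2


table9 = {k: (combined_layer_size(k), *dense_layer_sizes(k)) for k in range(2, 6)}
```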