Article

Hybrid-Enhanced Siamese Similarity Models in Ligand-Based Virtual Screen

by
Mohammed Khaldoon Altalib
1,2,* and
Naomie Salim
3
1
School of Computing, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
2
Computer Science Department, Education for Pure Science College, University of Mosul, Mosul 41002, Iraq
3
UTM Big Data Centre, Ibnu Sina Institute for Scientific and Industrial Research, Universiti Teknologi Malaysia, Johor Bahru 81310, Malaysia
*
Author to whom correspondence should be addressed.
Biomolecules 2022, 12(11), 1719; https://doi.org/10.3390/biom12111719
Submission received: 26 October 2022 / Revised: 17 November 2022 / Accepted: 18 November 2022 / Published: 20 November 2022

Abstract
Information technology has become an integral aspect of the drug development process. Virtual screening (VS) is a computational technique for screening chemical compounds at a reasonable cost and in a reasonable amount of time. Similarity searching is one of the primary tasks in VS; it estimates a molecule's similarity, based on the idea that molecules with similar structures may also have similar activities. Many techniques have been established for measuring the biological similarity between a target compound and each compound in the database. Although these approaches perform strongly, particularly when dealing with molecules with structurally homogeneous active elements, they are not good enough when dealing with structurally heterogeneous compounds. Previous works examined many deep learning methods in the enhanced Siamese similarity model and demonstrated that the enhanced Siamese Multi-Layer Perceptron similarity model (SMLP) and the Siamese Convolutional Neural Network-one dimension similarity model (SCNN1D) give good outcomes when dealing with structurally heterogeneous molecules. To further improve the retrieval effectiveness of the similarity model, we incorporate the best two models into one hybrid model. The reason is that each method gives good results in some classes, so combining them in one hybrid model may improve the retrieval recall. Many designs of the hybrid models are tested in this study. Several experiments on real-world data sets were conducted, and the findings demonstrate that the new approaches outperform the previous methods.

Graphical Abstract

1. Introduction

Drug discovery often involves multiple stages, beginning with the identification of a biological target, followed by the parallel screening of thousands of compounds and, finally, the production of the new drug. This process is time-consuming, costly, and plagued with numerous difficulties. Virtual screening (VS) is a computational drug discovery technique that searches libraries of molecules for the structures most likely to bind to a drug target at a reasonable cost and time. Virtual screening is classified into two types: structure-based approaches, such as ligand-protein docking, and ligand-based approaches, such as similarity searching, machine learning, and pharmacophore mapping [1,2,3,4,5]. Similarity searching is the most effective and one of the most broadly used tools for ligand-based virtual screening because it requires only a bioactive molecule, or reference structure, as the starting point for a database search. The fundamental concept underlying similarity searching is that structurally similar molecules will exhibit similar physicochemical and biological properties. A similarity search compares the characteristics of a target structure with the attributes of each structure in the database; the degree of resemblance between these two sets of features measures the degree of closeness. The database structures are then usually ranked in decreasing order of similarity to the target. When discussing LBVS, various elements must be considered, including molecular representation and similarity coefficients, among others [6,7,8,9,10,11]. In cheminformatics, fingerprints are a crucial and widely used concept. Their primary goal is to create numerical representations of molecules' structure or specific properties, allowing the comparison of two molecules to be quantified [1]. Molecular characteristics range from physicochemical attributes to structural features and are stored in various forms, referred to as molecular descriptors.
The molecular descriptor aims to capture the most important features of the molecule. To compare the fingerprints of two chemicals, various similarity metrics can be used, such as Euclidean distance, Manhattan distance, and Dice coefficient, but the Tanimoto coefficient is the most preferred [5].
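As an illustration of the preferred metric, the continuous Tanimoto coefficient between two count fingerprints can be sketched as follows (a minimal sketch; the function name and the plain-list fingerprint representation are illustrative, not from the original study):

```python
def tanimoto(a, b):
    """Continuous Tanimoto coefficient between two count fingerprints
    (e.g. ECFC-4 count vectors of equal length)."""
    dot = sum(x * y for x, y in zip(a, b))
    # T(A, B) = A.B / (|A|^2 + |B|^2 - A.B)
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)
```

Identical fingerprints score 1.0, and the score decreases toward 0 as the shared substructure counts shrink.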
In recent years, data fusion has gained acceptance as one of the methods for improving the performance of existing systems for ligand-based virtual screening. Data fusion is the process of integrating numerous data sources into a single source using fusion techniques, under the assumption that the fused source will be more valuable than the individual input sources. For example, combining many similarity coefficients performs better than employing individual coefficients [1,4,12,13]. Dasarathy presented one of the most well-known data fusion classification systems, which is made up of the following five categories: (1) Data in-Data out, in which raw data is both input and output, and the outcomes are often more accurate or dependable; (2) Data in-Feature out, in which the fusion method uses the raw data from various participants to extract the features or characteristics that describe an object or class; (3) Feature in-Feature out, in which both the input and output of the fusion procedure are features, with the aim of enhancing, honing, or creating new features; (4) Feature in-Decision out, in which a set of decisions is produced from a collection of features obtained as input; (5) Decision in-Decision out, also called decision fusion, which fuses the input decisions to produce superior or novel judgments [14].
Several efforts have been made to improve and increase the retrieval effectiveness of similarity searching methods and the ways they are calculated, concluding that the Tanimoto coefficient is the industry standard and outperforms others [15,16,17,18]. Some studies sought to adapt approaches from text document retrieval to molecular searches; for example, Abdo et al. took a Bayesian network originally used in text document retrieval and modified it as the retrieval model in the cheminformatics area [6]. Furthermore, Al Dabagh et al. applied concepts from quantum mechanics to improve molecular similarity searching and the molecular ranking of chemical compounds in LBVS [9]. Other researchers, such as Ahmed et al., worked on weighting approaches to increase the retrieval effectiveness of a Bayesian inference network, allowing more weight to be given to relevant fragments while deleting unnecessary ones [19,20,21]. Some studies looked into data fusion and proposed that similarity measurements be merged by combining the screening results obtained with multiple similarity measures. Nasser et al. fused several descriptors by selecting the best features from each descriptor and then merging them into a new descriptor [3,4,22]. Although the above methods outperform their predecessors, particularly when dealing with molecules with homogeneous active structural elements, such as the molecule classes in the MDDR-DS2 dataset described in Section 3.1, their performance is not satisfactory when dealing with molecules of a structurally heterogeneous nature, such as the molecule classes in the MDDR-DS3 dataset described in Section 3.1.
On the other hand, the Siamese network has been used for more complicated data samples, especially heterogeneous data samples, and it is possible to employ deep learning methods within the Siamese architecture, which deals efficiently with the vast volume of information stored in databases [23,24]. Altalib et al. employed many deep learning methods in the Siamese architecture after enhancing it with two similarity measures and one fusion layer to improve the retrieval effectiveness for molecules of a structurally heterogeneous nature. The first study employed four deep learning methods in the Siamese architecture: the Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), the Convolutional Neural Network-two dimensions (CNN-2D), and the Convolutional Neural Network-one dimension (CNN-1D). The second study employed the Multilayer Perceptron (MLP) in the Siamese architecture [25,26]. This study continues to improve the effectiveness of similarity retrieval for structurally heterogeneous molecules by incorporating the two best previous models into one hybrid model. The reason is that each method gives good results in some classes, so combining them in one hybrid model may improve the retrieval recall. The following are the main contributions of this study:
  • The Siamese architecture for the selected methods will be enhanced with three similarity measures to further improve the similarity measurements between molecules.
  • Many designs of a hybrid model built from the two selected models will be incorporated. As mentioned before, each method gives good results in some classes, so combining them in one hybrid model may improve the retrieval recall.
  • Compared to previous approaches, the proposed strategy yielded promising results in terms of overall performance, particularly when dealing with heterogeneous classes of molecules.

2. Methods

2.1. Siamese Architecture

A Siamese neural network is made up of two identical artificial neural networks, each capable of handling the input data, which are coupled to a final layer via a distance layer to predict whether the two vectors belong to the same class. Because all the weights and biases in the Siamese architecture are shared, the two networks are referred to as twins, and both networks are symmetric as a result. Through training, the two neural networks also utilize feedforward propagation and error backpropagation. Consequently, the architecture has been used on more complicated data samples, such as heterogeneous data samples with different dimensions and attribute types [23,24]. This work is an extension of our previous work [25,26]. Figure 1 shows the steps for incorporating two enhanced Siamese similarity models into one hybrid model.
The main goal of this work is to improve the retrieval effectiveness of molecular similarity searching, especially with structurally heterogeneous molecules, by incorporating two enhanced Siamese deep learning similarity models into one hybrid model. The steps for incorporating the two enhanced Siamese similarity models into one hybrid model are as follows:
  • We select the best two enhanced Siamese deep learning models from the previous studies according to the Kendall W significance test for ranking the methods. The two best methods on MDDR-DS3 (structurally heterogeneous) are the Siamese multi-layer perceptron (SMLP) similarity model and the Siamese convolutional neural network-one dimension (SCNN1D) similarity model.
  • The Siamese architecture in the selected models will be enhanced with three similarity measures. The reason for this is to further improve the measurements of similarity between molecules in the hybrid model.
  • Incorporate the selected best two models in a hybrid model. Since each model gives good results in some classes, combining them in one hybrid model may improve the retrieval recall.
  • Many designs of hybrid models are tested using different types of data fusion to select the best hybrid model, i.e., the one that gives good recall results when used with a structurally heterogeneous molecule dataset.

2.2. Hybrid Siamese Similarity Model Using Decision Fusion

The first design of the hybrid similarity model combines the two models selected from the previous studies: the SMLP similarity model and the SCNN1D similarity model. The reason is that each method gives good results in some classes, so combining them in one hybrid model may improve the retrieval recall. The architecture of the SMLP consists of two twin neural networks, each of which has only one layer with 1024 neurons. The weights are linked in this architecture so that Network1 = Network2. The first network reads the fingerprint of the query, and the second reads the fingerprint from the database. The output of each network is a feature vector with a fixed length (here, 1024 features). The first similarity measure is the absolute difference between the two feature vectors, and its output is a vector of length 1024. The formula of this measure is [27]:
S_{AB} = |f_A - f_B|
where S_{AB} is the similarity measure, f_A is the feature vector of network 1, and f_B is the feature vector of network 2. The second similarity measure is the exponential Manhattan distance [28]. The output of this measure is a single value. The formula of the exponential Manhattan distance is:
E_{AB} = \exp(-|f_A - f_B|)
where E_{AB} is the exponential Manhattan distance, f_A is the feature vector of network 1, and f_B is the feature vector of network 2. The fusion layer adds the value of the second similarity measure to each element of the vector produced by the first similarity measure. The output of the fusion layer is passed to the following layers, which contain 1024, 512, 256, 128, and 64 neurons, respectively. Each neuron is connected to all neurons in the previous layer, and the ReLU activation function is used for all layers. The Siamese architecture ends with the output layer, which contains a sigmoid activation function that gives the similarity score, where 1 means complete similarity and 0 means complete dissimilarity. Moreover, the RMSprop optimizer is used, with binary_crossentropy as the loss function. The architecture of the SCNN1D similarity model consists of two twin neural networks, each of which has two layers of a one-dimensional convolutional neural network (1D-CNN). The layers are made up of 64 filters with a kernel size of 3; the activation function is the rectified linear unit (ReLU), and the maximum pooling size is 2. Then comes a flattening layer, followed by a dense layer with a sigmoid activation function. In this design, the weights are connected so that CNN1D-1 = CNN1D-2. The output of each network is a feature vector with a fixed length (here, 512 features). The first similarity measure is the absolute difference between the two feature vectors, and its output is a vector of length 512. The second similarity measure is the exponential Manhattan distance, whose output is a single value. The fusion layer adds the value of the second similarity measure to each element of the vector produced by the first similarity measure. The output of the fusion layer is passed to the end layer, which contains a sigmoid activation function that gives the similarity score, where 1 means complete similarity and 0 means complete dissimilarity. Moreover, the RMSprop optimizer is used, with binary_crossentropy as the loss function.
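The two similarity measures and the fusion layer described above can be sketched as follows (a minimal NumPy illustration; `siamese_fusion` is a hypothetical name, and in the actual models the inputs would be the learned feature vectors emitted by the twin networks):

```python
import numpy as np

def siamese_fusion(fa, fb):
    """Combine the two similarity measures used in the Siamese models.

    fa, fb: feature vectors produced by the twin networks (same length).
    """
    # First measure: element-wise absolute difference (a vector).
    abs_diff = np.abs(fa - fb)
    # Second measure: exponential Manhattan distance (a scalar in (0, 1]).
    exp_manhattan = np.exp(-np.sum(np.abs(fa - fb)))
    # Fusion layer: add the scalar measure to every element of the vector.
    return abs_diff + exp_manhattan
```

For identical feature vectors, the absolute difference is all zeros and the exponential Manhattan distance is 1, so the fused vector is all ones.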
The SMLP similarity model’s output (similarity score) will be fused with the output of the SCNN1D similarity model by using decision fusion (maximum). Figure 2 shows the details of the design of the hybrid Siamese similarity model with two similarity measures using the decision fusion.

2.3. Hybrid Siamese Similarity Model with Three Similarity Measures Using Decision Fusion

The second design of the hybrid similarity model is the same as the first design, except that it uses three similarity measures instead of two. The first model is the SMLP similarity model, and the second is the SCNN1D similarity model. A third similarity measure is added to each of them to further improve the similarity measurements between molecules. The Jaccard similarity measure is added to the SMLP similarity model, and the Russel similarity measure is added to the SCNN1D similarity model; the selection of these measures is based on experiments. The formula of the Jaccard similarity measure is [29]:
\delta_{AB} = \frac{\sum_{i=1}^{N} f_{iA} f_{iB}}{\sum_{i=1}^{N} (f_{iA})^2 + \sum_{i=1}^{N} (f_{iB})^2 - \sum_{i=1}^{N} f_{iA} f_{iB}}
where f_{iA} denotes the features of the query molecule, f_{iB} denotes the features of the dataset molecule, and N is the number of features in the vector. The formula of the Russel similarity measure [29] is:
\delta_{AB} = \frac{\sum_{i=1}^{n} f_{iA} f_{iB}}{n}
where f_{iA} denotes the features of the query molecule, f_{iB} denotes the features of the dataset molecule, and n is the number of features. The SMLP similarity model's output (similarity score) will be fused with the output of the SCNN1D similarity model by using decision fusion (maximum). Figure 3 shows the details of the design of the hybrid Siamese similarity model with three similarity measures using decision fusion.
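The two added measures can be sketched directly from the formulas above (a minimal sketch on plain lists of feature values; function names are illustrative):

```python
def jaccard(fa, fb):
    """Continuous Jaccard similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(fa, fb))
    return dot / (sum(a * a for a in fa) + sum(b * b for b in fb) - dot)

def russel(fa, fb):
    """Russel similarity: mean of the element-wise products over n features."""
    n = len(fa)
    return sum(a * b for a, b in zip(fa, fb)) / n
```

Note that in this continuous form the Jaccard measure coincides with the Tanimoto coefficient, while the Russel measure normalizes the inner product by the vector length.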

2.4. Hybrid Siamese Similarity Model with Three Similarity Measures Using Feature Fusion Summation

The third design of the hybrid similarity model combines the two selected models: the SMLP similarity model and the SCNN1D similarity model. The architecture of the SMLP here consists of two twin neural networks, each of which has only one layer with 1024 neurons. The weights are linked in this architecture so that Network1 = Network2. The first network reads the fingerprint of the query, and the second reads the fingerprint from the database. The output of each network is a feature vector with a fixed length (here, 1024 features). The first similarity measure is the absolute difference between the two feature vectors, and its output is a vector of length 1024. The second similarity measure is the exponential Manhattan distance, whose output is a single value. Both formulas were covered in Section 2.2. The third similarity measure is the Jaccard measure, whose formula was covered in Section 2.3; its output is a single value. The feature fusion layer adds the value of the third similarity measure to the value of the second similarity measure, and the result is then added to each element of the vector produced by the first similarity measure. The output of the fusion layer is passed to the following layers, which contain 1024 and 512 neurons, respectively. Each neuron is connected to all neurons in the previous layer, and the ReLU activation function is used for all layers. The output of this model is a feature vector with a length of 512.
The architecture of the SCNN1D similarity model consists of two twin neural networks, each of which has two layers of a one-dimensional convolutional neural network (CNN1D). The layers are made up of 64 filters with a kernel size of 3; the activation function is the rectified linear unit (ReLU), and the maximum pooling size is 2. Then comes a flattening layer, followed by a dense layer with a sigmoid activation function. In this design, the weights are connected so that CNN1D-1 = CNN1D-2. The output of each network is a feature vector with a fixed length (here, 512 features). The first similarity measure is the absolute difference between the two feature vectors, and its output is a vector of length 512. The second similarity measure is the exponential Manhattan distance, whose output is a single value. The third similarity measure is the Russel similarity measure, added to the SCNN1D similarity model. The feature fusion layer adds the value of the third similarity measure to the value of the second similarity measure, and the result is then added to each element of the vector produced by the first similarity measure. The output of the fusion layer is the feature vector, which is considered the output of the second model. Figure 4 shows the details of the design of the hybrid Siamese similarity model with three similarity measures using feature fusion summation.
The SMLP similarity model's output (a 512-element feature vector) will be fused with the output of the SCNN1D similarity model (a 512-element feature vector) by using feature fusion (sum). The result of this layer is passed to the last layer of the hybrid model, which contains a sigmoid activation function that gives the similarity score, where 1 means complete similarity and 0 means complete dissimilarity. Moreover, the RMSprop optimizer is used, with binary_crossentropy as the loss function.

2.5. Hybrid Siamese Similarity Model with Three Similarity Measures Using Feature Fusion Maximum

The fourth design of the hybrid similarity model is similar to the previous (third) design, except that it uses the maximum operation instead of the sum operation in the feature fusion between the SMLP similarity model and the SCNN1D similarity model. Figure 5 shows the details of the design of the hybrid Siamese similarity model with three similarity measures using the feature fusion maximum.
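Assuming both branch models emit 512-element feature vectors, the difference between the third and fourth designs reduces to the fusion operation, which can be sketched as (`feature_fusion` is a hypothetical helper, not part of the original models):

```python
import numpy as np

def feature_fusion(v_smlp, v_scnn, mode="sum"):
    """Fuse the feature vectors of the two branch models.

    mode="sum": element-wise addition (third design, Hybrid-F-Sum).
    mode="max": element-wise maximum (fourth design, Hybrid-F-Max).
    """
    if mode == "sum":
        return v_smlp + v_scnn
    return np.maximum(v_smlp, v_scnn)
```

The fused vector would then feed the final sigmoid layer that produces the similarity score.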

3. Experimental Design

3.1. Datasets

Here, we evaluate the similarity search methods by using the MDL Drug Data Report (MDDR) and Maximum Unbiased Validation (MUV) data sets, which are the most common [30,31]. The MDDR datasets have been used by our research group and previous studies [3,4,6,9,10,19,20,21,22,25,26,32,33,34,35,36,37,38,39,40]. All molecules were translated to the ECFC-4 fingerprint with the Pipeline Pilot software. Ten reference structures were chosen randomly from each activity class. The MDDR contains three data sets:
  • MDDR-DS1: This consists of 102,516 molecules divided into active and inactive groups. The active molecules are split into 11 classes, with some having structurally homogeneous active elements and others having structurally heterogeneous active elements. Table 1 shows the activity classes of the molecules in DS1.
  • MDDR-DS2: This contains 102,516 molecules divided into active and inactive groups. The active molecules are split into 10 homogeneous activity classes. Table 2 shows the activity classes of the molecules in DS2.
  • MDDR-DS3: This contains 102,516 molecules divided into active and inactive groups. The active molecules are split into 10 heterogeneous activity classes. Table 3 shows the activity classes of the molecules in DS3.
Rohrer and Baumann documented the MUV data collection, as observed in Table 4. This data collection contains 17 activity classes, with each class including up to 30 active and 15,000 inactive molecules. Our research team has utilized these data collections in prior papers.

3.2. Performance Evaluation Measures

The effectiveness of the proposed approaches is assessed as follows:
1.
The recall metric, which is the proportion of active chemical compounds identified within the top 1% and 5% of the ranked test set, is the first method for assessing the retrieval model's performance. This metric has already been utilized in research [3,4,6,9,10,19,20,21,22,25,26,32,33,34,35,36,37,38,39,40]. Figure 6 shows the general steps of the experimental design of this study.
Here, the whole dataset is separated into K equal-sized sets, one of which is designated as the test set and the rest as training sets. The test set is changed in each iteration, and the final result is determined as the average of the recall values from all iterations. As observed in Figure 7, this procedure is known as k-fold cross-validation. Each iteration tests ten queries chosen at random from the activity class, and the mean value of these ten searches is determined.
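The top-k% recall metric described above can be sketched as follows (a minimal sketch; the ranked list would in practice come from sorting the database by a similarity model's scores, and the names are illustrative):

```python
def recall_at_percent(ranked_ids, active_ids, percent):
    """Fraction of the known actives retrieved in the top `percent`%
    of a similarity-ranked list of molecule identifiers."""
    cutoff = max(1, int(len(ranked_ids) * percent / 100))
    top = ranked_ids[:cutoff]
    hits = sum(1 for mol in top if mol in active_ids)
    return hits / len(active_ids)
```

Under k-fold cross-validation, this value would be computed for each query in each fold and the per-fold means averaged into the reported recall.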
2.
Comparison Methods: The second strategy is to look at existing approaches that could be used to evaluate the outcomes of the proposed models and that use the same datasets. Among these approaches are the following:
A.
Tanimoto similarity coefficient (TAN): For many years, TAN has served as the benchmark LBVS search technique. The Tanimoto-based similarity models use the Tanimoto coefficient in its continuous version with the ECFC-4 descriptor [15].
B.
Bayesian inference (BIN): the second method is BIN with the ECFC-4 descriptor [6].
C.
Quantum similarity search (SQB): the third method is SQB, which utilizes a quantum mechanics approach with the ECFC-4 descriptor [9].
D.
Stack of Deep Belief Networks (SDBN): The latest study is a multi-descriptor approach based on the stack of deep belief networks method on the MDDR data sets (DS1, DS2, and DS3) for the ECFC-4, ECFP-4, and EPFP-4 descriptors. The molecular features were reweighted using deep belief networks [4].
E.
Enhanced Siamese Convolutional Neural Network-one dimension (SCNN-1D) [26] and enhanced Siamese Multilayer Perceptron (SMLP) [25], which are compared both before and after being combined into one hybrid model.
3.
The Kendall W test, often known as the significance test, is the third measure used to evaluate the proposed procedures. This significance test has already been utilized in prior research [3,4,9,10,19,20,21,22,25,26,29,32,33,34,36,37,38,39,40,41,42]. It can be construed as a measure of rater agreement: each case represents a judge or rater, and each variable represents an object or person being rated. The sum of rankings is computed for each variable. The Kendall W statistic ranges between 0, indicating no agreement, and 1, indicating full agreement. For example, let r_{ij} be the rank given by judge j (here, an activity class) to object i (here, a similarity search method), where there are n objects and m judges in total. The total rank R_i given to object i can then be calculated as [43]:
R_i = \sum_{j=1}^{m} r_{ij}
The mean of the total ranks is:
\bar{R} = \frac{1}{2} m (n + 1)
The sum of squared deviations, \delta, is defined as:
\delta = \sum_{i=1}^{n} (R_i - \bar{R})^2
Then, the Kendall W test is defined as:
W = \frac{12 \delta}{m^2 (n^3 - n)}
This test demonstrates whether a group of judges can make equivalent decisions about the rating of a set of items. In this analysis, the judges are the activity classes of each of the data sets, whereas the items are the recall rates of the different search models. The significance levels associated with the Kendall coefficient are an important part of this experiment; this means verifying whether the value of the coefficient could have occurred by chance. If the value was significant (for which both the 0.01 and 0.05 cut-off values were used), it was then possible to assign the items an overall ranking.
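The Kendall W statistic defined by the equations above can be computed directly (a minimal sketch; `ranks[j][i]` holds the rank that judge j assigns to object i, matching the r_{ij} of the formulas):

```python
def kendall_w(ranks):
    """Kendall's coefficient of concordance W for m judges ranking n objects.

    ranks: list of m rank lists, each of length n (ranks 1..n per judge).
    """
    m, n = len(ranks), len(ranks[0])
    # Total rank R_i for each object i.
    totals = [sum(ranks[j][i] for j in range(m)) for i in range(n)]
    # Mean of the total ranks: m(n + 1)/2.
    mean = m * (n + 1) / 2
    # Sum of squared deviations delta.
    delta = sum((t - mean) ** 2 for t in totals)
    # W = 12*delta / (m^2 (n^3 - n)).
    return 12 * delta / (m ** 2 * (n ** 3 - n))
```

Full agreement among the judges yields W = 1, while completely opposed rankings drive W toward 0.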
4.
For a clearer comparison between the recall values of the proposed methods and the recall values of the previous methods, the improvement percentage for each proposed method will be calculated. The improvement percentage formula is [44]:
\mathrm{Improvement}_{\mathrm{method1}} = \frac{\mathrm{Recall}_{\mathrm{method1}} - \mathrm{Recall}_{\mathrm{method2}}}{\mathrm{Recall}_{\mathrm{method1}}} \times 100\%
where Recall_{method1} represents the recall value of the first method, and Recall_{method2} represents the recall value of the second method.
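The improvement-percentage formula above translates directly into code (a one-line sketch; the function name is illustrative):

```python
def improvement_pct(recall_1, recall_2):
    """Improvement of method 1 over method 2 as a percentage of method 1's
    recall, per the formula in [44]."""
    return (recall_1 - recall_2) / recall_1 * 100
```

For example, a recall of 0.5 against a baseline recall of 0.25 gives a 50% improvement.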

4. Results and Discussion

The experimental outcomes of the ECFC-4 descriptor on the MDDR-DS1, MDDR-DS2, MDDR-DS3, and MUV data sets are provided in Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12, respectively, using the 1% and 5% cut-offs. In addition, the results of the proposed hybrid Siamese similarity models are recorded in these tables and compared with the benchmark Tanimoto similarity coefficient (TAN) and previous studies, namely Bayesian inference (BIN), quantum similarity search (SQB), the stack of deep belief networks (SDBN, on the MDDR datasets only), and two of our proposed Siamese architecture methods from previous studies: the SMLP similarity model and the SCNN1D similarity model. The hybrid Siamese similarity model with decision fusion using two similarity measures is here called Hybrid-D-Max2; the hybrid Siamese similarity model with decision fusion using three similarity measures is called Hybrid-D-Max3; the hybrid Siamese similarity model with feature fusion summation is called Hybrid-F-Sum; and the hybrid Siamese similarity model with feature fusion maximum is called Hybrid-F-Max. Each row in the tables lists recall values for the top 1% and 5% of the activity class, and in each row, the best recall rate is shaded. In the tables, the mean row relates to the average over all activity classes, and the shaded cells row is the total number of shaded cells with the top values for each technique over the full range of activity classes. The first column of each table represents the activity classes of the dataset. This is followed by four columns that represent the previous studies (TAN, BIN, SQB, and SDBN), two columns that represent the two Siamese models proposed in previous studies (SMLP and SCNN1D), and four columns representing the hybrid models proposed in this study.
Figure 8, Figure 9, Figure 10, Figure 11, Figure 12, Figure 13, Figure 14 and Figure 15 show the contrast among methods for the average recall percentage of successful compound retrieval at the top of the 1% and 5% in MDDRDS1, MDDRDS2, MDDRDS3, and MUV, respectively.
The MDDR-DS1 (structurally homogeneous and heterogeneous) recall values for the 1% and 5% cut-offs, recorded in Table 5 and Table 6, show that the proposed hybrid Siamese similarity models were clearly superior to the benchmark studies (TAN, BIN, SQB, and SDBN) and to the two previously selected proposed methods (Siamese SMLP and SCNN1D). In addition, among the hybrid Siamese similarity models, the Hybrid-F-Max model gives the best retrieval recall results in Table 5 in view of the mean and the number of shaded cells, followed by Hybrid-F-Sum in view of the mean, followed by Hybrid-D-Max2 and Hybrid-D-Max3 in view of the mean, followed by the methods proposed in objective one (SCNN1D and SMLP), and followed by SDBN, BIN, SQB, and TAN in view of the mean. The improvement percentages of the Hybrid-F-Max model in the mean recall values compared with the previous studies and the two methods proposed in objective one are 15.25, 21.09, 55.2, 48.28, 50.35, and 44.45 compared to SCNN1D, SMLP, TAN, BIN, SQB, and SDBN, respectively. The improvement percentages of the Hybrid-F-Sum model are 9.23, 15.48, 52.03, 44.60, and 40.50 compared to SCNN1D, SMLP, TAN, BIN, SQB, and SDBN, respectively. The improvement percentages of the Hybrid-D-Max3 model are 0.41, 7.26, 47.37, 39.22, 41.65, and 34.72 compared to SCNN1D, SMLP, TAN, BIN, SQB, and SDBN, respectively. The improvement percentages of the Hybrid-D-Max2 model are 0.89, 7.71, 47.62, 39.52, 41.94, and 35.03 compared to SCNN1D, SMLP, TAN, BIN, SQB, and SDBN, respectively. Figure 8 compares the methods by the average recall percentage of successful compound retrieval at the top 1% in MDDR-DS1. By comparison, the proposed Hybrid-F-Max method gives the best retrieval recall results in Table 6 in view of the mean and the number of shaded cells, followed by Hybrid-F-Sum, Hybrid-D-Max2, Hybrid-D-Max3, SCNN1D, and SMLP.
Then, SDNB, BIN, SQB, and TAN are in view of the mean. The improvement percentages of the hybrid-F-Max model are 15.50, 18.26, 49.49, 45.24, 48.42, and 40.69 compared to SCNN1D, SMLP, TAN, BIN, SQB, and SDNB, respectively. The improvement percentages of the hybrid-F-Sum model are 11.03, 13.94, 46.82, 42.35, 45.70, and 37.55 compared to SCNN1D, SMLP, TAN, BIN, SQB, and SDNB, respectively. The improvement percentages of the hybrid-D-Max3 model are 0.49, 3.74, 40.52, 35.51, 39.26, and 30.15 compared to SCNN1D, SMLP, TAN, BIN, SQB, and SDNB, respectively. The improvement percentages of the hybrid-D-Max2 model are 0.68, 3.92, 40.63, 35.64, 39.38, and 30.28 compared to SCNN1D, SMLP, TAN, BIN, SQB, and SDNB, respectively. Finally, Figure 9 compares methods for the average recall percentage of successful compound retrieval at the top 5% in MDDR-DS1.
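For reference, the improvement percentages quoted above appear to express the gain relative to the hybrid model's own mean recall; this is our reading of the arithmetic, not a formula stated explicitly in the text, and it can be checked against Table 5:

```python
def improvement_pct(hybrid_mean, baseline_mean):
    # Gain relative to the hybrid model's mean recall, in percent.
    return 100.0 * (hybrid_mean - baseline_mean) / hybrid_mean

# Hybrid-F-Sum vs. TAN in Table 5: means 41.39 and 19.857.
print(round(improvement_pct(41.39, 19.857), 2))  # → 52.02, matching the reported 52.03 up to rounding of the tabulated means
```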
Furthermore, the MDDR-DS2 (structurally homogeneous) recall values recorded for the top 1% in Table 7 show that some of the proposed hybrid Siamese similarity models are superior to the benchmark TAN method and to the previous studies. The Hybrid-D-Max3 model, which fuses the decisions of sub-models under three similarity measures, gives the best retrieval recall in Table 7 in terms of the mean, followed by SCNN1D in terms of the mean and the number of shaded cells, and then by Hybrid-D-Max2, Hybrid-F-Max, Hybrid-F-Sum, SDBN, BIN, SQB, SMLP, and TAN. The improvement percentages of the Hybrid-D-Max3 model are 0.05, 7.42, 27.79, 6.34, 6.89, and 5.12 relative to SCNN1D, SMLP, TAN, BIN, SQB, and SDBN, respectively; those of Hybrid-D-Max2 are 4.01, 25.13, 2.89, 3.46, and 1.62 relative to SMLP, TAN, BIN, SQB, and SDBN; those of Hybrid-F-Max are 3.74, 24.92, 2.62, 3.18, and 1.34; and those of Hybrid-F-Sum are 2.97, 24.32, 1.84, 2.41, and 0.56. Figure 10 shows the comparison among the methods for the average recall percentage of successful compound retrieval at the top 1% in MDDR-DS2. However, the MDDR-DS2 recall values recorded for the 5% cut-off in Table 8 show that the BIN method gave the best retrieval recall in terms of the mean and the number of shaded cells, followed by SQB, SDBN, Hybrid-D-Max3, SCNN1D, Hybrid-D-Max2, Hybrid-F-Max, Hybrid-F-Sum, SMLP, and finally TAN in terms of the mean. The improvement percentages of the Hybrid-D-Max3 model are 0.07, 5.36, and 16.24 relative to SCNN1D, SMLP, and TAN, respectively; those of Hybrid-D-Max2 are 3.09 and 4.22 relative to SMLP and TAN.
The improvement percentages of the Hybrid-F-Max model are 2.51 and 13.71 relative to SMLP and TAN, and those of Hybrid-F-Sum are 1.83 and 13.11. Figure 11 shows the comparison among the methods for the average recall percentage of successful compound retrieval at the top 5% in MDDR-DS2.
Moreover, the MDDR-DS3 (structurally heterogeneous) recall values for the 1% and 5% cut-offs, recorded in Table 9 and Table 10, demonstrate that the proposed hybrid Siamese similarity models were superior to the benchmark methods TAN, BIN, SQB, and SDBN and to the two previously selected proposed methods, SMLP and SCNN1D. Among the hybrid Siamese similarity models in Table 9, the Hybrid-F-Max model gives the best retrieval recall in terms of the mean and the number of shaded cells, followed by Hybrid-F-Sum in terms of the mean, then SMLP, Hybrid-D-Max3, SCNN1D, and Hybrid-D-Max2, and then by SDBN, BIN, SQB, and TAN in terms of the mean. The improvement percentages of the Hybrid-F-Max model are 17.03, 12.64, 70.72, 65.68, 68.44, and 50.80 relative to SCNN1D, SMLP, TAN, BIN, SQB, and SDBN, respectively; those of Hybrid-F-Sum are 14.62, 10.10, 69.87, 64.67, 67.52, and 49.37; those of Hybrid-D-Max3 are 4.06, 66.15, 60.31, 63.51, and 43.11 relative to SCNN1D, TAN, BIN, SQB, and SDBN; and those of Hybrid-D-Max2 are 64.66, 58.57, 61.91, and 40.61 relative to TAN, BIN, SQB, and SDBN. Figure 12 compares the methods' average recall percentages of successful compound retrieval at the top 1% in MDDR-DS3. For the 5% cut-off, the Hybrid-F-Max model gives the best retrieval recall in Table 10 in terms of the mean and the number of shaded cells, followed by Hybrid-F-Sum in terms of the mean, then Hybrid-D-Max3, SMLP, SCNN1D, and Hybrid-D-Max2, and then by SDBN, TAN, BIN, and SQB in terms of the mean.
The improvement percentages of the Hybrid-F-Max model are 20.08, 16.35, 68.9, 69.00, 69.63, and 58.72 relative to SCNN1D, SMLP, TAN, BIN, SQB, and SDBN, respectively; those of Hybrid-F-Sum are 14.62, 10.64, 66.78, 66.88, 67.56, and 55.91; those of Hybrid-D-Max3 are 8.94, 4.69, 64.57, 64.67, 65.40, and 52.97; and those of Hybrid-D-Max2 are 60.67, 60.78, 61.59, and 47.79 relative to TAN, BIN, SQB, and SDBN. Finally, Figure 13 compares the methods' average recall percentages of successful compound retrieval at the top 5% in MDDR-DS3.
Furthermore, the MUV recall values for the 1% cut-off, recorded in Table 11, demonstrate that some of the proposed hybrid Siamese similarity models were superior to the benchmark methods TAN, BIN, and SQB and to the previously selected proposed method SCNN1D, but not to the SMLP method. Among the hybrid Siamese similarity models, the Hybrid-F-Sum model gives the best retrieval recall in Table 11 in terms of the mean, followed by SCNN1D, Hybrid-D-Max3, Hybrid-F-Max, BIN, Hybrid-D-Max2, SQB, and TAN. The improvement percentages of the Hybrid-F-Sum model are 3.05, 50.64, 19.97, and 55.95 relative to SCNN1D, SQB, BIN, and TAN, respectively; those of Hybrid-F-Max are 46.02, 12.48, and 51.83 relative to SQB, BIN, and TAN; those of Hybrid-D-Max3 are 46.79, 13.74, and 52.52; and those of Hybrid-D-Max2 are 33.08 and 40.29 relative to SQB and TAN. Figure 14 compares the methods' average recall percentages of successful compound retrieval at the top 1% in MUV. For the 5% cut-off, the Hybrid-F-Sum model gives the best recall in Table 12 in terms of the mean, followed by Hybrid-F-Max, Hybrid-D-Max3, SCNN1D, and Hybrid-D-Max2, and then by BIN, SQB, and TAN in terms of the mean. The improvement percentages of the Hybrid-F-Sum model are 7.17, 34.07, 22.10, and 38.63 relative to SCNN1D, SQB, BIN, and TAN, respectively; those of Hybrid-F-Max are 5.82, 33.11, 20.97, and 37.73; those of Hybrid-D-Max3 are 4.33, 32.05, 19.72, and 36.75; and those of Hybrid-D-Max2 are 17.61, 2.65, and 23.30 relative to SQB, BIN, and TAN.
Figure 15 compares the methods' average recall percentages of successful compound retrieval at the top 5% in MUV.
The second method used to evaluate the proposed approaches is a significance test; in this study, the Kendall W test is used. Table 13, Table 14, Table 15 and Table 16 show the ranking of the hybrid Siamese similarity models (Hybrid-F-Max, Hybrid-F-Sum, Hybrid-D-Max3, and Hybrid-D-Max2) against the previous studies TAN, BIN, SQB, and the SMLP and SCNN1D models, using the Kendall W test results for MDDR-DS1, MDDR-DS2, MDDR-DS3, and MUV at the top 1% and top 5%.
For all of the data sets used, the Kendall W test at the top 1% yields significance (p) values less than 0.05; this means that the hybrid-enhanced Siamese similarity models are significant in all cases at the top 1%. The overall ranking of the methods indicates that the Hybrid-F-Max and Hybrid-F-Sum models are superior to the other methods and hold the top rank in MDDR-DS1 (structurally homogeneous and heterogeneous) and MDDR-DS3 (structurally heterogeneous). In MDDR-DS2 (structurally homogeneous), the Hybrid-D-Max3 model, which fuses decisions under three similarity measures, has the top rank among the methods. In the MUV dataset, the Hybrid-F-Sum model has the top rank among all methods except SMLP.
The results of the Kendall W test at the top 5% are similar: the significance (p) values are less than 0.05, meaning that the hybrid Siamese similarity models are significant in all cases at the top 5%. The overall ranking again indicates that the Hybrid-F-Max and Hybrid-F-Sum models are superior to the other methods and hold the top rank in MDDR-DS1 (structurally homogeneous and heterogeneous) and MDDR-DS3 (structurally heterogeneous). In MDDR-DS2, BIN has the top rank at the top 5%, and in the MUV dataset, the SMLP method has the top rank, followed by the Hybrid-F-Max model. Figure 16 and Figure 17 show the ranking of the hybrid Siamese similarity models (Hybrid-D-Max2, Hybrid-D-Max3, Hybrid-F-Sum, and Hybrid-F-Max) against TAN, BIN, SQB, SDBN, SMLP, and SCNN1D using the Kendall W test results for MDDR-DS1, MDDR-DS2, MDDR-DS3, and MUV at the top 1% and 5%, respectively.
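The Kendall W statistic used above can be computed as follows. This is a generic sketch of the coefficient of concordance (without a tie correction), under the assumption that each activity class "ranks" the competing methods by recall; it is not the authors' SPSS procedure.

```python
import numpy as np
from scipy.stats import rankdata, chi2

def kendalls_w(scores):
    """Kendall's coefficient of concordance W (no tie correction).

    scores : (m, n) array-like -- m 'judges' (here, activity classes) each
             scoring the same n 'objects' (here, the similarity methods).
    Returns (W, p), with p from the chi-square approximation, df = n - 1.
    """
    scores = np.asarray(scores, dtype=float)
    m, n = scores.shape
    ranks = np.apply_along_axis(rankdata, 1, scores)   # rank methods within each class
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()    # spread of the column rank sums
    w = 12.0 * s / (m ** 2 * (n ** 3 - n))
    p_value = chi2.sf(m * (n - 1) * w, df=n - 1)
    return w, p_value
```

Perfect agreement among the judges gives W = 1, and W = 0 indicates no agreement; a p-value below 0.05 rejects the hypothesis of no concordance, which is the criterion used in the text.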
Lastly, according to the experimental results, the success of the proposed methods stems from three factors. (1) The Siamese network handles more complicated data samples, especially heterogeneous ones, and deep learning methods can be employed within the Siamese architecture to deal efficiently with the vast volume of information stored in chemical databases. (2) Enhancing the Siamese architecture with several similarity measures helps because each measure focuses on different properties, so using them together improves the recall metric. (3) Incorporating the two selected models into one hybrid model helps because each method provides good results in some classes, so combining them improves the retrieval recall. The two hybrid designs that use feature data fusion (Hybrid-F-Max and Hybrid-F-Sum) gave better results than the two designs that use decision data fusion (Hybrid-D-Max3 and Hybrid-D-Max2), because the former operate on the features themselves, enhancing them with the sum or max operation before the similarity is computed, which improves the recall metric, whereas the latter only select the maximum of the results produced by the SMLP and SCNN1D sub-models.
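The two fusion styles can be illustrated schematically. This is a minimal sketch under the assumption that the SMLP and SCNN1D sub-models expose, respectively, per-pair similarity scores and per-molecule embedding vectors; the function and argument names are illustrative, not the study's implementation.

```python
import numpy as np

def decision_fusion_max(scores_smlp, scores_scnn1d):
    """Decision-level fusion (the Hybrid-D-Max idea): for each query-molecule
    pair, keep the larger of the two sub-models' similarity scores."""
    return np.maximum(scores_smlp, scores_scnn1d)

def feature_fusion(emb_smlp, emb_scnn1d, op="max"):
    """Feature-level fusion (the Hybrid-F-Max / Hybrid-F-Sum idea): merge the
    two sub-models' embedding vectors element-wise with the max or sum
    operation before the final similarity layer."""
    return np.maximum(emb_smlp, emb_scnn1d) if op == "max" else np.add(emb_smlp, emb_scnn1d)
```

Decision fusion only re-ranks with whichever sub-model scores higher, while feature fusion produces a new, enhanced representation, which is consistent with the better recall of the Hybrid-F designs reported above.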
In addition, the proposed methods perform well on MDDR-DS1, MDDR-DS3, and MUV because these data sets contain heterogeneous molecule classes. On MDDR-DS2, the proposed methods did not achieve higher scores than the traditional methods (TAN, BIN, SQB, SDBN) because this dataset contains only structurally homogeneous molecule classes; however, some of the proposed methods achieved better results than the traditional methods at the top 1%.
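For context, the TAN benchmark referred to throughout is the classic Tanimoto coefficient on molecular fingerprints. A minimal sketch for binary fingerprints follows (ECFC_4 is in fact a count fingerprint, so this simplified binary form is for illustration only):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two binary fingerprints, given as the
    collections of indices of their set bits: |A ∩ B| / |A ∪ B|."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Because the coefficient depends only on shared substructure bits, it degrades on structurally heterogeneous activity classes, which is the weakness the hybrid Siamese models target.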

5. Conclusions

Many techniques for capturing the biological similarity between a test compound and a known target ligand in LBVS have been established. The similarity search is one of the primary tasks in VS; it estimates a molecule's similarity and is predicated on the idea that molecules with similar structures may also have similar activities. In spite of the good performance of these methods, especially when dealing with molecules that have homogeneous active structural elements, they are not good enough when dealing with structurally heterogeneous molecules. Previous works examined many deep learning methods in the enhanced Siamese similarity model; according to the Kendall W significance test, the two best methods on MDDR-DS3 (structurally heterogeneous) are the SMLP similarity model and the SCNN1D similarity model. To further improve the retrieval effectiveness of the similarity model, we incorporated these two models into one hybrid model: each method gives good results in some classes, so combining them in one hybrid model may improve the retrieval recall. Several designs of the hybrid model were tested in this study. The overall results indicate that the Hybrid-F-Max and Hybrid-F-Sum methods are superior to previous studies in MDDR-DS1 and MDDR-DS3 and hold the top ranks among the methods at the top 1% and 5%, while Hybrid-D-Max3, Hybrid-D-Max2, and Hybrid-F-Max are superior to previous studies in MDDR-DS2 at the top 1%. In MUV, SMLP has the top rank, followed by Hybrid-F-Sum at the top 1% and by Hybrid-F-Max and Hybrid-D-Max3 at the top 5%. Future work will reduce the size of the hybrid-enhanced Siamese similarity model by pruning its less significant weights.

Author Contributions

Conceptualization, M.K.A. and N.S.; Methodology, M.K.A. and N.S.; Software, M.K.A.; Validation, M.K.A. and N.S.; Formal analysis, M.K.A. and N.S.; Investigation, M.K.A. and N.S.; Data curation, M.K.A.; Writing—original draft, M.K.A.; Writing—review and editing, M.K.A. and N.S.; Supervision, N.S.; Project administration, M.K.A.; Funding acquisition, N.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Higher Education, Malaysia (JPT(BKPI)1000/016/018/25(58)) through Malaysia Big Data Research Excellence Consortium (Bi-DaREC) (Vot No: R.J130000.7851.4L933), (Vot No: R.J130000.7851.4L942), (Vot No: R.J130000.7851.4L938), (Vot No: R.J130000.7851.4L936).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The MDL Drug Data Report (MDDR) dataset is owned by www.accelrys.com (accessed on 31 October 2021); a license is required to access the data. The Maximum Unbiased Validation (MUV) data sets are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html (accessed on 31 October 2021). Software: Python 3.7 in the Anaconda/Spyder environment with the TensorFlow, Theano, Keras, NumPy, pandas, and math libraries; the statistics application (IBM SPSS) is licensed through licenseapp.utm.my.

Acknowledgments

The first author would like to thank the Islamic Development Bank (IsDB) for his Ph.D. scholarship and the University of Mosul for allowing and supporting him to continue his Ph.D. study.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. The steps for incorporating two enhanced Siamese similarity models into one hybrid model.
Figure 2. The design of the hybrid Siamese similarity model with two similarity measures using decision fusion.
Figure 3. The design of the hybrid Siamese similarity model with three similarity measures using decision fusion.
Figure 4. The design of the hybrid Siamese similarity model with three similarity measures using feature fusion summation.
Figure 5. The design of the hybrid Siamese similarity model with three similarity measures using feature fusion maximum.
Figure 6. The general steps of the experimental design of this study.
Figure 7. The cross validation for training and testing data.
Figure 8. The comparison among methods for the average recall percentage at the top 1% in MDDR-DS1 (homogeneous and heterogeneous).
Figure 9. The comparison among methods for the average recall percentage at the top 5% in MDDR-DS1 (homogeneous and heterogeneous).
Figure 10. The comparison among methods for the average recall percentage at the top 1% in MDDR-DS2 (homogeneous).
Figure 11. The comparison among methods for the average recall percentage at the top 5% in MDDR-DS2 (homogeneous).
Figure 12. The comparison among methods for the average recall percentage at the top 1% in MDDR-DS3 (structurally heterogeneous).
Figure 13. The comparison among methods for the average recall percentage at the top 5% in MDDR-DS3 (structurally heterogeneous).
Figure 14. The comparison among methods for the average recall percentage at the top 1% in MUV.
Figure 15. The comparison among methods for the average recall percentage at the top 5% in MUV.
Figure 16. The ranking methods at the top 1%.
Figure 17. The ranking methods at the top 5%.
Table 1. The MDDR-DS1 (structurally homogeneous and heterogeneous) activity classes.
Activity Class | Activity Index | Active Molecules | Pairwise Similarity
Renin inhibitors | 31420 | 1130 | 0.290
HIV protease inhibitors | 71523 | 750 | 0.198
Thrombin inhibitors | 37110 | 803 | 0.180
Angiotensin II AT1 antagonists | 31432 | 943 | 0.229
Substance P antagonists | 42731 | 1246 | 0.149
5HT3 antagonists | 06233 | 752 | 0.140
5HT reuptake inhibitors | 06245 | 359 | 0.122
D2 antagonists | 07701 | 395 | 0.138
5HT1A agonists | 06235 | 827 | 0.133
Protein kinase C inhibitors | 78374 | 453 | 0.120
Cyclooxygenase inhibitors | 78331 | 636 | 0.108
Table 2. The MDDR-DS2 (structurally homogeneous) activity classes.
Activity Class | Activity Index | Active Molecules | Pairwise Similarity
Adenosine (A1) agonists | 07707 | 207 | 0.229
Adenosine (A2) agonists | 07708 | 156 | 0.305
Renin inhibitors | 31420 | 1130 | 0.290
CCK agonists | 42710 | 111 | 0.361
Monocyclic β-lactams | 64100 | 1346 | 0.336
Cephalosporins | 64200 | 113 | 0.322
Carbacephems | 64220 | 1051 | 0.269
Carbapenems | 64500 | 126 | 0.260
Tribactams | 64350 | 388 | 0.305
Vitamin D analogues | 75755 | 455 | 0.386
Table 3. The MDDR-DS3 (structurally heterogeneous) activity classes.
Activity Class | Activity Index | Active Molecules | Pairwise Similarity
Muscarinic (M1) agonists | 09249 | 900 | 0.111
NMDA receptor antagonists | 12455 | 1400 | 0.098
Nitric oxide synthase inhibitors | 12464 | 505 | 0.102
Dopamine β-hydroxylase inhibitors | 31281 | 106 | 0.125
Aldose reductase inhibitors | 43210 | 957 | 0.119
Reverse transcriptase inhibitors | 71522 | 700 | 0.103
Aromatase inhibitors | 75721 | 636 | 0.110
Cyclooxygenase inhibitors | 78331 | 636 | 0.108
Phospholipase A2 inhibitors | 78348 | 617 | 0.123
Lipoxygenase inhibitors | 78351 | 2111 | 0.113
Table 4. MUV structure activity classes.
Activity Class | Activity Index | Pairwise Similarity
S1P1 rec. (agonists) | 466 | 0.117
Rho-Kinase2 (inhibitors) | 644 | 0.122
SF1 (inhibitors) | 600 | 0.123
Eph rec. A4 (inhibitors) | 689 | 0.113
HIV RT-RNase (inhibitors) | 652 | 0.099
HSP 90 (inhibitors) | 712 | 0.106
SF1 (agonists) | 692 | 0.114
ER-b-Coact. Bind. (inhibitors) | 733 | 0.114
ER-a-Coact. Bind. (inhibitors) | 713 | 0.113
FAK (inhibitors) | 810 | 0.107
ER-a-Coact. Bind. (potentiators) | 737 | 0.129
FXIa (inhibitors) | 846 | 0.161
Cathepsin G (inhibitors) | 832 | 0.151
D1 rec. (allosteric modulators) | 858 | 0.111
FXIIa (inhibitors) | 852 | 0.150
PKA (inhibitors) | 548 | 0.128
M1 rec. (allosteric inhibitors) | 859 | 0.126
Table 5. Top 1% retrieval recall for MDDR-DS1 (structurally homogeneous and heterogeneous) dataset for descriptor (ECFC 4).
DS1, retrieval result at the top 1%. TAN, BIN, SQB, and SDBN are previous studies; SCNN1D and SMLP are previous methods in our work; the Hybrid models are the proposed methods.
Activity Index | TAN | BIN | SQB | SDBN | SCNN1D | SMLP | Hybrid-D-Max2 | Hybrid-D-Max3 | Hybrid-F-Sum | Hybrid-F-Max
31420 | 69.69 | 74.08 | 73.73 | 74.21 | 84.58 | 84.19 | 84.22 | 83.88 | 86.94 | 88.28
71523 | 25.94 | 28.26 | 26.84 | 27.97 | 59.41 | 61.25 | 60.53 | 61.48 | 59.02 | 66.45
37110 | 9.63 | 26.05 | 24.73 | 26.03 | 52.88 | 46.56 | 52.94 | 50.28 | 46.42 | 58.13
31432 | 35.82 | 39.23 | 36.66 | 39.79 | 66.41 | 64.19 | 68.33 | 68.18 | 78.8 | 69.72
42731 | 17.77 | 21.68 | 21.17 | 23.06 | 38.88 | 42.69 | 42.19 | 42.81 | 34.02 | 55.45
06233 | 13.87 | 14.06 | 12.49 | 19.29 | 35.03 | 23.87 | 33.36 | 35.64 | 32.66 | 47.93
06245 | 6.51 | 6.31 | 6.03 | 6.27 | 10.68 | 6.79 | 10.9 | 10.85 | 19.02 | 16.25
07701 | 8.63 | 11.45 | 11.35 | 14.05 | 16.96 | 14.78 | 14.89 | 14.86 | 29.86 | 22.03
06235 | 9.71 | 10.84 | 10.15 | 12.87 | 15.31 | 12.82 | 16.32 | 14.29 | 26.97 | 23.58
78374 | 13.69 | 14.25 | 13.08 | 17.47 | 24.6 | 21.78 | 24.67 | 24.31 | 25.53 | 25.89
78331 | 7.17 | 6.03 | 5.92 | 9.93 | 8.58 | 5.94 | 8.69 | 8.44 | 16.12 | 14
Mean | 19.857 | 22.931 | 22.014 | 24.631 | 37.57 | 34.99 | 37.91 | 37.73 | 41.39 | 44.34
Shaded cells | 0 | 0 | 0 | 1 | 2 | 2 | 0 | 0 | 5 | 6
Table 6. Top 5% retrieval recall for MDDR-DS1 (structurally homogeneous and heterogeneous) dataset for descriptor (ECFC 4).
DS1, retrieval result at the top 5%. TAN, BIN, SQB, and SDBN are previous studies; SCNN1D and SMLP are previous methods in our work; the Hybrid models are the proposed methods.
Activity Index | TAN | BIN | SQB | SDBN | SCNN1D | SMLP | Hybrid-D-Max2 | Hybrid-D-Max3 | Hybrid-F-Sum | Hybrid-F-Max
31420 | 83.49 | 87.61 | 87.22 | 89.03 | 87.35 | 90.82 | 87.55 | 87.58 | 90.43 | 94.06
71523 | 48.92 | 52.72 | 48.7 | 65.17 | 79.61 | 79.48 | 79.51 | 80.65 | 83.44 | 86.44
37110 | 21.01 | 48.2 | 45.62 | 41.25 | 76 | 73.91 | 76.55 | 75.56 | 81.71 | 84.54
31432 | 74.29 | 77.57 | 70.44 | 79.87 | 91.83 | 93.87 | 93.81 | 92.26 | 95.23 | 95.02
42731 | 29.68 | 26.63 | 24.35 | 31.92 | 57.52 | 61.06 | 61.2 | 62.53 | 69.74 | 77.27
06233 | 27.68 | 23.49 | 20.04 | 29.31 | 62.76 | 53.57 | 62.67 | 64.21 | 75.23 | 80.2
06245 | 16.54 | 14.86 | 13.72 | 21.06 | 28.9 | 20.9 | 27.94 | 28.54 | 34.79 | 39.35
07701 | 24.09 | 27.79 | 26.73 | 28.43 | 42.25 | 38.33 | 38.43 | 39.82 | 46.68 | 49.65
06235 | 20.06 | 23.78 | 22.81 | 27.82 | 40.36 | 35.98 | 39.93 | 37.3 | 51.65 | 53.21
78374 | 20.51 | 20.2 | 19.56 | 19.09 | 48.27 | 48.4 | 49.44 | 47.78 | 55.27 | 57.82
78331 | 16.2 | 11.8 | 11.37 | 16.21 | 25.02 | 22.65 | 27.23 | 26.77 | 35.06 | 39.69
Mean | 34.77 | 37.7 | 35.51 | 40.83 | 58.17 | 56.27 | 58.57 | 58.45 | 65.38 | 68.84
Shaded cells | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 10
Table 7. Top 1% retrieval recall for MDDR-DS2 (structurally homogeneous) dataset for descriptor (ECFC 4).
DS2, retrieval result at top 1%. Columns TAN–SMLP are previous methods; the four hybrid columns are the proposed methods.
Activity Index | TAN | BIN | SQB | SDBN | SCNN1D | SMLP | Hybrid-D-Max 2 | Hybrid-D-Max 3 | Hybrid-F-SUM | Hybrid-F-Max
7707 | 61.84 | 72.18 | 72.09 | 83.19 | 93.27 | 77.32 | 88.2 | 91.8 | 83.46 | 83.61
7708 | 47.03 | 96 | 95.68 | 94.82 | 94.84 | 89.94 | 93.16 | 94.9 | 92.32 | 93.1
31420 | 65.1 | 79.82 | 78.56 | 79.27 | 76.96 | 80.66 | 80.02 | 80.64 | 78.93 | 77.35
42710 | 81.27 | 76.27 | 76.82 | 74.81 | 84.55 | 80.55 | 85.82 | 84.73 | 84.45 | 84.91
64100 | 80.31 | 88.43 | 87.8 | 93.65 | 97.63 | 89.33 | 96.47 | 95.66 | 91.73 | 93.22
64200 | 53.84 | 70.18 | 70.18 | 71.16 | 78.65 | 54.26 | 65.87 | 77.35 | 55.94 | 60.39
64220 | 38.64 | 68.32 | 67.58 | 68.71 | 90.81 | 87.91 | 82.62 | 90.53 | 92.29 | 92.34
64500 | 30.56 | 81.2 | 79.2 | 75.62 | 71.92 | 69.68 | 68.56 | 72.4 | 72.56 | 73.2
64350 | 80.18 | 81.89 | 81.68 | 85.21 | 87.32 | 83.66 | 85.27 | 88.34 | 84.99 | 85.06
75755 | 87.56 | 98.06 | 98.02 | 96.52 | 90.95 | 89.65 | 90.53 | 90.99 | 90.9 | 90.99
Mean | 62.633 | 81.235 | 80.761 | 82.296 | 86.69 | 80.3 | 83.65 | 86.73 | 82.76 | 83.42
Shaded cells | 0 | 3 | 0 | 0 | 3 | 1 | 1 | 1 | 0 | 1
Table 8. Top 5% retrieval recall for MDDR-DS2 (structurally homogeneous) dataset for descriptor (ECFC 4).
DS2, retrieval result at top 5%. Columns TAN–SMLP are previous methods; the four hybrid columns are the proposed methods.
Activity Index | TAN | BIN | SQB | SDBN | SCNN1D | SMLP | Hybrid-D-Max 2 | Hybrid-D-Max 3 | Hybrid-F-SUM | Hybrid-F-Max
7707 | 70.39 | 74.81 | 74.37 | 73.9 | 95.85 | 85.17 | 89.9 | 94.83 | 88.29 | 87.17
7708 | 56.58 | 99.61 | 99.61 | 98.22 | 94.9 | 92.45 | 93.74 | 95.61 | 93.48 | 93.94
31420 | 88.19 | 95.46 | 94.88 | 95.64 | 94.12 | 94.42 | 95.82 | 95.72 | 95.49 | 94.89
42710 | 88.09 | 92.55 | 91.09 | 90.12 | 85.64 | 84.18 | 87.82 | 85.45 | 86.91 | 85.64
64100 | 93.75 | 99.22 | 99.03 | 99.05 | 98.93 | 94.21 | 98.58 | 98.39 | 95.92 | 96.41
64200 | 77.68 | 99.2 | 99.38 | 93.76 | 86.19 | 61.48 | 74 | 85.61 | 63.48 | 70.58
64220 | 52.19 | 91.32 | 90.62 | 96.01 | 94.07 | 92.62 | 89.93 | 94.04 | 93.72 | 94.28
64500 | 44.8 | 94.96 | 92.48 | 91.51 | 73.2 | 71.68 | 71.04 | 73.84 | 73.76 | 74.16
64350 | 91.71 | 91.47 | 90.78 | 86.94 | 90.7 | 89.58 | 92.26 | 90.7 | 90.6 | 90.65
75755 | 94.82 | 98.35 | 98.37 | 91.6 | 90.99 | 90.86 | 90.84 | 90.99 | 90.95 | 90.99
Mean | 75.82 | 93.695 | 93.061 | 91.675 | 90.46 | 85.67 | 88.39 | 90.52 | 87.26 | 87.87
Shaded cells | 1 | 4 | 3 | 1 | 1 | 0 | 2 | 0 | 0 | 0
Table 9. Top 1% retrieval recall for MDDR-DS3 (structurally heterogeneous) dataset for descriptor (ECFC 4).
DS3, retrieval result at top 1%. Columns TAN–SMLP are previous methods; the four hybrid columns are the proposed methods.
Activity Index | TAN | BIN | SQB | SDBN | SCNN1D | SMLP | Hybrid-D-Max 2 | Hybrid-D-Max 3 | Hybrid-F-SUM | Hybrid-F-Max
9249 | 12.12 | 15.33 | 10.99 | 19.47 | 38.01 | 43.06 | 38.53334 | 41.16666 | 45.12 | 42.19
12455 | 6.57 | 9.37 | 7.03 | 13.29 | 14.21 | 17.22 | 16.24286 | 18.37856 | 23.09 | 23.26
12464 | 8.17 | 8.45 | 6.92 | 12.91 | 25.98 | 29.13 | 26.79208 | 31.94058 | 37.31 | 41.33
31281 | 16.95 | 18.29 | 18.67 | 23.62 | 67.52 | 66.57 | 65.52382 | 64.7619 | 66.76 | 68.38
43210 | 6.27 | 7.34 | 6.83 | 14.23 | 29.34 | 28.08 | 28.78536 | 28.37696 | 36.05 | 37.35
71522 | 3.75 | 4.08 | 6.57 | 11.92 | 12 | 8.71 | 9.08571 | 10.985706 | 14.7 | 15.43
75721 | 17.32 | 20.41 | 20.38 | 29.08 | 52.11 | 52.83 | 50.48818 | 51.73228 | 54.43 | 56.06
78331 | 6.31 | 7.51 | 6.16 | 11.93 | 12.41 | 12.65 | 12.56694 | 12.125994 | 14.96 | 16.65
78348 | 10.15 | 9.79 | 8.99 | 9.17 | 13.85 | 18.18 | 15.999974 | 13.512192 | 14.33 | 15.79
78351 | 9.84 | 13.68 | 12.5 | 18.13 | 10.71 | 14.34 | 11.715648 | 14.8673 | 16.68 | 16.42
Mean | 9.745 | 11.425 | 10.504 | 16.375 | 27.62 | 29.08 | 27.57 | 28.78 | 32.34 | 33.29
Shaded cells | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 7
Table 10. Top 5% retrieval recall for MDDR-DS3 (structurally heterogeneous) dataset for descriptor (ECFC 4).
DS3, retrieval result at top 5%. Columns TAN–SMLP are previous methods; the four hybrid columns are the proposed methods.
Activity Index | TAN | BIN | SQB | SDBN | SCNN1D | SMLP | Hybrid-D-Max 2 | Hybrid-D-Max 3 | Hybrid-F-SUM | Hybrid-F-Max
9249 | 24.17 | 25.72 | 17.8 | 31.61 | 61.84 | 68.2 | 62.47778 | 71.52222 | 75.03 | 67.78
12455 | 10.29 | 14.65 | 11.42 | 16.29 | 32.97 | 38.59 | 34.05 | 41.58572 | 46.95 | 47.07
12464 | 15.22 | 16.55 | 16.79 | 20.9 | 46.12 | 51.01 | 47.70298 | 56.37622 | 60.95 | 80.2
31281 | 29.62 | 28.29 | 29.05 | 36.13 | 78.57 | 74.76 | 74.09524 | 73.14286 | 72.67 | 88.57
43210 | 16.07 | 14.41 | 14.12 | 22.09 | 54.47 | 53.08 | 52.09424 | 56.50262 | 62.08 | 51.15
71522 | 12.37 | 8.44 | 13.82 | 14.68 | 29.19 | 24.57 | 23.62858 | 30.95714 | 37.27 | 31.36
75721 | 25.21 | 30.02 | 30.61 | 41.07 | 77.31 | 80.99 | 76.6614 | 81.44882 | 83.46 | 98.66
78331 | 15.01 | 12.03 | 11.97 | 17.13 | 31.29 | 31.17 | 31.52754 | 35.32284 | 36.93 | 42.36
78348 | 24.67 | 20.76 | 21.14 | 26.93 | 31.89 | 37.33 | 35.51222 | 33.43086 | 38.41 | 47.8
78351 | 11.71 | 12.94 | 13.3 | 17.87 | 30.16 | 36.2 | 30.95734 | 40.02844 | 41.2 | 37.89
Mean | 18.43 | 18.38 | 18 | 24.47 | 47.38 | 49.59 | 46.87 | 52.03 | 55.5 | 59.28
Shaded cells | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 4 | 6
Table 11. Top 1% retrieval recall for MUV dataset for descriptor (ECFC4).
MUV, retrieval result at top 1%. Columns TAN–SMLP are previous methods; the four hybrid columns are the proposed methods.
Activity Index | TAN | BIN | SQB | SCNN1D | SMLP | Hybrid-D-Max 2 | Hybrid-D-Max 3 | Hybrid-F-SUM | Hybrid-F-Max
466 | 3.1 | 6.33 | 1.38 | 6 | 6.67 | 4.33 | 4.00 | 4.00 | 3.00
548 | 8.62 | 14.89 | 11.38 | 13.33 | 28.67 | 18.00 | 23.33 | 26.00 | 23.67
600 | 3.79 | 6.33 | 5.52 | 5.33 | 14.67 | 7.33 | 7.00 | 8.33 | 9.00
644 | 7.59 | 11 | 8.97 | 15.33 | 14.67 | 20.33 | 20.67 | 23.67 | 20.00
652 | 2.76 | 7 | 3.79 | 5.33 | 12.00 | 4.33 | 8.67 | 8.67 | 8.33
689 | 3.79 | 7.33 | 4.48 | 3.67 | 8.00 | 5.00 | 7.67 | 7.67 | 8.00
692 | 0.69 | 5.33 | 1.38 | 3 | 6.67 | 3.33 | 3.33 | 3.33 | 3.67
712 | 4.14 | 8.22 | 5.17 | 10.67 | 8.67 | 4.00 | 6.67 | 7.00 | 7.00
713 | 3.1 | 5.89 | 2.76 | 4.67 | 6.00 | 3.67 | 4.67 | 5.33 | 4.33
733 | 3.45 | 6.67 | 4.14 | 3.67 | 6.00 | 4.00 | 5.33 | 5.00 | 4.67
737 | 2.41 | 5.11 | 1.72 | 6.33 | 7.33 | 3.00 | 4.00 | 5.00 | 4.67
810 | 2.07 | 6.78 | 1.72 | 4.67 | 6.67 | 4.33 | 3.67 | 3.33 | 3.00
832 | 6.55 | 12.55 | 8.28 | 21.33 | 16.67 | 8.00 | 16.00 | 18.33 | 17.00
846 | 9.66 | 13.11 | 12.41 | 26.33 | 16.00 | 13.33 | 17.00 | 18.00 | 16.33
852 | 12.41 | 13.78 | 9.66 | 33 | 18.00 | 18.33 | 23.00 | 25.00 | 21.00
858 | 1.72 | 5.11 | 1.38 | 3 | 7.33 | 3.67 | 4.00 | 3.33 | 3.67
859 | 1.38 | 4.89 | 2.41 | 4.33 | 6.67 | 4.33 | 3.67 | 3.33 | 3.00
Mean | 4.54 | 8.25 | 5.09 | 10.00 | 11.22 | 7.61 | 9.57 | 10.31 | 9.43
Shaded cells | 0 | 2 | 0 | 4 | 10 | 0 | 0 | 1 | 0
Table 12. Top 5% retrieval recall for MUV dataset for descriptor (ECFC4).
MUV, retrieval result at top 5%. Columns TAN–SMLP are previous methods; the four hybrid columns are the proposed methods.
Activity Index | TAN | BIN | SQB | SCNN1D | SMLP | Hybrid-D-Max 2 | Hybrid-D-Max 3 | Hybrid-F-SUM | Hybrid-F-Max
466 | 5.86 | 10.44 | 8.62 | 11 | 12.00 | 10.00 | 8.33 | 7.00 | 8.33
548 | 22.76 | 27.22 | 24.14 | 32 | 46.67 | 37.33 | 45.00 | 49.67 | 49.00
600 | 11.38 | 12.89 | 16.21 | 9.67 | 20.67 | 15.00 | 16.67 | 17.67 | 17.33
644 | 17.59 | 19.67 | 17.93 | 36.67 | 25.33 | 32.00 | 37.67 | 40.33 | 40.33
652 | 7.93 | 11.67 | 9.66 | 9.33 | 17.33 | 11.00 | 14.33 | 13.67 | 13.00
689 | 9.66 | 13.22 | 11.72 | 14 | 15.33 | 10.33 | 14.67 | 19.00 | 16.33
692 | 4.83 | 9.22 | 4.83 | 6 | 14.67 | 7.33 | 7.00 | 7.00 | 7.33
712 | 10.34 | 16.45 | 11.03 | 16.67 | 14.00 | 8.33 | 19.00 | 17.00 | 17.33
713 | 7.24 | 9 | 5.86 | 7.33 | 12.00 | 9.33 | 11.67 | 11.33 | 12.00
733 | 8.97 | 10.11 | 8.62 | 6.33 | 9.33 | 9.33 | 8.33 | 8.33 | 9.00
737 | 8.28 | 12 | 8.28 | 8.33 | 12.00 | 7.00 | 8.33 | 7.33 | 7.00
810 | 6.9 | 13.33 | 11.03 | 6.67 | 10.00 | 7.00 | 7.00 | 5.67 | 8.33
832 | 13.1 | 20.44 | 14.83 | 32 | 24.67 | 16.67 | 22.67 | 25.67 | 24.67
846 | 28.62 | 26.11 | 26.9 | 47 | 36.67 | 31.00 | 40.00 | 39.00 | 36.33
852 | 21.38 | 23.11 | 20 | 42.33 | 34.67 | 29.33 | 36.00 | 37.00 | 34.00
858 | 5.86 | 9.11 | 6.21 | 5 | 14.00 | 7.67 | 8.67 | 7.67 | 9.00
859 | 8.97 | 9.44 | 8.62 | 11.67 | 11.33 | 11.67 | 10.33 | 12.00 | 11.33
Mean | 11.75 | 14.91 | 12.62 | 17.76 | 19.45 | 15.31 | 18.57 | 19.14 | 18.86
Shaded cells | 0 | 3 | 0 | 3 | 7 | 0 | 1 | 4 | 1
Table 13. Ranking of hybrid Siamese similarity models based on (TAN, BIN, SQB, SDBN, SCNN1D, and SMLP) using Kendall W test results for DS1, at top 1% and 5%.
DS1, top 1% (W = 0.8214876, p = 8.80 × 10^−14):
Method | Rank
Hybrid-F-Max | 9.55
Hybrid-F-SUM | 8.09
Hybrid-D-Max 2 | 7.55
Hybrid-D-Max 3 | 6.91
SCNN1D | 6.91
SMLPearly | 5.36
SDBN | 4.09
BIN | 3.18
TAN | 1.73
SQB | 1.64
DS1, top 5% (W = 0.8551465, p = 1.91 × 10^−14):
Method | Rank
Hybrid-F-Max | 9.91
Hybrid-F-SUM | 9.00
Hybrid-D-Max 3 | 6.64
Hybrid-D-Max 2 | 6.64
SCNN1D | 6.36
SMLPearly | 5.82
SDBN | 3.91
BIN | 3.00
TAN | 2.18
SQB | 1.55
Table 14. Ranking of hybrid Siamese similarity models based on (TAN, BIN, SQB, SDBN, SCNN1D, and SMLP) using Kendall W test results for DS2, at top 1% and 5%.
DS2, top 1% (W = 0.3603155, p = 1.68 × 10^−4):
Method | Rank
Hybrid-D-Max 3 | 7.95
SCNN1D | 7.20
Hybrid-D-Max 2 | 6.30
Hybrid-F-Max | 6.25
SDBN | 6.00
BIN | 5.75
Hybrid-F-SUM | 5.20
SQB | 4.85
SMLPearly | 4.10
TAN | 1.40
DS2, top 5% (W = 0.3082167, p = 1.05 × 10^−3):
Method | Rank
BIN | 7.95
SQB | 7.25
SDBN | 6.90
Hybrid-D-Max 3 | 6.15
SCNN1D | 5.80
Hybrid-D-Max 2 | 5.30
Hybrid-F-Max | 5.15
Hybrid-F-SUM | 4.40
TAN | 3.50
SMLPearly | 2.60
Table 15. Ranking of hybrid Siamese similarity models based on (TAN, BIN, SQB, SDBN, SCNN1D, and SMLP) using Kendall W test results for DS3, at top 1% and 5%.
DS3, top 1% (W = 0.7789091, p = 1.45 × 10^−11):
Method | Rank
Hybrid-F-Max | 9.40
Hybrid-F-SUM | 8.80
SMLPearly | 7.10
Hybrid-D-Max 3 | 6.30
SCNN1D | 6.10
Hybrid-D-Max 2 | 6.00
SDBN | 4.70
BIN | 3.00
SQB | 2.00
TAN | 1.60
DS3, top 5% (W = 0.8673939, p = 3.91 × 10^−13):
Method | Rank
Hybrid-F-SUM | 9.00
Hybrid-F-Max | 8.90
Hybrid-D-Max 3 | 7.90
SMLPearly | 7.00
Hybrid-D-Max 2 | 6.10
SCNN1D | 6.10
SDBN | 4.00
SQB | 2.10
TAN | 2.00
BIN | 1.90
Table 16. Ranking of hybrid Siamese similarity models based on (TAN, BIN, SQB, SCNN1D, and SMLP) using Kendall W test results for MUV, at top 1% and 5%.
MUV, top 1% (W = 0.5593166, p = 3.01 × 10^−13):
Method | Rank
SMLP | 7.79
Hybrid-F-SUM | 6.35
BIN | 6.06
Hybrid-D-Max 3 | 5.82
Hybrid-F-Max | 5.56
SCNN1D | 5.47
Hybrid-D-Max 2 | 4.41
SQB | 2.00
TAN | 1.53
MUV, top 5% (W = 0.362653, p = 5.52 × 10^−8):
Method | Rank
SMLP | 7.21
Hybrid-F-Max | 6.26
Hybrid-F-SUM | 6.12
Hybrid-D-Max 3 | 5.91
BIN | 5.15
SCNN1D | 4.88
Hybrid-D-Max 2 | 4.41
SQB | 3.00
TAN | 2.06
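Tables 13–16 rank the methods with Kendall's coefficient of concordance W, which measures how consistently the activity classes (acting as judges) agree on the ordering of the methods: W = 1 means identical rankings, W = 0 means no agreement. A compact sketch of the statistic without tie correction (the published values may additionally involve one):

```python
def kendall_w(rank_matrix):
    """Kendall's W for m judges ranking n methods (no tie correction).

    rank_matrix[j][i] is the rank judge j assigns to method i;
    each row is a permutation of 1..n.
    """
    m = len(rank_matrix)               # number of judges (activity classes)
    n = len(rank_matrix[0])            # number of methods being ranked
    totals = [sum(row[i] for row in rank_matrix) for i in range(n)]
    mean_total = sum(totals) / n
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three judges agreeing perfectly -> W = 1.0
print(kendall_w([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))  # → 1.0
```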
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Altalib, M.K.; Salim, N. Hybrid-Enhanced Siamese Similarity Models in Ligand-Based Virtual Screen. Biomolecules 2022, 12, 1719. https://doi.org/10.3390/biom12111719
