Hybrid-Enhanced Siamese Similarity Models in Ligand-Based Virtual Screening

Information technology has become an integral part of the drug development process. Virtual screening (VS) is a computational technique for screening chemical compounds at reasonable cost and in a reasonable amount of time. Similarity searching is one of the primary tasks in VS that estimates a molecule's similarity. It is predicated on the idea that molecules with similar structures may also have similar activities. Many techniques for comparing the biological similarity between a target compound and each compound in a database have been established. Although these approaches perform well, particularly when dealing with molecules that have homogeneous active structural elements, they are not good enough when dealing with structurally heterogeneous compounds. Previous works examined many deep learning methods in the enhanced Siamese similarity model and demonstrated that the enhanced Siamese Multi-Layer Perceptron similarity model (SMLP) and the Siamese Convolutional Neural Network-one dimension similarity model (SCNN1D) give good outcomes when dealing with structurally heterogeneous molecules. To further improve the retrieval effectiveness of the similarity model, we incorporate the two best models into one hybrid model. The reason is that each method gives good results in some classes, so combining them in one hybrid model may improve the retrieval recall. Many designs of the hybrid model are tested in this study. Several experiments on real-world data sets were conducted, and the findings demonstrate that the new approaches outperform the previous methods.


Introduction
Drug discovery often involves multiple stages, beginning with the identification of a biological target, followed by the parallel screening of thousands of compounds and, finally, the production of the new drug. This process is time-consuming, costly, and plagued with numerous difficulties. Virtual screening (VS) is a computational drug discovery technique that searches libraries of molecules for the structures most likely to bind to a drug target, at a reasonable cost and in a reasonable time. Virtual screening is classified into two types: structure-based approaches, such as ligand-protein docking, and ligand-based approaches, such as similarity searching, machine learning, and pharmacophore mapping [1][2][3][4][5]. Similarity searching is the most effective and one of the most widely used tools for ligand-based virtual screening because it requires only a bioactive molecule, or reference structure, as the starting point for a database search. The fundamental concept underlying similarity searching is that structurally similar molecules will exhibit similar physicochemical and biological properties. A similarity search compares the target structure's characteristics with the attributes of each structure in the database, and the degree of resemblance between the two sets of features is used to measure their closeness. Our two previous studies addressed the database's heterogeneous nature [25,26]. The first study employed four deep learning methods in a Siamese architecture: Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Convolutional Neural Network-two dimensions (CNN-2D), and Convolutional Neural Network-one dimension (CNN-1D). The second study employed a Multilayer Perceptron (MLP) in a Siamese architecture. This study continues the effort to improve the effectiveness of similarity retrieval for structurally heterogeneous molecules by incorporating the two best previous models into one hybrid model.
The reason is that each method gives good results in some classes, so combining them in one hybrid model may improve the retrieval recall. The following are the main contributions of this study:

• The Siamese architecture of the selected methods will be enhanced with three similarity measures to further improve the similarity measurements between molecules.
• Many designs of a hybrid model built from the two selected models will be incorporated. As mentioned before, each method gives good results in some classes, so combining them in one hybrid model may improve the retrieval recall.
• Compared to previous approaches, the proposed strategy yielded promising results in terms of overall performance, particularly when dealing with heterogeneous classes of molecules.

Siamese Architecture
A Siamese neural network is made up of two identical artificial neural networks, each capable of processing the input data, which are coupled to a final layer via a distance layer to predict whether the two input vectors belong to the same class. Because all the weights and biases in the Siamese architecture are shared, the two networks are referred to as twins, and both networks are symmetric as a result. During training, the two neural networks also use feedforward propagation and error backpropagation. Consequently, the architecture has been applied to more complicated data samples, such as heterogeneous data samples with different dimensions and attribute types [23,24]. This work is an extension of our previous work [25,26]. Figure 1 shows the steps for incorporating two enhanced Siamese similarity models into one hybrid model. The main goal of this work is to improve the retrieval efficiency of molecular similarity searching, especially with structurally heterogeneous molecules, by incorporating two enhanced Siamese deep learning similarity models into one hybrid model. The steps for incorporating two enhanced Siamese similarity models into one hybrid model are as follows:

1. We select the best two enhanced Siamese deep learning models from the previous studies according to the Kendall W significance test for ranking the methods. The two best methods on MDDR-DS3 (structurally heterogeneous) are the Siamese multi-layer perceptron (SMLP) similarity model and the Siamese convolutional neural network-one dimension (SCNN1D) similarity model.

2. The Siamese architecture of the selected models is enhanced with three similarity measures, in order to further improve the measurement of similarity between molecules in the hybrid model.

3. The two selected models are incorporated into a hybrid model. Since each model gives good results in some classes, combining them in one hybrid model may improve the retrieval recall.

4. Many designs of hybrid models are tested using different types of data fusion, to select the hybrid model that gives the best recall when used with a structurally heterogeneous molecules dataset.

Hybrid Siamese Similarity Model Using Decision Fusion
The first design of the hybrid similarity model combines the two models selected from the previous studies: the first is the SMLP similarity model, and the second is the SCNN1D similarity model. The reason is that each method gives good results in some classes, so combining them in one hybrid model may improve the retrieval recall. The architecture of the SMLP consists of two twin neural networks, each of which has only one layer with 1024 neurons. The weights are linked in this architecture so that Network1 = Network2. The first network reads the fingerprint of the query, and the second reads the fingerprint from the database. The output of each network is a feature vector with a fixed length (here, 1024 features). The first similarity measure is the absolute difference between the two feature vectors; its output is a vector of length 1024. The formula of this measure is [27]:

S_AB = |f_A − f_B| (element-wise)

where S_AB is the similarity measure, f_A is the feature vector of network 1, and f_B is the feature vector of network 2. The second similarity measure is the exponential Manhattan distance [28]; its output is a single value. The formula of the exponential Manhattan distance is:

E_AB = exp(−Σ_i |f_iA − f_iB|)

where E_AB is the exponential Manhattan distance, f_A is the feature vector of network 1, and f_B is the feature vector of network 2. The fusion layer adds the value of the second similarity measure to each element of the vector produced by the first similarity measure. The output of the fusion layer is passed to the following layers, which contain 1024, 512, 256, 128, and 64 neurons, respectively; each neuron is connected to all neurons in the previous layer, and the ReLU activation function is used for all layers. The Siamese architecture ends with the output layer, whose sigmoid activation function gives the similarity score: 1 means complete similarity and 0 means complete dissimilarity.
Moreover, the RMSprop optimizer is used, and binary_crossentropy is used as the loss function. The architecture of the SCNN1D similarity model consists of two twin neural networks, each of which has two one-dimensional convolutional (CNN1D) layers. The layers are made up of 64 filters with a kernel size of 3; the activation function is the rectified linear unit (ReLU), and the max-pooling size is 2. Then comes a flatten layer, followed by a dense layer with a sigmoid activation function. In this design, the weights are tied so that CNN1D-1 = CNN1D-2. The output of each network is a feature vector with a fixed length (here, 512 features). The first similarity measure is the absolute difference between the two feature vectors; its output is a vector of length 512. The second similarity measure is the exponential Manhattan distance; its output is a single value. The fusion layer adds the value of the second similarity measure to each element of the vector produced by the first similarity measure. The output of the fusion layer is passed to the final layer, whose sigmoid activation function gives the similarity score: 1 means complete similarity and 0 means complete dissimilarity. Moreover, the RMSprop optimizer is used, and binary_crossentropy is used as the loss function. The output (similarity score) of the SMLP similarity model is fused with the output of the SCNN1D similarity model using decision fusion (maximum). Figure 2 shows the details of the design of the hybrid Siamese similarity model with two similarity measures using decision fusion.
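As an illustrative sketch (not the authors' original implementation), the two similarity measures and the decision-fusion step described above could be expressed in NumPy as follows; the feature vectors and sub-model scores are toy values:

```python
import numpy as np

def abs_diff(f_a, f_b):
    # First similarity measure: element-wise absolute difference;
    # output has the same length as the input feature vectors.
    return np.abs(f_a - f_b)

def exp_manhattan(f_a, f_b):
    # Second similarity measure: exponential Manhattan distance;
    # a single scalar in (0, 1], equal to 1 for identical vectors.
    return np.exp(-np.sum(np.abs(f_a - f_b)))

def fuse(f_a, f_b):
    # Fusion layer: the scalar second measure is added to every
    # element of the first measure's vector.
    return abs_diff(f_a, f_b) + exp_manhattan(f_a, f_b)

# Decision fusion (maximum): take the larger of the two sub-models'
# sigmoid similarity scores (toy values here).
score_smlp, score_scnn1d = 0.72, 0.81
hybrid_score = max(score_smlp, score_scnn1d)
```

In the full model, f_a and f_b would be the twin networks' 1024-dimensional (SMLP) or 512-dimensional (SCNN1D) outputs, and the fused vector would feed the subsequent dense layers rather than being used directly.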

Hybrid Siamese Similarity Model with Three Similarity Measures Using Decision Fusion
The second design of the hybrid similarity model is the same as the first design, except that it uses three similarity measures instead of two. The first model is the SMLP similarity model, and the second is the SCNN1D similarity model. A third similarity measure is added to each of them to further improve the similarity measurements between molecules. The Jaccard similarity measure is added to the SMLP similarity model as its third similarity measure, and the Russell similarity measure is added to the SCNN1D similarity model as its third similarity measure. The selection of these measures is based on experiments. The formula of the Jaccard similarity measure is [29]:

S_AB = Σ_i f_iA·f_iB / (Σ_i f_iA + Σ_i f_iB − Σ_i f_iA·f_iB), i = 1, ..., N

where f_iA are the features of the query molecule, f_iB are the features of the dataset molecule, and N is the number of features in the vector. The formula of the Russell similarity measure is [29]:

S_AB = Σ_i f_iA·f_iB / n

where f_iA are the features of the query molecule, f_iB are the features of the dataset molecule, and n is the number of features. The output (similarity score) of the SMLP similarity model is fused with the output of the SCNN1D similarity model using decision fusion (maximum). Figure 3 shows the details of the design of the hybrid Siamese similarity model with three similarity measures using decision fusion.
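The two added measures can be sketched on binary fingerprint vectors as follows (an illustrative NumPy version, assuming 0/1 feature arrays; not the authors' code):

```python
import numpy as np

def jaccard(f_a, f_b):
    # Jaccard similarity for binary fingerprints: shared on-bits
    # divided by the size of the union of on-bits.
    inter = np.sum(f_a * f_b)
    return inter / (np.sum(f_a) + np.sum(f_b) - inter)

def russell(f_a, f_b):
    # Russell similarity: shared on-bits divided by the total
    # number of features n.
    return np.sum(f_a * f_b) / f_a.size

fa = np.array([1, 1, 0, 0])
fb = np.array([1, 0, 1, 0])  # one shared on-bit with fa
```

Note how the denominators differ: Jaccard normalizes by the union of on-bits, while Russell normalizes by the full vector length, which is why the two measures emphasize different properties of the fingerprints.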

Hybrid Siamese Similarity Model with Three Similarity Measures Using Feature Fusion Summation
The third design of the hybrid similarity model combines the two selected models: the first is the SMLP similarity model, and the second is the SCNN1D similarity model. Here, the architecture of the SMLP consists of two twin neural networks, each of which has only one layer with 1024 neurons. The weights are linked in this architecture so that Network1 = Network2. The first network reads the fingerprint of the query, and the second reads the fingerprint from the database. The output of each network is a feature vector with a fixed length (here, 1024 features). The first similarity measure is the absolute difference between the two feature vectors; its output is a vector of length 1024. The second similarity measure is the exponential Manhattan distance; its output is a single value. The two formulas were covered in Section 2.2. The third similarity measure is the Jaccard measure, whose formula was covered in Section 2.3; its output is a single value. The feature fusion layer adds the value of the third similarity measure to the value of the second similarity measure, and the result is then added to each element of the vector produced by the first similarity measure. The output of the fusion layer is passed to the following layers, which contain 1024 and 512 neurons, respectively. Each neuron is connected to all neurons in the previous layer, and the ReLU activation function is used for all layers. The output of this model is a feature vector of length 512.
The architecture of the SCNN1D similarity model consists of two twin neural networks, each of which has two one-dimensional convolutional (CNN1D) layers. The layers are made up of 64 filters with a kernel size of 3; the activation function is the rectified linear unit (ReLU), and the max-pooling size is 2. Then comes a flatten layer, followed by a dense layer with a sigmoid activation function. In this design, the weights are tied so that CNN1D-1 = CNN1D-2. The output of each network is a feature vector with a fixed length (here, 512 features). The first similarity measure is the absolute difference between the two feature vectors; its output is a vector of length 512. The second similarity measure is the exponential Manhattan distance; its output is a single value. The third similarity measure is the Russell similarity measure. The feature fusion layer adds the value of the third similarity measure to the value of the second similarity measure, and the result is then added to each element of the vector produced by the first similarity measure. The output of the fusion layer is the feature vector that is considered the output of the second model. Figure 4 shows the details of the design of the hybrid Siamese similarity model with three similarity measures using feature fusion summation. The output of the SMLP similarity model (a feature vector of length 512) is fused with the output of the SCNN1D similarity model (a feature vector of length 512) using feature fusion (sum). The result of this layer is passed to the last layer of the hybrid model, whose sigmoid activation function gives the similarity score: 1 means complete similarity and 0 means complete dissimilarity. Moreover, the RMSprop optimizer is used, and binary_crossentropy is used as the loss function.

Hybrid Siamese Similarity Model with Three Similarity Measures Using Feature Fusion Maximum
The fourth design of the hybrid similarity model is similar to the previous (third) design, except that it uses the maximum operation instead of the sum operation in the feature fusion between the SMLP similarity model and the SCNN1D similarity model. Figure 5 shows the details of the design of the hybrid Siamese similarity model with three similarity measures using feature fusion maximum.
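The difference between the third and fourth designs reduces to an element-wise sum versus an element-wise maximum over the two sub-models' feature vectors. A toy NumPy sketch (random vectors stand in for the real 512-dimensional sub-model outputs):

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-ins for the 512-dimensional feature vectors produced by the
# SMLP and SCNN1D branches of the hybrid (random toy values).
v_smlp = rng.random(512)
v_scnn1d = rng.random(512)

fused_sum = v_smlp + v_scnn1d             # third design: Hybrid-F-Sum
fused_max = np.maximum(v_smlp, v_scnn1d)  # fourth design: Hybrid-F-Max
# Either fused vector is then passed to the final sigmoid layer.
```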

Datasets
Here, we evaluate the similarity search methods using the MDL Drug Data Report (MDDR) and Maximum Unbiased Validation (MUV) data sets, which are the most common [30,31]. The MDDR datasets have been used by our research group and in previous studies [3,4,6,9,10,19-22,25,26,32-40]. All molecules were translated to the ECFC-4 fingerprint by the Pipeline Pilot software, and our study community has used these databases. Ten reference structures were chosen randomly from each activity class. The MDDR comprises three data sets:

1. MDDR-DS1: This consists of 102,516 molecules divided into active and inactive groups. The active molecules are split into 11 classes, some having structurally homogeneous active elements and others having structurally heterogeneous active elements. Table 1 shows the activity classes of molecules in DS1.

2. MDDR-DS2: This contains 102,516 molecules divided into active and inactive groups. The active molecules are split into 10 structurally homogeneous activity classes. Table 2 shows the activity classes of molecules in DS2.

3. MDDR-DS3: This contains 102,516 molecules divided into active and inactive groups. The active molecules are split into 10 structurally heterogeneous activity classes. Table 3 shows the activity classes of molecules in DS3.

Rohrer and Baumann documented the MUV data collection, as shown in Table 4. This data set contains 17 activity classes, with each class including up to 30 active and 15,000 inactive molecules. Our research team has utilized these data sets in prior papers.

Performance Evaluation Measures
The effectiveness of the proposed approaches is assessed as follows:

1. K-fold cross-validation: The whole dataset is separated into K equal-sized sets, one of which is designated as the test set and the rest as training sets. The test set changes in each iteration, and the final result is the average of the recall values from all iterations; this procedure, known as k-fold cross-validation, is shown in Figure 7. In each iteration, 10 queries chosen at random from the activity class are tested, and the mean value of these ten searches is computed.
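A minimal sketch of this k-fold procedure, with a placeholder recall_fn standing in for one similarity-search evaluation (illustrative only; not the authors' pipeline):

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    # Shuffle sample indices and split them into k near-equal folds.
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(recall_fn, n_samples, k=5):
    # Each fold serves once as the test set while the remaining folds
    # form the training set; the final result is the mean recall
    # over all k iterations.
    folds = kfold_indices(n_samples, k)
    recalls = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        recalls.append(recall_fn(train, test))
    return float(np.mean(recalls))
```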

2. Comparison methods: The second strategy is to compare against existing approaches that use the same datasets in order to evaluate the outcomes of the proposed models. Among these approaches are the following:

A. Tanimoto similarity coefficient (TAN): for many years, TAN has served as the LBVS search benchmark technique. Tanimoto-based similarity models use the Tanimoto coefficient in its continuous form for the ECFC-4 descriptor [15].

B. Quantum similarity search (SQB): SQB utilizes a quantum mechanics approach for the ECFC-4 descriptor [9].

C. Enhanced Siamese Convolutional Neural Network-one dimension (SCNN-1D) [26] and enhanced Siamese Multilayer Perceptron (SMLP) [25], which are compared before and after being combined into one hybrid model.
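The continuous Tanimoto coefficient used by the TAN benchmark for count-based fingerprints such as ECFC-4 can be sketched as:

```python
import numpy as np

def tanimoto(a, b):
    # Continuous Tanimoto coefficient for count-based fingerprints:
    # a.b / (|a|^2 + |b|^2 - a.b); equals 1 for identical vectors and
    # 0 for vectors with no overlapping features.
    ab = np.dot(a, b)
    return ab / (np.dot(a, a) + np.dot(b, b) - ab)
```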

3. Kendall W significance test: The Kendall W, often known as the significance test, is the third measure used to evaluate the proposed methods. This significance test has already been utilized in prior research [3,4,9,10,19-22,25,26,29,32-34,36-42]. The test can be construed as a measure of rater agreement. In the Kendall W test, each case represents a judge or rater, and each variable represents an object or person being rated; the sum of rankings is computed for each variable. The Kendall W statistic ranges between 0, indicating no agreement, and 1, indicating full agreement. For example, let r_ij be the rank given to object i (a similarity search method) by judge j (an activity class), where there are n objects and m judges in total. The total rank given to object i is then [43]:

R_i = Σ_j r_ij, j = 1, ..., m

The mean of the total ranks is:

R̄ = (Σ_i R_i) / n

The squared deviation sum δ is defined as:

δ = Σ_i (R_i − R̄)²

Then, the Kendall W statistic is defined as:

W = 12δ / (m²(n³ − n))

This test demonstrates whether a group of judges makes equivalent decisions about the ranking of a set of items. In this analysis, the judges are the activity classes of each data set, and the items are the recall rates of the different search models. The significance level associated with the Kendall coefficient is an important part of this experiment: it verifies whether the value of the coefficient may have occurred by chance. If the value is significant (both 0.01 and 0.05 cut-off values were used), it is then possible to assign the items an overall ranking.
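The Kendall W computation follows directly from the formulas above; an illustrative implementation:

```python
import numpy as np

def kendall_w(ranks):
    # ranks: an (m x n) matrix where entry [j, i] is the rank that
    # judge j (an activity class) assigns to object i (a method).
    m, n = ranks.shape
    R = ranks.sum(axis=0)                # total rank R_i per object
    delta = np.sum((R - R.mean()) ** 2)  # squared deviation sum
    return 12.0 * delta / (m ** 2 * (n ** 3 - n))
```

With full agreement (every judge ranks the objects identically) the statistic reaches 1; with ranks spread so the totals are all equal, it falls to 0.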

4. Improvement percentage: For a clearer comparison between the recall values of the proposed methods and those of the previous methods, the improvement percentage for each proposed method is calculated. The improvement percentage formula is [44]:

Improvement% = (Recall_method1 − Recall_method2) / Recall_method2 × 100

where Recall_method1 represents the recall value of the first method, and Recall_method2 represents the recall value of the second method.
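The improvement percentage is a one-line computation; a sketch:

```python
def improvement_pct(recall_1, recall_2):
    # Percentage improvement of method 1's recall over method 2's,
    # per the formula in [44].
    return (recall_1 - recall_2) / recall_2 * 100.0
```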

Results and Discussion
The experimental outcomes for the ECFC-4 descriptor on the MDDR-DS1, MDDR-DS2, MDDR-DS3, and MUV data sets are provided in Tables 5-12, using cut-offs of 1% and 5%. In addition, the results of the proposed hybrid Siamese similarity models are recorded in these tables alongside the benchmark Tanimoto similarity coefficient (TAN) and previous studies, namely Bayesian inference (BIN), quantum similarity search (SQB), and the stack of deep belief networks (SDBN) in the MDDR datasets only, as well as our two Siamese methods proposed in previous studies: the SMLP similarity model and the SCNN1D similarity model. The hybrid Siamese similarity model with decision fusion using two similarity measures is here called Hybrid-D-Max2. The hybrid Siamese similarity model with decision fusion using three similarity measures is here called Hybrid-D-Max3. The hybrid Siamese similarity model with feature fusion summation is here called Hybrid-F-Sum. The hybrid Siamese similarity model with feature fusion maximum is here called Hybrid-F-Max. Each row in the tables lists recall values for the top 1% and 5% of the activity class, and in each row the best recall rate is shaded. In the tables, the mean row gives the average over all activity classes, and the shaded cells row gives the total number of shaded cells (top values) for each technique over the full range of activity classes. The first column of each table lists the activity classes of the dataset; this is followed by four columns representing the previous studies (TAN, BIN, SQB, and SDBN), two columns representing the two Siamese models proposed in previous studies (SMLP and SCNN1D), and four columns representing the hybrid models proposed in this study. Figures 8-15 show the comparison among methods for the average recall percentage of successful compound retrieval at the top 1% and 5% in MDDR-DS1, MDDR-DS2, MDDR-DS3, and MUV, respectively.
The MDDR-DS1 (structurally homogeneous and heterogeneous) recall values for the 1% and 5% cut-offs, recorded in Tables 5 and 6, show that the proposed hybrid Siamese similarity models were clearly superior to the benchmark studies (TAN, BIN, SQB, and SDBN) and to the two previously proposed methods (Siamese SMLP and SCNN1D). In addition, among the hybrid Siamese similarity models, the hybrid model with feature fusion maximum (Hybrid-F-Max) gives the best retrieval recall results in Table 5. Figure 8 compares the methods for the average recall percentage of successful compound retrieval at the top 1% in MDDR-DS1. By comparison, the Hybrid-F-Max method also gives the best retrieval recall results in Table 6; Figure 9 compares the methods for the average recall percentage of successful compound retrieval at the top 5% in MDDR-DS1. Furthermore, the MDDR-DS2 (structurally homogeneous) recall values recorded for the top 1% in Table 7 show that some of the proposed hybrid Siamese similarity models are superior to the benchmark TAN method and previous studies. The hybrid model with decision fusion maximum and three similarity measures (Hybrid-D-Max3) gives the best retrieval recall results in Table 7. Figure 10 shows the comparison among methods for the average recall percentage of successful compound retrieval at the top 1% in MDDR-DS2. However, the MDDR-DS2 recall values recorded for the 5% cut-off in Table 8 show that the BIN method gave the best retrieval recall results in view of the mean and the number of shaded cells; the second best is SQB, followed by SDBN. Figure 11 shows the comparison among methods for the average recall percentage of successful compound retrieval at the top 5% in MDDR-DS2.
Moreover, the MDDR-DS3 (structurally heterogeneous) recall values for the 1% and 5% cut-offs, recorded in Tables 9 and 10, demonstrate that the proposed hybrid Siamese similarity models were superior to the benchmark TAN method and the other studies (BIN, SQB, and SDBN), as well as the two previously proposed methods (Siamese MLP and CNN1D). In addition, among the hybrid Siamese similarity models, the hybrid model with feature fusion maximum (Hybrid-F-Max) gives the best retrieval recall results in Table 9 in view of the mean and the number of shaded cells. Figure 12 compares the methods for the average recall percentage of successful compound retrieval at the top 1% in MDDR-DS3. By comparison, the Hybrid-F-Max method also gives the best retrieval recall results in Table 10; Figure 13 compares the methods for the average recall percentage of successful compound retrieval at the top 5% in MDDR-DS3. Furthermore, the MUV recall values for the 1% cut-off, recorded in Table 11, demonstrate that some proposed hybrid Siamese similarity models were superior to the benchmark TAN method and the other studies (BIN and SQB), as well as the previously proposed SCNN1D method, though not the SMLP method. In addition, among the hybrid Siamese similarity models, the hybrid model with feature fusion summation (Hybrid-F-Sum) gives the best retrieval recall results in Table 11. Figure 14 compares the methods for the average recall percentage of successful compound retrieval at the top 1% in MUV. By comparison, the Hybrid-F-Sum method gives the best recall results in Table 12; Figure 15 compares the methods for the average recall percentage of successful compound retrieval at the top 5% in MUV. The second method used to evaluate the proposed methods is the significance test; the Kendall W test is the significance test used in this study.
Moreover, Tables 13-16 show the ranking of the hybrid Siamese similarity models (Hybrid-F-Max, Hybrid-F-Sum, Hybrid-D-Max3, and Hybrid-D-Max2) against the previous studies (TAN, BIN, SQB, Siamese MLP, and CNN1D), using the Kendall W test results for MDDR-DS1, MDDR-DS2, MDDR-DS3, and MUV at the top 1% and top 5%. For all of the data sets used, the Kendall W test at the top 1% shows that the significance test (p) values are less than 0.05; this means that the hybrid-enhanced Siamese similarity models are significant in all cases at the top 1%. Therefore, the general ranking of all methods indicates that the hybrid model with feature fusion maximum (Hybrid-F-Max) and the hybrid model with feature fusion summation (Hybrid-F-Sum) are superior to the other methods and have the top rank in MDDR-DS1 (homogeneous and heterogeneous) and MDDR-DS3 (structurally heterogeneous). In MDDR-DS2 (structurally homogeneous), the hybrid model with decision fusion maximum and three similarity measures (Hybrid-D-Max3) has the top rank among the methods. In the MUV dataset, the hybrid model with feature fusion summation (Hybrid-F-Sum) has the top rank among the methods except the SMLP method.
The results of the Kendall W test at the top 5% are similar: the significance test (p) values are less than 0.05, meaning that the hybrid Siamese similarity models are significant in all cases at the top 5%. As a result, the general ranking of all methods indicates that Hybrid-F-Max and Hybrid-F-Sum are superior to the other methods and have the top rank in MDDR-DS1 (homogeneous and heterogeneous) and MDDR-DS3 (structurally heterogeneous). In DS2, BIN has the top rank at the top 5%, and in the MUV dataset, the SMLP method has the top rank, followed by Hybrid-F-Max. Figures 16 and 17 show the ranking of the hybrid Siamese similarity models (Hybrid-D-Max2, Hybrid-D-Max3, Hybrid-F-Sum, and Hybrid-F-Max) against TAN, BIN, SQB, SDBN, Siamese MLP, and CNN1D using the Kendall W test results for MDDR-DS1, MDDR-DS2, MDDR-DS3, and MUV at the top 1% and 5%, respectively. Lastly, according to the experimental results, the success of the proposed methods comes from: (1) the Siamese network, which handles more complicated data samples, especially heterogeneous data samples, and makes it possible to employ deep learning methods that deal efficiently with the vast volume of information stored in databases; (2) enhancing the Siamese architecture with several similarity measures, because each similarity measure focuses on different properties, so using them together improves the recall metric; and (3) incorporating the two selected models into one hybrid model, because each method provides good results in some classes, so combining them in one hybrid model improved the retrieval recall.
The two hybrid designs that used feature-level data fusion (Hybrid-F-Max and Hybrid-F-Sum) gave better results than the two designs that used decision-level data fusion (Hybrid-D-Max3 and Hybrid-D-Max2). The first two designs operate on the feature vectors themselves, which are enhanced by the sum and max operations, leading to improvements in the recall metric, whereas the other two designs only select the maximum of the final scores produced by the SMLP and SCNN1D models.
Besides that, the proposed methods achieve good results on MDDR-DS3, MDDR-DS1, and MUV because these sets contain heterogeneous molecule classes. On MDDR-DS2, the proposed methods did not achieve higher scores than the traditional methods (TAN, BIN, SQB, and SDBN) because that dataset contains only structurally homogeneous molecule classes; however, some proposed methods achieved better results at the top 1% compared with the traditional methods.

Conclusions
Many techniques for capturing the biological similarity between a test compound and a known target ligand in LBVS have been established. Similarity searching is one of the primary tasks in VS that estimates a molecule's similarity; it is predicated on the idea that molecules with similar structures may also have similar activities. In spite of the good performance of these methods, especially when dealing with molecules that have homogeneous active structural elements, they are not good enough when dealing with structurally heterogeneous molecules. Previous works examined many deep learning methods in the enhanced Siamese similarity model. According to the Kendall W significance test, the best two methods on MDDR-DS3 (structurally heterogeneous) are the SMLP similarity model and the SCNN1D similarity model. To further improve the retrieval effectiveness of the similarity model, we incorporated the two best models into one hybrid model; each method gives good results in some classes, so combining them in one hybrid model may improve the retrieval recall. Many designs of the hybrid model have been tested in this study. The overall results indicate that the Hybrid-F-Max and Hybrid-F-Sum methods are superior to previous studies on DS1 and DS3, holding the top ranks among the methods at the top 1% and 5%, while Hybrid-D-Max3, Hybrid-D-Max2, and Hybrid-F-Max are superior to previous studies on DS2 at the top 1%. On MUV, SMLP has the top rank, followed by Hybrid-F-Sum at the top 1% and Hybrid-F-Max at the top 5%. Future work will reduce the size of the hybrid-enhanced Siamese similarity model by pruning the less significant weights.