# Exact Maximum Clique Algorithm for Different Graph Types Using Machine Learning


## Abstract


## 1. Introduction

^{2})) be computed. The fraction of a graph on which looser upper bounds are used (O(N log N)) is empirically estimated to be 0.025 for random graphs. Even though MCQD progresses quickly with the default value of Tlimit on many graphs, there are some graphs on which the default Tlimit performs poorly [4]. In particular, the default Tlimit is suboptimal for some dense and synthetic graphs of the DIMACS benchmark [16]. Here, we present an improvement to the original MCQD algorithm that automatically determines the value of the Tlimit parameter: we use machine learning to predict Tlimit for the input graph. The code used to perform the experiments is freely available at http://insilab.org/mcqd-ml (accessed on 9 November 2021).

#### 1.1. Problem Description and Notation

#### 1.2. Maximum Clique Dynamic (MCQD) Algorithm

**Algorithm 1.** Dynamic algorithm for maximum clique search.

1: procedure MaxCliqueDyn(R, C, level)

2: S[level] ← S[level − 1] + S[level] − Sold[level]

3: Sold[level] ← S[level − 1]

4: while R ≠ Ø do

5: choose a vertex p with maximum C(p) (last vertex) from R

6: R ← R \ {p}

7: if |Q| + C[index_of_p_in_R] > |Qmax| then

8: Q ← Q ∪ {p}

9: if R ∩ Γ(p) ≠ Ø then

10: if S[level] / ALL_STEPS < Tlimit then

11: calculate the degrees of vertices in G(R ∩ Γ(p))

12: sort vertices in R ∩ Γ(p) in descending order with respect to their degrees

13: ColorSort(R ∩ Γ(p), C′)

14: S[level] ← S[level] + 1

15: ALL_STEPS ← ALL_STEPS + 1

16: MaxCliqueDyn(R ∩ Γ(p), C′, level + 1)

17: else if |Q| > |Qmax| then

18: Qmax ← Q

19: Q ← Q \ {p}
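The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `color_sort` is a plain greedy coloring rather than the ColorSort routine of the published MCQD code, the early `return` when the bound fails is assumed because no else-branch for line 7 is shown in the listing above, and ALL_STEPS is assumed to start at 1 to avoid division by zero. The names `max_clique_dyn`, `color_sort`, `t_limit`, and `adj` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def color_sort(adj: Dict[int, Set[int]], R: List[int]) -> Tuple[List[int], List[int]]:
    """Greedy coloring (simplified stand-in for ColorSort).

    Returns the vertices ordered by non-decreasing color together with
    their color numbers, which serve as upper bounds on the clique size.
    """
    classes: List[List[int]] = []
    for v in R:
        for cls in classes:
            if not any(u in adj[v] for u in cls):  # v fits this color class
                cls.append(v)
                break
        else:
            classes.append([v])
    order, colors = [], []
    for k, cls in enumerate(classes, start=1):
        for v in cls:
            order.append(v)
            colors.append(k)
    return order, colors


def max_clique_dyn(adj: Dict[int, Set[int]], t_limit: float = 0.025) -> List[int]:
    """Branch-and-bound maximum clique search following Algorithm 1."""
    n_levels = len(adj) + 2
    S = [0] * n_levels        # steps counted up to each level
    S_old = [0] * n_levels
    Q: List[int] = []         # current clique
    state = {"best": [], "all_steps": 1}  # all_steps = 1 is an assumption

    def expand(R: List[int], C: List[int], level: int) -> None:
        S[level] += S[level - 1] - S_old[level]
        S_old[level] = S[level - 1]
        while R:
            p, c = R.pop(), C.pop()  # last vertex has the maximum color
            if len(Q) + c <= len(state["best"]):
                return  # assumed pruning: no remaining vertex can improve Qmax
            Q.append(p)
            Rp = [v for v in R if v in adj[p]]  # R ∩ Γ(p)
            if Rp:
                if S[level] / state["all_steps"] < t_limit:
                    # tighter bounds: re-sort candidates by degree in G(R ∩ Γ(p))
                    deg = {v: sum(1 for u in Rp if u in adj[v]) for v in Rp}
                    Rp.sort(key=lambda v: deg[v], reverse=True)
                Rp, Cp = color_sort(adj, Rp)
                S[level] += 1
                state["all_steps"] += 1
                expand(Rp, Cp, level + 1)
            elif len(Q) > len(state["best"]):
                state["best"] = list(Q)
            Q.pop()

    start = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    R0, C0 = color_sort(adj, start)
    expand(R0, C0, 1)
    return state["best"]


# Toy graph: a 4-clique on vertices 0..3 plus the path 3-4-5-0.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 0)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(sorted(max_clique_dyn(adj)))  # [0, 1, 2, 3]
```

Changing `t_limit` only changes how often the degree-based re-sorting is performed, so the returned clique size stays the same while the running time may differ.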

#### 1.3. Protein Product Graphs and Use of Molecular Docking Graphs in Drug Discovery

## 2. Overview of Graph Theory and Neural Networks Approaches

#### 2.1. Graphs Used for Training and Testing

#### 2.2. Molecular Docking Graphs

#### 2.3. Protein Product Graphs

#### 2.4. Small Protein Product Graphs

#### 2.5. Protocol for Machine Learning on Graphs

## 3. Materials and Methods

#### 3.1. Preparation of a Labeled Training Set

#### 3.2. Maximum Clique Dynamic Algorithm with Machine Learning (MCQD-ML)

#### 3.3. Evaluation of Possible Acceleration of the MCQD Algorithm

#### 3.4. Evaluation of the Effect of Machine Learning Models on Validation Sets

The models are compared by their $R^{2}$ score on the validation set, which contains graphs from different domains. This value (also called the coefficient of determination) is used in statistics to evaluate statistical models. Values of $R^{2}$ typically range from 0 to 1, with 1 being the best possible value. If the model always predicts the mean of the data (a constant value), the $R^{2}$ value is 0. The value can also be negative if the model performs worse than predicting the mean of the data. The results of our evaluation are shown in Table 2.
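The behaviour described above follows from the usual definition $R^{2} = 1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared prediction errors and $SS_{tot}$ is the sum of squared deviations from the mean. The short sketch below illustrates the three cases; the target values are made-up numbers chosen for illustration, not data from the paper, and the function name `r2_score` is an assumption.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical target values for five graphs (illustrative only).
y_true = [0.0, 0.25, 0.5, 0.75, 1.0]

perfect = r2_score(y_true, y_true)        # exact predictions
mean_only = r2_score(y_true, [0.5] * 5)   # always predict the mean
constant = r2_score(y_true, [0.025] * 5)  # a fixed value far from the mean

print(perfect, mean_only, constant)
```

Exact predictions give a score of 1, predicting the mean gives 0, and a constant far from the mean gives a negative score.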

The GAT model achieves the highest $R^{2}$ value, and every machine learning model performs better than the standard MCQD parameter choice, whose score is nearly equal to 0. Thus, we expect the GAT model to perform best and the other models to be slower on the test sets. In the next section, we evaluate the models based on the time they take to find the maximum clique.

## 4. Results

#### 4.1. Dense Random Graphs

#### 4.2. Small Protein Product Graphs

#### 4.3. Protein Product Graphs

#### 4.4. Molecular Docking Graphs

#### 4.5. Weighted Molecular Docking Graphs

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Wu, Q.; Hao, J.-K. A review on algorithms for maximum clique problems. Eur. J. Oper. Res. **2015**, 242, 693–709.
2. Depolli, M.; Konc, J.; Rozman, K.; Trobec, R.; Janezic, D. Exact parallel maximum clique algorithm for general and protein graphs. J. Chem. Inf. Model. **2013**, 53, 2217–2228.
3. Butenko, S.; Wilhelm, W.E. Clique-detection models in computational biochemistry and genomics. Eur. J. Oper. Res. **2006**, 173, 1–17.
4. Konc, J.; Janezic, D. An improved branch and bound algorithm for the maximum clique problem. Match Commun. Math. Comput. Chem. **2007**, 58, 569–590.
5. Prates, M.; Avelar, P.H.; Lemos, H.; Lamb, L.C.; Vardi, M.Y. Learning to solve NP-complete problems: A graph neural network for decision TSP. Proc. AAAI Conf. Artif. Intell. **2019**, 33, 4731–4738.
6. Walteros, J.L.; Buchanan, A. Why is maximum clique often easy in practice? Oper. Res. **2020**, 68, 1625–1931.
7. Li, C.-M.; Jiang, H.; Manyà, F. On minimization of the number of branches in branch-and-bound algorithms for the maximum clique problem. Comput. Oper. Res. **2017**, 84, 1–15.
8. Tomita, E.; Seki, T. An efficient branch-and-bound algorithm for finding a maximum clique. In Proceedings of the International Conference on Discrete Mathematics and Theoretical Computer Science, Dijon, France, 7–12 July 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 278–289.
9. Tomita, E.; Matsuzaki, S.; Nagao, A.; Ito, H.; Wakatsuki, M. A much faster algorithm for finding a maximum clique with computational experiments. J. Inf. Process. **2017**, 25, 667–677.
10. Carraghan, R.; Pardalos, P.M. An exact algorithm for the maximum clique problem. Oper. Res. Lett. **1990**, 9, 375–382.
11. Li, C.-M.; Quan, Z. An efficient branch-and-bound algorithm based on MaxSAT for the maximum clique problem. Proc. AAAI Conf. Artif. Intell. **2010**, 24, 128–133.
12. Segundo, P.S.; Lopez, A.; Pardalos, P.M. A new exact maximum clique algorithm for large and massive sparse graphs. Comput. Oper. Res. **2016**, 66, 81–94.
13. Bengio, Y.; Lodi, A.; Prouvost, A. Machine learning for combinatorial optimization: A methodological tour d’horizon. Eur. J. Oper. Res. **2021**, 290, 405–421.
14. Abe, K.; Xu, Z.; Sato, I.; Sugiyama, M. Solving NP-hard problems on graphs with extended AlphaGO Zero. arXiv **2019**, arXiv:1905.11623.
15. Zhou, J.; Cui, G.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. arXiv **2018**, arXiv:1812.08434.
16. Johnson, D.S.; Trick, M.A. Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, 11–13 October 1993; American Mathematical Society: Boston, MA, USA, 1996.
17. Konc, J.; Janežič, D. ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment. Bioinformatics **2010**, 26, 1160–1168.
18. Fine, J.; Konc, J.; Samudrala, R.; Chopra, G. CANDOCK: Chemical atomic network-based hierarchical flexible docking algorithm using generalized statistical potentials. J. Chem. Inf. Model. **2020**, 60, 1509–1527.
19. Lešnik, S.; Konc, J. In silico laboratory: Tools for similarity-based drug discovery. In Targeting Enzymes for Pharmaceutical Development; Springer: Berlin/Heidelberg, Germany, 2020; pp. 1–28.
20. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. **1995**, 20, 273–297.
21. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. **2001**, 29, 1189–1232.
22. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13 August 2016; pp. 785–794.
23. Shervashidze, N.; Vishwanathan, S.; Petri, T.; Mehlhorn, K.; Borgwardt, K. Efficient graphlet kernels for large graph comparison. Artif. Intell. Stat. PMLR **2009**, 5, 488–495.
24. Shervashidze, N.; Schweitzer, P.; van Leeuwen, E.J.; Mehlhorn, K.; Borgwardt, K.M. Weisfeiler-Lehman graph kernels. J. Mach. Learn. Res. **2011**, 12, 2539–2561.
25. Leskovec, J. Stanford CS224W: Machine Learning with Graphs. Traditional Methods for Machine Learning in Graphs 2019. Available online: http://web.stanford.edu/class/cs224w/ (accessed on 1 May 2021).
26. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv **2018**, arXiv:1810.00826.
27. Khalil, E.; Dai, H.; Zhang, Y.; Dilkina, B.; Song, L. Learning combinatorial optimization algorithms over graphs. Adv. Neural Inf. Process. Syst. **2017**, 1704, 6348–6358.
28. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv **2016**, arXiv:1609.02907.
29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv **2017**, arXiv:1706.03762.
30. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv **2017**, arXiv:1710.10903.

**Figure 2.** Time needed by MCQD to find the maximum clique on a random graph with 150 nodes and density p = 0.7. The red line represents the default value (0.025) of the MCQD algorithm.

**Figure 3.** Time needed by the MCQD algorithm to find the maximum clique on different graphs independent of Tlimit. The blue line represents the mean time and the shaded area represents the standard deviation over 20 runs of the MCQD algorithm.

**Figure 4.** Time needed by the MCQD algorithm and each variant of the MCQD-ML algorithm to find the maximum clique on three different graphs from a test set of dense random graphs, as a function of the Tlimit parameter.

**Figure 5.** Time needed by the MCQD algorithm and each variant of the MCQD-ML algorithm to find the maximum clique on three different graphs from a test set of protein product graphs, as a function of the Tlimit parameter.

**Figure 6.** Time needed by the MCQD algorithm and each variant of the MCQD-ML algorithm to find the maximum clique on three different graphs from a test set of molecular docking graphs, as a function of the Tlimit parameter.

ML Method | Description | Works on Graphs | Representative Power | References |
---|---|---|---|---|
XGBoost | Ensemble of gradient boosted trees. | No. Best for tabular data. | Works well on tabular data and extracted features of a graph. Results depend on the quality of features extracted. | [21,22] |
SVR-WL | Support vector machine with Weisfeiler–Lehman kernel. | Yes. | Can distinguish non-isomorphic graphs. | [24] |
GNN | Graph Neural Networks (GCN, GAT, GIN). | Yes. | Can distinguish most graphs and learn good representations. | [15,25,28,29,30] |

**Table 2.** $R^{2}$ score of each model on the validation set.

Model Name | $R^{2}$ Score on Validation Set |
---|---|
MCQD | −0.02 |
XGB | 0.15 |
SVR-WL | 0.21 |
GCN | 0.42 |
GAT | 0.55 |
GIN | 0.16 |

**Table 3.** Times needed by the algorithms to find the maximum clique for each graph in a test set of dense random graphs. Best times are in bold.

n | p | MCQD | XGB | GCN | GAT | GIN | SVR-WL |
---|---|---|---|---|---|---|---|
63 | 0.9944 | 0.0008 | 0.0007 | 0.0007 | 0.0007 | 0.0011 | 0.0007 |
113 | 0.9987 | 0.0024 | 0.0022 | 0.0023 | 0.0022 | 0.0028 | 0.0023 |
121 | 0.9955 | 0.0044 | 0.0042 | 0.0042 | 0.0041 | 0.0065 | 0.0047 |
175 | 0.9954 | 0.0171 | 0.0159 | 0.0157 | 0.0151 | 0.0194 | 0.0157 |
304 | 0.9911 | 8.8271 | 6.4638 | 7.305 | 6.2747 | 8.6368 | 9.3515 |
414 | 0.9943 | 2.3677 | 1.8574 | 1.7514 | 1.2559 | 5.0611 | 1.9631 |
443 | 0.9938 | 57.898 | 55.2395 | 66.4033 | 58.8421 | 428.473 | 265.817 |
475 | 0.9979 | 0.2406 | 0.2327 | 0.2413 | 0.2305 | 0.2336 | 0.2287 |
476 | 0.9977 | 0.3262 | 0.2695 | 0.3024 | 0.2906 | 0.2703 | 0.2652 |
524 | 0.9992 | 0.5042 | 0.438 | 0.466 | 0.4482 | 0.4341 | 0.4278 |
622 | 0.9981 | 0.6802 | 0.6225 | 0.6212 | 0.6082 | 0.6253 | 0.612 |
690 | 0.9978 | 326.052 | 1124.65 | 511.101 | 428.92 | 115.922 | −1.0000 |
828 | 0.9979 | 382.55 | 322.846 | 431.302 | 254.81 | −1.0000 | 1217.84 |
931 | 0.9995 | 1.98 | 1.7438 | 1.7799 | 1.7807 | 1.7584 | 1.7017 |
941 | 0.9988 | 25.4684 | 12.2125 | 22.7202 | 20.2954 | 12.3739 | 12.044 |
Speedup | | | 0.52 | 0.77 | 1.04 | 0.73 | 0.31 |
Average speedup | | | 1.14 | 1.04 | 1.18 | 1.09 | 1.05 |

**Table 4.** Times needed by the algorithms to find the maximum clique for each graph in a test set of small protein product graphs. Best times are in bold.

n | p | MCQD | XGB | GCN | GAT | GIN | SVR-WL |
---|---|---|---|---|---|---|---|
61 | 0.9792 | 0.0008 | 0.0008 | 0.0008 | 0.0007 | 0.0012 | 0.0008 |
138 | 0.9422 | 0.0079 | 0.0137 | 0.0078 | 0.0074 | 0.0102 | 0.0076 |
200 | 0.8581 | 0.0358 | 0.0398 | 0.0388 | 0.0381 | 0.0327 | 0.0393 |
271 | 0.9852 | 0.2062 | 0.2004 | 0.1972 | 0.1907 | 0.1831 | 0.1913 |
346 | 0.9091 | 2.3032 | 0.7774 | 2.8878 | 2.8278 | 14.3173 | 4.7920 |
451 | 0.9743 | 0.8956 | 0.8989 | 0.8955 | 0.8464 | 1.3257 | 1.3406 |
563 | 0.9800 | 1.7685 | 1.8496 | 1.7348 | 1.6936 | 1.7277 | 1.6994 |
655 | 0.9692 | 2.3652 | 2.3684 | 2.4533 | 2.6894 | 15.9674 | 15.8806 |
750 | 0.9625 | 4.7147 | 5.8504 | 4.2834 | 4.1741 | 8.0964 | 8.0182 |
905 | 0.9412 | 18.4683 | 16.2290 | 25.2455 | 18.5778 | −1.0000 | 283.5820 |
Speedup | | | 1.08 | 0.81 | 0.99 | 0.29 | 0.09 |
Average speedup | | | 1.13 | 0.96 | 1.02 | 0.69 | 0.70 |

**Table 5.** Times needed by the algorithms to find the maximum clique for each graph in a test set of full protein product graphs. Best times are in bold.

n | p | MCQD | XGB | GCN | GAT | GIN | SVR-WL |
---|---|---|---|---|---|---|---|
27,840 | 0.0069 | 9.8018 | 9.9147 | 10.2909 | 9.8759 | 10.9743 | 10.1547 |
36,841 | 0.0060 | 18.6482 | 19.7002 | 19.1695 | 19.0900 | 23.4188 | 19.9433 |
121,359 | 0.0024 | 198.5920 | 199.5000 | 199.4170 | 199.8520 | 378.1210 | 199.3480 |
Speedup | | | 0.99 | 0.99 | 0.99 | 0.55 | 0.99 |
Average speedup | | | 0.98 | 0.98 | 0.99 | 0.74 | 0.97 |
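The two summary rows can be checked directly from the times in Table 5. The sketch below assumes, as an interpretation that is not stated in the surviving text, that "Speedup" is the ratio of total MCQD time to total model time and that "Average speedup" is the arithmetic mean of the per-graph ratios. Under that assumption it reproduces the Speedup row and four of the five Average speedup entries after rounding; for GCN it gives about 0.97 where the table reports 0.98. The function names are hypothetical.

```python
from typing import Dict, List

# Running times from Table 5, one entry per graph.
times: Dict[str, List[float]] = {
    "MCQD":   [9.8018, 18.6482, 198.5920],
    "XGB":    [9.9147, 19.7002, 199.5000],
    "GCN":    [10.2909, 19.1695, 199.4170],
    "GAT":    [9.8759, 19.0900, 199.8520],
    "GIN":    [10.9743, 23.4188, 378.1210],
    "SVR-WL": [10.1547, 19.9433, 199.3480],
}


def total_speedup(base: List[float], other: List[float]) -> float:
    """Assumed 'Speedup': total baseline time divided by total model time."""
    return sum(base) / sum(other)


def average_speedup(base: List[float], other: List[float]) -> float:
    """Assumed 'Average speedup': mean of the per-graph time ratios."""
    return sum(b / o for b, o in zip(base, other)) / len(base)


for name, t in times.items():
    if name == "MCQD":
        continue
    print(f"{name:7s} speedup={total_speedup(times['MCQD'], t):.2f} "
          f"average={average_speedup(times['MCQD'], t):.2f}")
```

The total ratio is dominated by the slowest graphs, while the per-graph mean weights every graph equally, which is why the two rows can differ noticeably (for example, 0.55 versus 0.74 for GIN).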

**Table 6.** Times needed by the algorithms to find the maximum clique for each graph in a test set of molecular docking graphs. Best times are in bold.

n | p | MCQD | XGB | GCN | GAT | GIN | SVR-WL |
---|---|---|---|---|---|---|---|
345 | 0.1266 | 0.0025 | 0.0025 | 0.0026 | 0.0025 | 0.0026 | 0.0025 |
1779 | 0.1108 | 0.0940 | 0.0948 | 0.0943 | 0.0939 | 0.0952 | 0.0952 |
1851 | 0.1580 | 0.1606 | 0.1606 | 0.1829 | 0.1562 | 0.4394 | 0.1580 |
3233 | 0.1620 | 3.6941 | 3.4489 | 1.9791 | 1.9817 | 3.5981 | 1.9176 |
4211 | 0.0448 | 0.3889 | 0.3900 | 0.3990 | 0.3783 | 0.3967 | 0.3823 |
5293 | 0.1119 | 2.7147 | 6.0876 | 2.9925 | 2.6810 | 5.4606 | 2.7374 |
5309 | 0.1474 | 26.3695 | 19.1478 | 33.7628 | 7.7648 | 7.8752 | 7.5596 |
5735 | 0.0592 | 1.2673 | 1.2681 | 1.3803 | 1.2271 | 1.3196 | 1.2476 |
6294 | 0.1382 | 3.0941 | 15.8609 | 3.3343 | 3.0363 | 3.1517 | 3.0399 |
7211 | 0.1012 | 4.4230 | 11.2341 | 4.5631 | 4.2580 | 9.0498 | 4.3415 |
Speedup | | | 0.73 | 0.86 | 1.96 | 1.41 | 1.96 |
Average speedup | | | 0.84 | 1.01 | 1.34 | 1.10 | 1.34 |

**Table 7.** Times needed by the MCQD algorithm and the MCQD-ML algorithm (variant with the GAT model) to find the maximum clique for each graph in a test set of weighted molecular docking graphs.

n | p | MCQD | GAT |
---|---|---|---|
345 | 0.1266 | 0.0049 | 0.0048 |
1779 | 0.1108 | 0.1188 | 0.1122 |
1851 | 0.1580 | 1.0636 | 1.0563 |
3233 | 0.1620 | 107.3550 | 106.3990 |
4211 | 0.0448 | 0.4950 | 0.4960 |
5293 | 0.1119 | 1.1551 | 1.1550 |
5309 | 0.1474 | 16.5672 | 16.4841 |
5735 | 0.0592 | 1.3452 | 1.3416 |
6294 | 0.1382 | 11.4437 | 10.8346 |
7211 | 0.1012 | 6.8934 | 6.8882 |
Speedup | | | 1.01 |
Average speedup | | | 1.01 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Reba, K.; Guid, M.; Rozman, K.; Janežič, D.; Konc, J.
Exact Maximum Clique Algorithm for Different Graph Types Using Machine Learning. *Mathematics* **2022**, *10*, 97.
https://doi.org/10.3390/math10010097
