# A Machine Learning Approach to Algorithm Selection for Exact Computation of Treewidth


## Abstract


## 1. Introduction

#### 1.1. The Importance of Treewidth

#### 1.2. Algorithm Selection

#### 1.3. Our Contribution

## 2. Preliminaries

#### 2.1. Tree Decompositions and Treewidth

- The union of all bags ${X}_{i}$ equals V, i.e., every vertex of the original graph G is in at least one bag in the tree decomposition.
- For every edge $(u,v)$ in G, there exists a bag X that contains both u and v, i.e., both endpoints of every edge in the original graph G can be found together in at least one bag in the tree decomposition.
- If two bags ${X}_{i}$ and ${X}_{j}$ both contain a vertex v, then every bag on the path between ${X}_{i}$ and ${X}_{j}$ also contains v.
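The three conditions above can be checked mechanically. The following is a minimal sketch, not code from the solvers discussed in this article; the function names and the input representation (a dictionary of bags plus a list of tree edges) are assumptions made for illustration. It relies on the standard fact that, in a tree, the third condition is equivalent to the bags containing a given vertex forming a connected subtree, and that the width of a decomposition is the size of its largest bag minus one.

```python
from collections import defaultdict, deque


def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check the three tree-decomposition conditions.

    vertices:   iterable of graph vertices
    edges:      iterable of (u, v) graph edges
    bags:       dict mapping tree-node id -> set of graph vertices
    tree_edges: iterable of (i, j) pairs, assumed to already form a tree
    """
    vertices = set(vertices)
    # Condition 1: the union of all bags equals V.
    covered = set().union(*bags.values()) if bags else set()
    if covered != vertices:
        return False
    # Condition 2: both endpoints of every edge share at least one bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # Condition 3: the tree nodes whose bags contain v are connected.
    adjacent = defaultdict(set)
    for i, j in tree_edges:
        adjacent[i].add(j)
        adjacent[j].add(i)
    for v in vertices:
        holders = {i for i, bag in bags.items() if v in bag}
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:
            i = queue.popleft()
            for j in adjacent[i]:
                if j in holders and j not in seen:
                    seen.add(j)
                    queue.append(j)
        if seen != holders:
            return False
    return True


def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# A path a-b-c-d has a decomposition of width 1.
path_vertices = ["a", "b", "c", "d"]
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
good_bags = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}
# Same bags in a different tree order: vertex b now violates condition 3.
bad_bags = {0: {"a", "b"}, 1: {"c", "d"}, 2: {"b", "c"}}
tree = [(0, 1), (1, 2)]
```

The treewidth of a graph is the minimum width over all of its tree decompositions, so this check only validates a candidate decomposition; it does not compute treewidth.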

#### 2.2. Algorithm Selection

- the set of algorithms A,
- the instances of the problem, also known as the problem space P,
- measurable characteristics (features) of a problem instance, known as the feature space F,
- the performance space Y.

## 3. Methodology

#### 3.1. Features

- 1: **number of nodes:** v
- 2: **number of edges:** e
- 3,4: **ratio:** $\frac{v}{e}$, $\frac{e}{v}$
- 5: **density:** $\frac{2e}{v(v-1)}$
- 6–13: **degree statistics:** min, max, mean, median, ${Q}_{0.25}$, ${Q}_{0.75}$, variation coefficient, entropy.

We refer to ${Q}_{0.25}$, ${Q}_{0.75}$ and the variation coefficient as **q1** (first quartile), **q3** (third quartile) and **variation** in the rest of this paper. We note that there are some mathematical correlations in the above features, particularly between $v/e$ and $e/v$. We keep both to align with Reference [37], where both features were also used, and because we wish to defer the question of feature correlations and significance to our post-facto feature analysis later in the article. There we observe that leaving out exactly one of $e/v$ and $v/e$ does not improve performance. Other correlations in the features are more indirect and occur via more complex mathematical transformations. There is no guarantee that the classifiers we use can easily ‘learn’ these complex correlations, which justifies their inclusion. It is also useful to keep all 13 features to illustrate that careful pre-processing of features, which can be quite a complex process, is not strictly necessary to obtain a powerful hybrid algorithm.
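As an illustration of how cheap these features are to compute, the sketch below derives all 13 of them from an edge list using only the Python standard library. It is not the feature extractor used in the experiments. Several details are assumptions, because the list above does not fix them: the variation coefficient is taken as population standard deviation divided by mean, the entropy is the Shannon entropy (in bits) of the empirical degree distribution, and quartiles use the `inclusive` interpolation method of `statistics.quantiles`.

```python
import math
import statistics
from collections import Counter


def graph_features(num_vertices, edges):
    """Compute the 13 features for a simple undirected graph.

    num_vertices: number of vertices, labelled 0 .. num_vertices - 1
    edges:        list of (u, v) pairs without duplicates or self-loops
    Assumes at least two vertices and at least one edge.
    """
    v, e = num_vertices, len(edges)
    degree = [0] * v
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    mean = statistics.fmean(degree)
    q1, median, q3 = statistics.quantiles(degree, n=4, method="inclusive")
    # Assumption: variation coefficient = population std / mean.
    variation = statistics.pstdev(degree) / mean
    # Assumption: Shannon entropy (bits) of the degree distribution.
    counts = Counter(degree)
    entropy = -sum(c / v * math.log2(c / v) for c in counts.values())

    return {
        "v": v,
        "e": e,
        "v/e": v / e,
        "e/v": e / v,
        "density": 2 * e / (v * (v - 1)),
        "min": min(degree),
        "max": max(degree),
        "mean": mean,
        "median": median,
        "q1": q1,
        "q3": q3,
        "variation": variation,
        "entropy": entropy,
    }


# Star with centre 0 and three leaves: degrees are [3, 1, 1, 1].
star = graph_features(4, [(0, 1), (0, 2), (0, 3)])
```

Note that the sketch divides by `e`, by `v - 1` and by the mean degree, so it fails on edgeless or single-vertex graphs; a production extractor would need to handle those cases explicitly.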

#### 3.2. Treewidth Algorithms

- **tdlib**, by **Lukas Larisch and Felix Salfelder**, referred to as **tdlib**. This is an implementation of the algorithm that Tamaki proposed [14] for the 2016 iteration of the PACE challenge [15], which itself builds on the algorithm by Arnborg et al. [8]. Implementation available at github.com/freetdi/p17
- **Exact Treewidth**, by **Hisao Tamaki and Hiromu Ohtsuka**, referred to as **tamaki**. Also an implementation of Tamaki’s algorithm [14]. Implementation available at github.com/TCS-Meiji/PACE2017-TrackA
- **Jdrasil**, by **Max Bannach, Sebastian Berndt and Thorsten Ehlers**, referred to as **Jdrasil** [41]. Implementation available at github.com/maxbannach/Jdrasil

#### 3.3. Machine Learning Algorithms

#### 3.4. Reflections on the Choice of Machine Learning Model

## 4. Experiments and Results

#### 4.1. Datasets

- PACE 2017 treewidth exact competition instances, referred to as **ex**. Available at github.com/PACE-challenge/Treewidth-PACE-2017-instances
- PACE 2017 bonus instances, referred to as **bonus**. Available at github.com/PACE-challenge/Treewidth-PACE-2017-bonus-instances
- Named graphs, referred to as **named**. (These are graphs with special names, originally extracted from the SAGE graphs database). Available at github.com/freetdi/named-graphs
- Control flow graphs, referred to as **cfg**. Available at github.com/freetdi/CFGs
- PACE 2017 treewidth heuristic competition instances, referred to as **he**. Available at github.com/PACE-challenge/Treewidth-PACE-2017-instances
- UAI 2014 Probabilistic Inference Competition instances, referred to as **uai**. Available at github.com/PACE-challenge/UAI-2014-competition-graphs
- SAT competition graphs, referred to as **sat_sr15**. Available at people.mmci.uni-saarland.de/ hdell/pace17/SAT-competition-gaifman.tar
- Transit graphs, referred to as **transit**. Available at github.com/daajoe/transit_graphs
- PACE 2016 treewidth instances [15], referred to as **pace2016**. Available at bit.ly/pace16-tw-instances-20160307
- PACE 2019 Vertex Cover challenge instances*, referred to as **vcPACE**. Available at pacechallenge.org/files/pace2019-vc-exact-public-v2.tar.bz2
- DIMACS Maximum Clique benchmark instances*, referred to as **dimacsMC**. Available at iridia.ulb.ac.be/~fmascia/maximum_clique/DIMACS-benchmark

#### 4.2. Experimental Setup

**sat_sr15**, **he** and **dimacsMC** (Section 4.1) and were therefore too big. For instance, **all** graphs of the sat_sr15 dataset were too hard for any solver to terminate. For an even more extreme example, graphs 195 through 200 of the PACE 2017 Heuristic Competition dataset were all too big for our feature extractor to load into memory, with graph 195 being the smallest of them at 1.3 million vertices and graph 200 being the biggest at 15.5 million. These graphs are three orders of magnitude larger than the graphs the implementations were built to work on: the biggest graph from the PACE 2017 Exact Competition dataset is graph number 198, which has 3104 vertices. Other errors included unusual yet trivial cases like the graph collect-neighbor_collect_neighbor_init.gr from the dataset **cfg**, which only contained one vertex and which broke assumptions of both our feature extractor and the solvers we used. All such entries were discarded from the dataset. An additional 680 entries were discarded because no solver managed to obtain a solution on them in the given time limit; presumably those problem instances were too hard. The resulting dataset contained a total of 29,498 instances.

**tdlib** was labeled the best algorithm for about 99% (29,234) of instances, **Jdrasil** for under 0.2% (55) and **tamaki** for about 0.7% (209). Under these circumstances, an algorithm selection approach could trivially achieve 99% accuracy by simply always selecting **tdlib**. To create genuine added value above this trivial baseline, we imposed additional rules in order to re-balance the dataset towards graphs that are neither too easy nor too hard; we call these graphs of moderate difficulty. Graphs were considered **not** of moderate difficulty if **either** all algorithms found a solution quicker than some lower bound (i.e., the graph is too easy), **or** all algorithms failed to find a solution within the allotted time (i.e., the graph is too hard). The reasoning behind this approach is that if the algorithms’ run times lie outside of the defined moderate area, there is little to be gained from algorithm selection anyway. If a graph is too easy, a comparatively weak algorithm can still solve it quickly; if a graph is too hard, there simply is no “correct” algorithm to select. Formally, if $lb$ is the lower bound, $ub$ is the allotted time (upper bound) and $rt(A,G)$ is the run time of algorithm A on graph G, then G is excluded whenever $(\forall A: lb \ge rt(A,G)) \vee (\forall A: ub \le rt(A,G))$ holds.
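The exclusion rule above translates directly into a predicate over the recorded run times. The sketch below is illustrative only: the function name, the dictionary layout and the convention of recording an unfinished run as exactly $ub$ are assumptions, and the value 1800 used for the upper bound is a hypothetical placeholder rather than the time limit used in the experiments.

```python
def is_moderate(run_times, lb, ub):
    """Return True if a graph is of moderate difficulty.

    run_times: dict mapping solver name -> run time in seconds.
               Assumption: a solver that did not finish is recorded as ub.
    lb, ub:    lower bound and allotted time (upper bound) in seconds.
    """
    too_easy = all(rt <= lb for rt in run_times.values())
    too_hard = all(rt >= ub for rt in run_times.values())
    return not (too_easy or too_hard)


UB = 1800  # hypothetical time limit, for illustration only

kept = is_moderate({"tdlib": 0.5, "tamaki": 2.0, "Jdrasil": 40.0}, lb=10, ub=UB)
easy = is_moderate({"tdlib": 0.5, "tamaki": 2.0, "Jdrasil": 4.0}, lb=10, ub=UB)
hard = is_moderate({"tdlib": UB, "tamaki": UB, "Jdrasil": UB}, lb=10, ub=UB)
```

A graph is kept as soon as one solver exceeds the lower bound while at least one solver still finishes in time, which is exactly the situation in which choosing the right solver matters.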

#### 4.3. Experimental Results

- Victories. A ‘victory’ is defined as being (or selecting, in the case of the hybrid algorithm or the oracle algorithm) the fastest algorithm for a certain graph.
- Total run time on the entire dataset.
- Terminations. A ‘termination’ is defined as successfully solving the given problem instance within the given time. No regard is given to the run time; the only thing that matters is whether the algorithm managed to find a solution at all.
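The three metrics above can be computed for any selection strategy (a single solver, the hybrid algorithm or the oracle) from a table of per-solver run times. The following sketch shows one way to do so. The function names and data layout are hypothetical, and charging an unsolved instance at the full time limit is an assumption of this sketch; the definitions above do not say how failures count toward total run time.

```python
def oracle_selection(run_times):
    """Pick, for every graph, the fastest solver that finished."""
    selection = {}
    for graph, times in run_times.items():
        finished = {s: t for s, t in times.items() if t is not None}
        selection[graph] = min(finished, key=finished.get)
    return selection


def evaluate(selection, run_times, time_limit):
    """Score a selection on victories, total run time and terminations.

    selection:  dict graph -> chosen solver
    run_times:  dict graph -> dict solver -> seconds (None = no solution)
    time_limit: seconds charged for an unsolved graph (assumption)
    """
    victories = terminations = 0
    total_run_time = 0.0
    for graph, solver in selection.items():
        times = run_times[graph]
        chosen = times[solver]
        if chosen is None:
            total_run_time += time_limit
            continue
        terminations += 1
        total_run_time += chosen
        best = min(t for t in times.values() if t is not None)
        if chosen == best:
            victories += 1
    return {
        "victories": victories,
        "total_run_time": total_run_time,
        "terminations": terminations,
    }


# Toy data, not measurements from the experiments.
toy_times = {
    "g1": {"tdlib": 2.0, "tamaki": 5.0, "Jdrasil": None},
    "g2": {"tdlib": None, "tamaki": 30.0, "Jdrasil": 90.0},
}
always_tdlib = evaluate({"g1": "tdlib", "g2": "tdlib"}, toy_times, 100.0)
perfect = evaluate(oracle_selection(toy_times), toy_times, 100.0)
```

In this toy example the single-solver strategy wins one graph and fails on the other, while the oracle wins both, which mirrors the gap that a hybrid algorithm tries to close.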

#### 4.3.1. Dataset A

**tdlib**) was fastest for 898. The hybrid algorithm’s run time was 87,500 s, while the overall fastest solver (**tamaki**) required 137,447 s and the perfect algorithm 54,810 s. The hybrid algorithm terminated on 1145 out of the 1162 graphs, whereas the best solver (**tamaki**) terminated on 1115.

#### 4.3.2. Dataset B

**tdlib**) was fastest for 192. The hybrid algorithm’s run time was 82,085 s, while the overall fastest solver (**tamaki**) required 135,935 s and the perfect algorithm 54,466 s. The hybrid algorithm terminated on 413 out of the 427 graphs, whereas the best solver (**tamaki**) terminated on 380.

#### 4.3.3. Dataset C

**tamaki**) was fastest for 168. The hybrid algorithm’s run time was 80,697 s, while the overall fastest solver (**tamaki**) required 134,910 s and the perfect algorithm 54,060 s. The hybrid algorithm terminated on 323 out of the 337 graphs, whereas the best solver (**tamaki**) terminated on 290.

#### 4.3.4. Dataset B—Decision Tree

#### 4.3.5. Dataset B—Principal Component Analysis

**73%** compared to the baseline model’s accuracy of **76.5%**. While the loss of accuracy is small, our drop-column feature analysis in Section 5 shows many sets of three features that can be used to build a model of similar or higher accuracy.

## 5. Analysis

**tdlib** and **tamaki**) have nearly equal results and the third solver (**Jdrasil**) is still best for a significant number of problem instances, unlike in Dataset A. This makes the hybrid algorithm’s task hardest on this dataset, as the trivial ‘winner-takes-all’ approach would be the least effective.

**76.5%**. In order to obtain more statistically robust results, we made multiple runs of 10-fold cross-validation and kept their scores.

**variation** and **minimum degree**, as well as **variation** and **entropy**.

**74.5%** and was achieved by removing **minimum degree**, **maximum degree** and **variation**. Of note is also the fact that 13 of 14 sets contained at least two of these same features and one set contained only one of them.
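A feature removal analysis of this kind can be organised as an exhaustive loop over feature subsets. The sketch below enumerates every set of three features to drop and ranks the sets by the score of a model trained on the remaining ten. It is a schematic illustration, not the procedure used for the results above: the `score` callable stands in for retraining and cross-validating a classifier, and the toy scorer with its numbers is entirely hypothetical.

```python
from itertools import combinations

FEATURES = [
    "v", "e", "v/e", "e/v", "density", "min", "max", "mean",
    "median", "q1", "q3", "variation", "entropy",
]


def drop_column_ranking(score, k=3):
    """Rank all size-k sets of dropped features, worst score first.

    score: callable taking the list of kept feature names and returning
           an accuracy; in practice this would retrain and cross-validate
           a classifier (hypothetical interface).
    """
    results = []
    for dropped in combinations(FEATURES, k):
        kept = [f for f in FEATURES if f not in dropped]
        results.append((score(kept), dropped))
    results.sort(key=lambda pair: pair[0])
    return results


def toy_score(kept):
    """Hypothetical scorer: accuracy falls with each missing 'signal' feature."""
    signal = {"variation", "min", "max"}
    return 0.765 - 0.01 * len(signal - set(kept))


ranked = drop_column_ranking(toy_score)
```

With 13 features there are 286 sets of three to evaluate, so the cost of the analysis is dominated by how expensive a single call to `score` is.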

**minimum degree** emerged as a clear winner, having **72%** accuracy. We highlight the fact that by removing three features at a time, we only managed to lower the accuracy to about **74.5%**, while a model with only one feature successfully reached **72%**.

**variation** and/or **maximum degree**.

**variation**, **minimum degree** and **maximum degree**, adding all three of them produced results that were around the middle of the pack at **74.5%**. However, a large majority of the best results contained at least one or, more often, two of those features.

**78.5%** by only adding **mean**, **variation** and **maximum degree**, compared to **76.5%** for the original model. Naturally, we view such results with caution. They could well be the result of chance, especially seeing as we use no validation set; however, another possibility is that the classifiers can be more efficiently trained on smaller subsets of features.

**variation**, **minimum degree** and **maximum degree** appear in our analysis indicates that they carry some critically important signal that the classifier needs in order to be accurate. However, it appears that one or two of the features are sufficient to reproduce the signal and adding the third one does not help much.

**tdlib**; after that split alone, selecting **tdlib** would be the correct choice about 70% of the time. On the right side of the split, a similar situation is observed with **tamaki** being the correct choice about 72% of the time. This shows that the first split, which sends to the left graphs where the first quartile of the degree distribution is smaller than or equal to 4.5, is very important for solvers’ performance. The first quartile being low is an indicator for a small or sparse graph and, according to the model, **tdlib**’s performance on those is better, while **tamaki** seems to cope better with larger or denser graphs.

**Jdrasil** being the correct label, both of which require that the variation coefficient feature is higher than a certain threshold, which is quite high (0.603 and 0.866). This leads us to believe that **Jdrasil** performs well on graphs where there is significant variation in the degree of all vertices. One of these paths also requires the graph to have more than 2184 edges; otherwise **tdlib** is selected instead. This again indicates that **tdlib** is better at solving smaller or sparser graphs, while **Jdrasil** can deal with variability in larger graphs too.

**tdlib** also seems to excel on graphs where the degree distribution has a low first quartile and the minimum degree is low. This also confirms our belief that **tdlib** excels on smaller and sparser graphs.

## 6. Discussion

**degree variation coefficient**, was only about 3.5 times more important than the least important feature, **median degree**. The overall distribution of feature importances is such that the most important five features together account for about 50% of the importance, while the remaining eight account for the rest. Our interpretation of these results is that all features are significant contributors: while some features are more important than others, there are no clearly dominant features that eliminate the need for the rest.

**variation**, **minimum degree** and **maximum degree**, all seem to be related in that removing all of them significantly reduces performance and performance increases as more of them are added, until all three are added, which does not provide a significant improvement in performance. Our interpretation is that there is a predictive signal that is present only in those three features but any two of them are sufficient to reproduce it. Besides this insight, feature removal provided little that we could interpret and that was not in direct conflict with other parts of the same analysis.

**q1**), which could be an indicator for graph size or density, was by far the most predictive feature in our Decision Tree model. To make discussing this easier, we will separate the concepts of characteristic and feature. A characteristic is a more general property of a graph that can be represented by many different features; features are specifically the numbers we described in Section 3.1.

**variation**, **minimum degree** and **maximum degree** together carry an important signal; these features can be considered a proxy for size, density and variability. The impurity reduction results are also consistent with this, as they showed **variation**, **density** and **minimum degree** being the three most important features. The relative lack of importance stemming from which proxy is used for these characteristics of a graph is also demonstrated by **q1**: while that feature is by far the most important in the Decision Tree analysis, the other two analytical approaches did not show it being particularly important, as it was only seventh out of thirteen in impurity reduction and did not make even one appearance in the sets of important features that the feature removal analysis yielded.

**tdlib** is dominant on ‘easy’ graphs. In the unfiltered dataset, **tdlib** was the best algorithm for 99% of graphs, a share that shrank progressively as a higher lower bound was imposed on difficulty. At the lower bound of 1 s (Dataset A), **tdlib** was the best algorithm for 77% of graphs; at 10 s (Dataset B) that number went further down to 45%; and at 30 s (Dataset C) it was only 38%. Notably, **tdlib** kept the “most victories” title in the unfiltered dataset, Dataset A and Dataset B; however, in Dataset C, **tamaki** dethroned it with 49% versus 38%. Undeniably, going from a 99% dominance to no longer being the best algorithm as difficulty increases tells us something about the strengths and weaknesses of the solver. This is also confirmed by our analysis of the Decision Tree model, which clearly showed that **tdlib** had an aptitude for smaller and sparser graphs.

**tamaki**’s robustness. It is the best solver in terms of terminations **and** run time on all three datasets, despite not being the best in terms of victories on datasets A and B. Most interesting is **tamaki**’s time performance on Dataset A: while **tdlib** has more than four times as many victories on that dataset as **tamaki** does, **tamaki**’s run time is still about 30% better than **tdlib**’s. Our analysis of the Decision Tree model showed **tamaki** having an affinity for larger or denser graphs, complementing **tdlib**’s strength on smaller or sparser graphs. The weakness of **tamaki** on sparse graphs that we discover is consistent with the findings of the solver’s creator [14].

**Jdrasil** seemed to have a tighter niche than the other solvers: specifically, larger graphs with a lot of variability in their vertices’ degree. However, **Jdrasil** clearly struggles on most graphs, as evidenced by its always coming in last in our experiments on all datasets and all performance metrics.

- **tdlib**: low density and size;
- **tamaki**: high density and size, low variability;
- **Jdrasil**: high density and size, high variability.

## 7. Conclusions and Future Work

#### 7.1. Conclusions

#### 7.2. Future Work

**tdlib, tamaki, Jdrasil**) to try to shed more light on why certain (graph, algorithm) pairings are superior to others and to analyse how different features of the graph contribute to the running times of the algorithms. It is highly speculative but perhaps such an analysis could help to identify new parameters which can then be analysed using the formal machinery of parameterized complexity. In this article we chose emphatically not to “open the black box” in this way, preferring instead to see how far a simple machine learning framework could succeed using generic graph features and without in-depth, algorithm-specific knowledge. Nevertheless, such an analysis would be a valuable contribution to the algorithm selection literature. (Beyond treewidth, it would be interesting to explore whether our simple machine learning framework can be effective, without extensive modification, in the computation of other NP-hard graph parameters.)

**tdlib** ecosystem, which is available at github.com/freetdi/tdlib), as it would provide numerous opportunities and benefits: for example, selecting not just solvers but also pre-processors and kernelisation algorithms for treewidth (see Reference [51] and the references therein) and using the results from pre-processors as features for the solver selection, among others. Such a framework could also incorporate advances in computing treewidth using parallel processing [52].

#### 7.3. The Best of Both Worlds

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

- Diestel, R. Graph Theory (Graduate Texts in Mathematics); Springer: New York, NY, USA, 2005.
- Bodlaender, H.L. A tourist guide through treewidth. Acta Cybern. **1994**, 11, 1.
- Bodlaender, H.L.; Koster, A.M. Combinatorial optimization on graphs of bounded treewidth. Comput. J. **2008**, 51, 255–269.
- Cygan, M.; Fomin, F.V.; Kowalik, L.; Lokshtanov, D.; Marx, D.; Pilipczuk, M.; Pilipczuk, M.; Saurabh, S. Parameterized Algorithms; Springer: Cham, Switzerland, 2015; Volume 4.
- Bannach, M.; Berndt, S. Positive-Instance Driven Dynamic Programming for Graph Searching. arXiv **2019**, arXiv:1905.01134.
- Hammer, S.; Wang, W.; Will, S.; Ponty, Y. Fixed-parameter tractable sampling for RNA design with multiple target structures. BMC Bioinform. **2019**, 20, 209.
- Bienstock, D.; Ozbay, N. Tree-width and the Sherali–Adams operator. Discret. Optim. **2004**, 1, 13–21.
- Arnborg, S.; Corneil, D.G.; Proskurowski, A. Complexity of finding embeddings in a k-tree. SIAM J. Algeb. Discret. Methods **1987**, 8, 277–284.
- Strasser, B. Computing Tree Decompositions with FlowCutter: PACE 2017 Submission. arXiv **2017**, arXiv:1709.08949.
- Van Wersch, R.; Kelk, S. ToTo: An open database for computation, storage and retrieval of tree decompositions. Discret. Appl. Math. **2017**, 217, 389–393.
- Bodlaender, H. A Linear-Time Algorithm for Finding Tree-Decompositions of Small Treewidth. SIAM J. Comput. **1996**, 25, 1305–1317.
- Bodlaender, H.L.; Fomin, F.V.; Koster, A.M.; Kratsch, D.; Thilikos, D.M. On exact algorithms for treewidth. ACM Trans. Algorithms (TALG) **2012**, 9, 12.
- Gogate, V.; Dechter, R. A complete anytime algorithm for treewidth. In Proceedings of the 20th conference on Uncertainty in artificial intelligence, UAI 2004, Banff, AB, Canada, 7–11 July 2004; AUAI Press: Arlington, VA, USA, 2004; pp. 201–208.
- Tamaki, H. Positive-instance driven dynamic programming for treewidth. J. Comb. Optim. **2019**, 37, 1283–1311.
- Dell, H.; Husfeldt, T.; Jansen, B.M.; Kaski, P.; Komusiewicz, C.; Rosamond, F.A. The first parameterized algorithms and computational experiments challenge. In Proceedings of the 11th International Symposium on Parameterized and Exact Computation (IPEC 2016), Aarhus, Denmark, 24–26 August 2016; Schloss Dagstuhl-Leibniz-Zentrum für Informatik: Wadern, Germany, 2017.
- Dell, H.; Komusiewicz, C.; Talmon, N.; Weller, M. The PACE 2017 Parameterized Algorithms and Computational Experiments Challenge: The Second Iteration. In Proceedings of the 12th International Symposium on Parameterized and Exact Computation (IPEC 2017), Leibniz International Proceedings in Informatics (LIPIcs), Vienna, Austria, 6–8 September 2017; Lokshtanov, D., Nishimura, N., Eds.; Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik: Dagstuhl, Germany, 2018; Volume 89, pp. 1–12.
- Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science **2015**, 349, 255–260.
- Hutter, F.; Hoos, H.H.; Leyton-Brown, K. Automated configuration of mixed integer programming solvers. In Proceedings of the International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming, Thessaloniki, Greece, 4–7 June 2019; Springer: Cham, Switzerland, 2010; pp. 186–202.
- Kruber, M.; Lübbecke, M.E.; Parmentier, A. Learning when to use a decomposition. In Proceedings of the International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, Padova, Italy, 5–8 June 2017; Springer: Cham, Switzerland, 2017; pp. 202–210.
- Tang, Y.; Agrawal, S.; Faenza, Y. Reinforcement Learning for Integer Programming: Learning to Cut. arXiv **2019**, arXiv:1906.04859.
- Smith-Miles, K.; Lopes, L. Measuring instance difficulty for combinatorial optimization problems. Comput. Oper. Res. **2012**, 39, 875–889.
- Hutter, F.; Xu, L.; Hoos, H.H.; Leyton-Brown, K. Algorithm runtime prediction: Methods & evaluation. Artif. Intell. **2014**, 206, 79–111.
- Leyton-Brown, K.; Hoos, H.H.; Hutter, F.; Xu, L. Understanding the empirical hardness of NP-complete problems. Commun. ACM **2014**, 57, 98–107.
- Lodi, A.; Zarpellon, G. On learning and branching: A survey. Top **2017**, 25, 207–236.
- Alvarez, A.M.; Louveaux, Q.; Wehenkel, L. A machine learning-based approximation of strong branching. INFORMS J. Comput. **2017**, 29, 185–195.
- Balcan, M.F.; Dick, T.; Sandholm, T.; Vitercik, E. Learning to branch. arXiv **2018**, arXiv:1803.10150.
- Bengio, Y.; Lodi, A.; Prouvost, A. Machine Learning for Combinatorial Optimization: A Methodological Tour d’Horizon. arXiv **2018**, arXiv:1811.06128.
- Fischetti, M.; Fraccaro, M. Machine learning meets mathematical optimization to predict the optimal production of offshore wind parks. Comput. Oper. Res. **2019**, 106, 289–297.
- Sarkar, S.; Vinay, S.; Raj, R.; Maiti, J.; Mitra, P. Application of optimized machine learning techniques for prediction of occupational accidents. Comput. Oper. Res. **2019**, 106, 210–224.
- Nalepa, J.; Blocho, M. Adaptive guided ejection search for pickup and delivery with time windows. J. Intell. Fuzzy Syst. **2017**, 32, 1547–1559.
- Rice, J.R. The algorithm selection problem. In Advances in Computers; Elsevier: Amsterdam, The Netherlands, 1976; Volume 15, pp. 65–118.
- Leyton-Brown, K.; Nudelman, E.; Andrew, G.; McFadden, J.; Shoham, Y. A portfolio approach to algorithm selection. In Proceedings of the IJCAI, Acapulco, Mexico, 9–15 August 2003; Volume 3, pp. 1542–1543.
- Nudelman, E.; Leyton-Brown, K.; Devkar, A.; Shoham, Y.; Hoos, H. Satzilla: An algorithm portfolio for SAT. Available online: http://www.cs.ubc.ca/~kevinlb/pub.php?u=SATzilla04.pdf (accessed on 12 July 2019).
- Xu, L.; Hutter, F.; Hoos, H.H.; Leyton-Brown, K. SATzilla: Portfolio-based algorithm selection for SAT. J. Artif. Intell. Res. **2008**, 32, 565–606.
- Ali, S.; Smith, K.A. On learning algorithm selection for classification. Appl. Soft Comput. **2006**, 6, 119–138.
- Guo, H.; Hsu, W.H. A machine learning approach to algorithm selection for NP-hard optimization problems: A case study on the MPE problem. Ann. Oper. Res. **2007**, 156, 61–82.
- Musliu, N.; Schwengerer, M. Algorithm selection for the graph coloring problem. In Proceedings of the International Conference on Learning and Intelligent Optimization 2013 (LION 2013), Catania, Italy, 7–11 January 2013; pp. 389–403.
- Xu, L.; Hutter, F.; Hoos, H.H.; Leyton-Brown, K. Hydra-MIP: Automated algorithm configuration and selection for mixed integer programming. In Proceedings of the RCRA Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion at the International Joint Conference on Artificial Intelligence (IJCAI), Paris, France, 16–20 January 2011; pp. 16–30.
- Kerschke, P.; Hoos, H.H.; Neumann, F.; Trautmann, H. Automated algorithm selection: Survey and perspectives. Evol. Comput. **2019**, 27, 3–45.
- Abseher, M.; Musliu, N.; Woltran, S. Improving the efficiency of dynamic programming on tree decompositions via machine learning. J. Artif. Intell. Res. **2017**, 58, 829–858.
- Bannach, M.; Berndt, S.; Ehlers, T. Jdrasil: A modular library for computing tree decompositions. In Proceedings of the 16th International Symposium on Experimental Algorithms (SEA 2017), London, UK, 21–23 June 2017; Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik: Wadern, Germany, 2017.
- Kotsiantis, S.B. Decision trees: A recent overview. Artif. Intell. Rev. **2013**, 39, 261–283.
- Li, R.H.; Belford, G.G. Instability of decision tree classification algorithms. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; ACM: New York, NY, USA, 2002; pp. 570–575.
- Liaw, A.; Wiener, M. Classification and regression by randomForest. R News **2002**, 2, 18–22.
- Bertsimas, D.; Dunn, J. Optimal classification trees. Mach. Learn. **2017**, 106, 1039–1082.
- Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods; Cambridge University Press: Cambridge, UK, 2000.
- Ásgeirsson, E.I.; Stein, C. Divide-and-conquer approximation algorithm for vertex cover. SIAM J. Discret. Math. **2009**, 23, 1261–1280.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
- Tsamardinos, I.; Rakhshani, A.; Lagani, V. Performance-estimation properties of cross-validation-based protocols with simultaneous hyper-parameter optimization. Int. J. Artif. Intell. Tools **2015**, 24, 1540023.
- Smith-Miles, K.A. Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput. Surv. (CSUR) **2009**, 41, 6.
- Bodlaender, H.L.; Jansen, B.M.; Kratsch, S. Preprocessing for treewidth: A combinatorial analysis through kernelization. SIAM J. Discret. Math. **2013**, 27, 2108–2142.
- Van Der Zanden, T.C.; Bodlaender, H.L. Computing Treewidth on the GPU. arXiv **2017**, arXiv:1709.09990.

**Figure 1.** Experimental results using Leave-One-Out cross-validation. Each chart presents the results from the experiment on one of the three datasets (A, B and C) based on one of the three performance metrics (victories, total run time and terminations). For the Oracle and Hybrid algorithms, results are not presented as an aggregate of all solver choices; instead, each solver’s contribution to each performance metric is presented separately (see Section 4.2 and Section 4.3). Total running times are measured in seconds.

**Figure 2.** Feature importance for the Random Forest model trained on Dataset B (Section 4.2). We refer the reader to Section 3.1 for the list of features used. Each bar indicates the fraction of impurity in the dataset that is removed by splits on the relevant feature in all trees of the model.

**Figure 3.** The CART Decision Tree trained on Dataset B. Each rectangle represents a node in the tree. The first line in each internal node indicates the condition according to which that node splits instances. In leaf nodes, this line is omitted, as no split is executed there. Each internal node’s left child pertains to the subset of instances that met the condition, whereas the right child pertains to those that did not meet the condition. The next line describes the node’s Gini impurity, which is the probability that a randomly chosen element from the set would be incorrectly labeled if its label was randomly drawn from the distribution of labels in the subset. The next line shows how many graphs are in the subset that the node represents, while the following line shows how many instances of each class are in that subset, with the three values corresponding to **Jdrasil**, **tamaki** and **tdlib**, respectively. The last line shows the class which is the most represented in the subset, which is also the class that the Decision Tree would assign to all instances in that subset.
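The Gini impurity described in the caption of Figure 3 has the closed form $1-\sum_{i}{p}_{i}^{2}$, where ${p}_{i}$ is the fraction of instances in the node that belong to class i. The short sketch below evaluates it for the Dataset B class counts reported in Table 2 (49, 186 and 192 graphs for Jdrasil, tamaki and tdlib); the function name is our own, and treating these counts as the contents of a single node is an illustrative assumption.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node, given the number of instances per class."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Dataset B class counts from Table 2, ordered Jdrasil, tamaki, tdlib.
dataset_b = gini_impurity([49, 186, 192])

# A node containing a single class is pure.
pure = gini_impurity([10, 0, 0])
```

For three classes the impurity ranges from 0 (a pure node) to 2/3 (a perfectly even split), so a value near 0.59 reflects how evenly tamaki and tdlib share Dataset B.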

**Table 1.** The total number of graphs per source dataset and the number of those graphs in filtered datasets A, B and C (see Section 4.1 and Section 4.2 for clarification).

| Datasets | Unfiltered | A | B | C |
|---|---|---|---|---|
| bonus | 100 | 36 | 36 | 35 |
| cfg | 1797 | 43 | 1 | 0 |
| ex | 200 | 200 | 172 | 137 |
| he | 200 | 26 | 17 | 17 |
| named | 150 | 47 | 14 | 11 |
| pace2016 | 145 | 95 | 16 | 13 |
| toto | 27,123 | 594 | 92 | 53 |
| transit | 19 | 10 | 4 | 4 |
| uai | 133 | 27 | 14 | 13 |
| vc | 178 | 58 | 42 | 38 |
| vcPACE | 100 | 6 | 6 | 6 |
| dimacsMC | 80 | 20 | 13 | 10 |
| sat_sr15 | 115 | 0 | 0 | 0 |
| Total | 30,340 | 1162 | 427 | 337 |

**Table 2.** Number of graphs, within each dataset, for which the given algorithm was fastest (see Section 4.1 and Section 4.2).

| Solver | Unfiltered | Dataset A | Dataset B | Dataset C |
|---|---|---|---|---|
| Jdrasil | 55 | 55 | 49 | 42 |
| tamaki | 209 | 209 | 186 | 168 |
| tdlib | 29,234 | 898 | 192 | 127 |
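The percentage shares of victories discussed in the text can be recomputed directly from the counts in Table 2. The sketch below does this; the dictionary layout and function name are our own, while the counts are copied from the table.

```python
# Victory counts copied from Table 2.
VICTORIES = {
    "Unfiltered": {"Jdrasil": 55, "tamaki": 209, "tdlib": 29234},
    "Dataset A": {"Jdrasil": 55, "tamaki": 209, "tdlib": 898},
    "Dataset B": {"Jdrasil": 49, "tamaki": 186, "tdlib": 192},
    "Dataset C": {"Jdrasil": 42, "tamaki": 168, "tdlib": 127},
}


def victory_shares(counts):
    """Percentage of graphs on which each solver was fastest."""
    total = sum(counts.values())
    return {solver: round(100 * n / total, 1) for solver, n in counts.items()}


shares = {name: victory_shares(counts) for name, counts in VICTORIES.items()}
totals = {name: sum(counts.values()) for name, counts in VICTORIES.items()}
```

The totals per column (29,498, 1162, 427 and 337) match the dataset sizes given in Section 4.2 and Table 1, which is a useful consistency check when reading the tables together.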

**Table 3.** The individual components in the Principal Component Analysis. Each row refers to one of the three components. The leftmost column shows how much of the total variance of the data is explained by that component; all other columns show how the respective feature is factored into the component.

| Variance explained | v | e | v/e | e/v | Density | q1 | Median | q3 | Minimum | Mean | Maximum | Variation | Entropy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.58 | −0.01 | 0.33 | −0.17 | 0.36 | 0.30 | 0.36 | 0.36 | 0.36 | 0.32 | 0.36 | 0.00 | −0.02 | 0.12 |
| 0.25 | 0.55 | 0.09 | 0.15 | 0.00 | −0.07 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.55 | 0.55 | 0.24 |
| 0.08 | −0.14 | 0.21 | 0.63 | 0.05 | −0.44 | 0.07 | 0.06 | 0.04 | 0.02 | 0.05 | −0.18 | −0.17 | 0.52 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Slavchev, B.; Masliankova, E.; Kelk, S. A Machine Learning Approach to Algorithm Selection for Exact Computation of Treewidth. *Algorithms* **2019**, *12*, 200.
https://doi.org/10.3390/a12100200
