# Solving the Examination Timetabling Problem in GPUs


## Abstract


## 1. Introduction

## 2. Problem Description

| Dataset | Examinations | Students | Periods | Conflict Density |
|---|---|---|---|---|
| car-f-92 I | 543 | 18,419 | 32 | 0.14 |
| car-s-91 I | 682 | 16,925 | 35 | 0.13 |
| ear-f-83 I | 190 | 1125 | 24 | 0.27 |
| hec-s-92 I | 81 | 2823 | 18 | 0.42 |
| kfu-s-93 I | 461 | 5349 | 20 | 0.06 |
| lse-f-91 I | 381 | 2726 | 18 | 0.06 |
| pur-s-93 I | 2419 | 30,032 | 42 | 0.03 |
| rye-s-93 I | 486 | 11,483 | 23 | 0.07 |
| sta-f-83 I | 139 | 611 | 13 | 0.14 |
| tre-s-92 I | 261 | 4360 | 23 | 0.18 |
| uta-s-92 I | 622 | 21,266 | 35 | 0.13 |
| ute-s-92 I | 184 | 2749 | 10 | 0.08 |
| yor-f-83 I | 181 | 941 | 21 | 0.29 |
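
For these benchmark instances, conflict density is usually defined as the fraction of examination pairs that share at least one student. The sketch below computes it under that assumption. The function name `conflict_density` and the toy enrolment data are illustrative and do not come from the paper.

```python
from itertools import combinations

def conflict_density(enrolments, n_exams):
    """Fraction of exam pairs that have at least one student in common."""
    conflicts = set()
    for exams in enrolments:  # one collection of exam indices per student
        for a, b in combinations(sorted(set(exams)), 2):
            conflicts.add((a, b))
    possible_pairs = n_exams * (n_exams - 1) // 2
    return len(conflicts) / possible_pairs if possible_pairs else 0.0

# Toy instance (made-up data, not one of the datasets above):
# 4 examinations, 3 students.
students = [[0, 1], [1, 2], [0, 1, 3]]
density = conflict_density(students, n_exams=4)
print(f"conflict density = {density:.2f}")
```

A low density, as in pur-s-93, means most entries of the conflict matrix are zero, which is what makes a sparse storage format attractive.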

## 3. Related Work

## 4. The GPU Architecture and CUDA Programming Model

## 5. Methodology

#### 5.1. Encoding and Representation

**Figure 1.** Direct encoding: single-point crossover between two chromosomes after the second gene. E1–E5: examinations; T1–T3: timeslots.

**Figure 2.** Indirect encoding: single-point crossover between two chromosomes after the first gene. Examinations in bold red are missing or appear twice. E1–E5: examinations; T1–T3: timeslots.
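
The difference between the two encodings under crossover can be reproduced with a small sketch. It assumes that a direct chromosome stores one timeslot per examination and that an indirect chromosome stores one group of examinations per timeslot. The parent chromosomes are made up for illustration and are not the ones drawn in the figures.

```python
def single_point_crossover(p1, p2, cut):
    """Swap the tails of two equal-length chromosomes after `cut` genes."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def exam_counts(chromosome):
    """Count how often each examination appears in an indirect chromosome."""
    counts = {}
    for slot in chromosome:
        for exam in slot:
            counts[exam] = counts.get(exam, 0) + 1
    return counts

# Direct encoding: gene i holds the timeslot of examination E(i+1).
d1 = ["T1", "T2", "T1", "T3", "T2"]
d2 = ["T3", "T1", "T2", "T2", "T1"]
c1, c2 = single_point_crossover(d1, d2, cut=2)
print(c1, c2)  # every examination still has exactly one timeslot

# Indirect encoding: gene t holds the examinations placed in timeslot T(t+1).
i1 = [("E1", "E3"), ("E2", "E5"), ("E4",)]
i2 = [("E2",), ("E1", "E4"), ("E3", "E5")]
k1, k2 = single_point_crossover(i1, i2, cut=1)
print(exam_counts(k1))  # some examinations appear twice, others are missing
```

With the direct encoding every child is a complete assignment, so no repair step is needed after crossover. The indirect children would have to be repaired before they could be evaluated.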

#### 5.2. Initial Construction of Solutions

Algorithm 1: Construction of the initial solutions.

#### 5.3. Evaluation

Algorithm 2: Evaluation, chromosome-threaded approach: each thread evaluates one chromosome of the population.

Algorithm 3: Evaluation, examination-threaded approach: each thread evaluates one gene of the population.
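
The two granularities can be contrasted with a sequential Python sketch. It assumes the proximity cost commonly used for these benchmark datasets: weights 16, 8, 4, 2, 1 for examinations that are 1 to 5 periods apart, divided by the number of students. Counting same-period clashes separately is a choice made for this sketch, and the helper names are hypothetical. `gene_cost` corresponds to the work of one examination thread, and `chromosome_cost` to the work of one chromosome thread.

```python
# Assumed proximity weights (not stated in the surviving text).
WEIGHTS = {1: 16, 2: 8, 3: 4, 4: 2, 5: 1}

def gene_cost(i, chromosome, conflicts):
    """Clashes and penalty of exam i against all higher-indexed exams."""
    clashes, penalty = 0, 0
    for j in range(i + 1, len(chromosome)):
        shared = conflicts[i][j]  # students taking both exams
        if shared == 0:
            continue
        gap = abs(chromosome[i] - chromosome[j])
        if gap == 0:
            clashes += 1  # hard-constraint violation
        else:
            penalty += WEIGHTS.get(gap, 0) * shared
    return clashes, penalty

def chromosome_cost(chromosome, conflicts, n_students):
    """Sum the per-gene contributions of one chromosome."""
    clashes = penalty = 0
    for i in range(len(chromosome)):
        c, p = gene_cost(i, chromosome, conflicts)
        clashes += c
        penalty += p
    return clashes, penalty / n_students

# Toy data: 3 examinations, 4 students (made up for illustration).
conflicts = [[0, 2, 0],
             [2, 0, 1],
             [0, 1, 0]]
print(chromosome_cost([0, 1, 3], conflicts, n_students=4))
```

On a GPU, the per-gene results of the examination-threaded variant would still have to be reduced to one value per chromosome. Here the loop in `chromosome_cost` plays that role.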

#### 5.3.1. Weight Factor

#### 5.3.2. Exploiting Sparsity

**Figure 3.** (**a**) Conflict graph for the problem described in the half-compressed conflict matrix figure; (**b**) half-compressed conflict matrix structure (the dat array contains tuples of the exam in conflict (white) and the conflicting number of students (gray)).
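
The exact layout of the half-compressed structure cannot be recovered from the caption alone. The sketch below therefore assumes a CSR-like layout in which `dat` stores (conflicting exam, shared students) tuples row by row. The offsets array `row_ptr` is a hypothetical name for the index that marks where each examination's entries begin.

```python
def to_csr(conflicts):
    """Compress a dense conflict matrix into (row_ptr, dat)."""
    row_ptr, dat = [0], []
    for row in conflicts:
        for exam, shared in enumerate(row):
            if shared:  # keep only non-zero entries
                dat.append((exam, shared))
        row_ptr.append(len(dat))
    return row_ptr, dat

def neighbours(exam, row_ptr, dat):
    """All (conflicting exam, shared students) tuples of one examination."""
    return dat[row_ptr[exam]:row_ptr[exam + 1]]

# Toy conflict matrix (made up): exam 3 conflicts with nothing.
conflicts = [[0, 2, 0, 0],
             [2, 0, 1, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]
row_ptr, dat = to_csr(conflicts)
print(row_ptr, dat)
print(neighbours(1, row_ptr, dat))
```

Evaluating an examination then touches only its stored neighbours instead of a full matrix row, so the saving grows as the conflict density falls.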

#### 5.4. Selection and Reproduction

Algorithm 4: Selection and reproduction, chromosome-threaded approach: each thread produces two new chromosomes.

Algorithm 5: Selection and reproduction, examination-threaded approach (one kernel implementation): each thread produces one gene for each of the two new chromosomes.

Algorithm 6: Selection, examination-threaded approach (two kernels implementation): each thread selects the parents, the cut point and creates a random number.

1. $\mathit{chromosome} \leftarrow \mathit{blockDim.x} \cdot \mathit{blockIdx.x} + \mathit{threadIdx.x}$
2. Get the random seed of the chromosome.
3. $p1 \leftarrow$ the chromosome with the lowest cost of $\mathit{pop}/32$ randomly selected chromosomes
4. $p2 \leftarrow$ the chromosome with the lowest cost of $\mathit{pop}/32$ randomly selected chromosomes
5. $\mathit{cross\_prob} \leftarrow$ random number
6. $\mathit{cut\_point} \leftarrow$ random number in $[1, \mathit{n\_courses} - 1]$
7. $\mathit{p1\_d}[\mathit{chromosome}] \leftarrow p1$
8. $\mathit{p2\_d}[\mathit{chromosome}] \leftarrow p2$
9. $\mathit{cross\_prob\_d}[\mathit{chromosome}] \leftarrow \mathit{cross\_prob}$
10. $\mathit{cut\_point\_d}[\mathit{chromosome}] \leftarrow \mathit{cut\_point}$
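
The selection step of Algorithm 6 can be emulated sequentially in Python, with the loop over `chromosome` standing in for the GPU threads. This is a sketch under assumptions rather than the paper's kernel. The per-thread random seed is imitated with one `random.Random` instance per index, the crossover probability is assumed to be drawn from [0, 1), and the helper names are hypothetical.

```python
import random

def tournament_winner(costs, size, rng):
    """Index of the cheapest chromosome among `size` random entrants."""
    entrants = [rng.randrange(len(costs)) for _ in range(size)]
    return min(entrants, key=lambda idx: costs[idx])

def select_parents(costs, n_courses, base_seed=0):
    """Fill the four per-chromosome arrays written by the selection kernel."""
    pop = len(costs)
    size = max(1, pop // 32)  # tournament size pop/32
    p1_d, p2_d = [0] * pop, [0] * pop
    cross_prob_d, cut_point_d = [0.0] * pop, [0] * pop
    for chromosome in range(pop):  # one thread per chromosome on the GPU
        rng = random.Random(base_seed + chromosome)  # stand-in for the seed
        p1_d[chromosome] = tournament_winner(costs, size, rng)
        p2_d[chromosome] = tournament_winner(costs, size, rng)
        cross_prob_d[chromosome] = rng.random()
        cut_point_d[chromosome] = rng.randint(1, n_courses - 1)  # inclusive
    return p1_d, p2_d, cross_prob_d, cut_point_d

# Toy population of 64 chromosomes with made-up costs.
costs = [float(c) for c in range(64, 0, -1)]
p1_d, p2_d, cross_prob_d, cut_point_d = select_parents(costs, n_courses=10)
print(p1_d[:4], p2_d[:4], cut_point_d[:4])
```

Storing the choices in arrays is what allows a second kernel to perform the crossover with one thread per gene, since every gene thread of a child can read the same parents and cut point.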

Algorithm 7: Reproduction, examination-threaded approach (two kernels implementation): each thread uses the previously selected chromosomes and implements the crossover operation.

#### 5.5. Mutation

Algorithm 8: Mutation, chromosome-threaded approach: each thread mutates the genes in a single chromosome, according to a probability.

Algorithm 9: Mutation, examination-threaded approach: each thread mutates a single gene, according to a probability.
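
The mutation step can be sketched as follows. The captions do not say what a mutated gene becomes, so the sketch assumes that it is reassigned to a uniformly random period. The function name and parameters are illustrative.

```python
import random

def mutate(chromosome, n_periods, rate, rng):
    """Return a copy in which each gene is re-drawn with probability `rate`."""
    mutated = list(chromosome)
    for gene in range(len(mutated)):  # one thread per gene in Algorithm 9
        if rng.random() < rate:
            mutated[gene] = rng.randrange(n_periods)
    return mutated

rng = random.Random(42)
parent = [0, 3, 1, 2, 4, 0]
unchanged = mutate(parent, n_periods=5, rate=0.0, rng=rng)
shuffled = mutate(parent, n_periods=5, rate=1.0, rng=rng)
print(unchanged, shuffled)
```

In the chromosome-threaded variant the whole loop runs inside one thread. In the examination-threaded variant each loop iteration becomes its own thread.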

#### 5.6. Termination Criterion

#### 5.7. Greedy Steepest Descent Algorithm

Algorithm 10: Greedy steepest descent algorithm: every thread calculates the cost of an examination for a specific time slot.
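
One pass of such a descent can be sketched sequentially. Each call to `exam_cost` corresponds to the work of one thread, namely the cost of one examination in one time slot. The proximity weights, the large clash penalty, the order in which examinations are visited and the helper names are all assumptions made for this sketch.

```python
WEIGHTS = {1: 16, 2: 8, 3: 4, 4: 2, 5: 1}  # assumed proximity weights
CLASH_PENALTY = 10**6                      # assumed cost of a hard clash

def exam_cost(exam, slot, chromosome, conflicts):
    """Cost of placing `exam` in `slot`, all other exams staying fixed."""
    cost = 0
    for other, shared in enumerate(conflicts[exam]):
        if other == exam or shared == 0:
            continue
        gap = abs(slot - chromosome[other])
        if gap == 0:
            cost += CLASH_PENALTY * shared
        else:
            cost += WEIGHTS.get(gap, 0) * shared
    return cost

def steepest_descent_pass(chromosome, conflicts, n_periods):
    """Move each exam in turn to its cheapest slot if that lowers its cost."""
    result = list(chromosome)
    for exam in range(len(result)):
        costs = [exam_cost(exam, slot, result, conflicts)
                 for slot in range(n_periods)]  # one thread per slot on a GPU
        best = min(range(n_periods), key=lambda slot: costs[slot])
        if costs[best] < costs[result[exam]]:
            result[exam] = best
    return result

# Toy data (made up): exams 0 and 1 clash in the starting timetable.
conflicts = [[0, 2, 0],
             [2, 0, 1],
             [0, 1, 0]]
improved = steepest_descent_pass([0, 0, 1], conflicts, n_periods=8)
print(improved)
```

The per-slot costs are independent of each other, which is why this step parallelizes well. Only the final choice of the minimum needs a reduction.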

## 6. Experimental Results

#### 6.1. Speedups

#### 6.1.1. Evaluation

**Figure 4.** Speedup in the evaluation stage for the examination-threaded approach with the use of the compressed sparse row (CSR) conflict matrix.

#### 6.1.2. Reproduction

**Figure 5.** Speedup in the reproduction stage for the examination-threaded approach with the separation of the selection and reproduction stages in different kernels.

#### 6.1.3. Mutation

#### 6.1.4. Greedy Steepest Descent Algorithm

#### 6.1.5. Total Speedup

| Dataset | pop = 256 | pop = 2048 | pop = 8192 | pop = 16,384 |
|---|---|---|---|---|
| car-s-91 | 4.37 | 4.54 | 21.97 | 23.96 |
| hec-s-92 | 3.26 | 4.73 | 25.372 | 26.17 |
| lse-f-91 | 3.09 | 4.32 | 22.54 | 23.25 |
| pur-s-93 | 4.34 | 4.31 | 22.11 | 21.37 |

#### 6.2. Compressed Sparse Row Format

**Table 3.** GPU evaluation execution time and speedup between the full conflict matrix and the compressed sparse row format implementation.

| Dataset | Full Conflict Matrix (ms) | CSR Conflict Matrix (ms) | Speedup | Conflict Density |
|---|---|---|---|---|
| car-s-91 | 20.80 | 3.97 | 5.24 | 0.13 |
| hec-s-92 | 0.12 | 0.09 | 1.29 | 0.42 |
| lse-f-91 | 5.52 | 0.41 | 13.44 | 0.06 |
| pur-s-93 | 377.09 | 11.50 | 32.78 | 0.03 |

#### 6.3. Improvement with the Greedy Steepest Descent Algorithm

**Table 4.** Average distance (%) from the best reported solution in the bibliography with the genetic algorithm.

| Dataset | pop = 256 | pop = 2048 | pop = 16,384 |
|---|---|---|---|
| car-f-92 | 89.68 | 89.41 | 58.46 |
| car-s-91 | 110.67 | 110.67 | 80.40 |
| ear-f-83 | 74.85 | 64.78 | 56.94 |
| hec-s-92 | 91.06 | 80.73 | 74.10 |
| kfu-s-93 | 91.80 | 58.66 | 38.87 |
| lse-f-91 | 113.42 | 70.32 | 59.98 |
| pur-s-93 | INF | INF | INF |
| rye-s-93 | 153.7 | 149.91 | 94.54 |
| sta-f-83 | 19.43 | 19.43 | 19.23 |
| tre-s-92 | 57.04 | 48.06 | 42.17 |
| uta-s-92 | 81.56 | 81.56 | 65.40 |
| ute-s-92 | 35.83 | 25.88 | 19.85 |
| yor-f-83 | 41.41 | 37.12 | 30.41 |

**Table 5.** Average distance (%) from the best reported solution in the bibliography with the hybrid evolutionary algorithm.

| Dataset | pop = 256 | pop = 2048 | pop = 16,384 |
|---|---|---|---|
| car-f-92 | 42.21 | 29.40 | 26.77 |
| car-s-91 | 44.84 | 30.27 | 24.95 |
| ear-f-83 | 43.16 | 33.77 | 28.88 |
| hec-s-92 | 30.87 | 27.02 | 20.78 |
| kfu-s-93 | 24.17 | 16.15 | 12.53 |
| lse-f-91 | 43.20 | 29.03 | 27.43 |
| pur-s-93 | 263.7 | 153.23 | 126.06 |
| rye-s-93 | 54.01 | 36.38 | 33.36 |
| sta-f-83 | 16.83 | 16.69 | 16.64 |
| tre-s-92 | 22.88 | 19.42 | 13.85 |
| uta-s-92 | 43.95 | 30.13 | 25.57 |
| ute-s-92 | 7.52 | 5.43 | 4.53 |
| yor-f-83 | 17.64 | 13.40 | 11.31 |

#### 6.4. Tournament Selection Size

**Table 6.** Average difference (%) from the best reported solution in the bibliography with the genetic algorithm and a tournament selection size of 10.

| Dataset | pop = 256 | pop = 2048 | pop = 16,384 |
|---|---|---|---|
| car-f-92 | 77.73 | 66.69 | 63.97 |
| car-s-91 | 90.09 | 82.24 | 78.27 |
| ear-f-83 | 67.90 | 58.75 | 61.48 |
| hec-s-92 | 66.13 | 59.39 | 58.09 |
| kfu-s-93 | 52.36 | 46.04 | 45.46 |
| lse-f-91 | 67.17 | 64.63 | 63.47 |
| pur-s-93 | INF | INF | INF |
| rye-s-93 | 119.01 | 106.42 | 94.77 |
| sta-f-83 | 19.43 | 19.43 | 19.42 |
| tre-s-92 | 53.26 | 49.31 | 41.63 |
| uta-s-92 | 80.02 | 73.28 | 71.11 |
| ute-s-92 | 30.02 | 33.75 | 31.53 |
| yor-f-83 | 37.82 | 35.69 | 34.25 |

**Table 7.** Average difference (%) from the best reported solution in the bibliography with the genetic algorithm and a tournament selection size of $\text{population size}/32$.

| Dataset | pop = 256 | pop = 2048 | pop = 16,384 |
|---|---|---|---|
| car-f-92 | 81.40 | 61.34 | 46.17 |
| car-s-91 | 94.92 | 60.79 | 50.19 |
| ear-f-83 | 65.67 | 56.00 | 49.19 |
| hec-s-92 | 71.63 | 54.45 | 54.68 |
| kfu-s-93 | 57.49 | 41.70 | 38.46 |
| lse-f-91 | 77.11 | 53.93 | 50.95 |
| pur-s-93 | INF | INF | INF |
| rye-s-93 | 123.55 | 81.60 | 83.19 |
| sta-f-83 | 19.43 | 19.43 | 18.84 |
| tre-s-92 | 48.74 | 39.40 | 35.03 |
| uta-s-92 | 80.55 | 61.57 | 48.31 |
| ute-s-92 | 31.05 | 25.25 | 17.71 |
| yor-f-83 | 38.58 | 32.22 | 27.03 |

#### 6.5. Quality of Solutions

| Dataset | Minimum Cost | Maximum Cost | Average Cost |
|---|---|---|---|
| car-f-92 I | 4.86 | 5.32 | 5.04 |
| car-s-91 I | 5.72 | 6.33 | 5.92 |
| ear-f-83 I | 39.05 | 42.52 | 40.24 |
| hec-s-92 I | 11.34 | 12.91 | 12.27 |
| kfu-s-93 I | 15.45 | 16.98 | 16.14 |
| lse-f-91 I | 12.06 | 13.37 | 12.68 |
| pur-s-93 I | 10.75 | 33.64 | 21.51 |
| rye-s-93 I | 9.20 | 10.48 | 9.63 |
| sta-f-83 I | 157.38 | 157.70 | 157.47 |
| tre-s-92 I | 9.15 | 9.75 | 9.51 |
| uta-s-92 I | 4.05 | 4.33 | 4.20 |
| ute-s-92 I | 25.43 | 27.18 | 26.19 |
| yor-f-83 I | 39.52 | 42.41 | 41.32 |

| Dataset | Minimum Cost | Maximum Cost | Average Cost |
|---|---|---|---|
| car-f-92 I | 4.77 | 5.06 | 4.92 |
| car-s-91 I | 5.32 | 5.84 | 5.60 |
| ear-f-83 I | 37.27 | 39.82 | 38.65 |
| hec-s-92 I | 10.88 | 12.14 | 11.39 |
| kfu-s-93 I | 14.48 | 16.00 | 15.33 |
| lse-f-91 I | 11.22 | 12.62 | 11.93 |
| pur-s-93 I | 5.25 | 14.99 | 8.36 |
| rye-s-93 I | 8.64 | 9.32 | 8.98 |
| sta-f-83 I | 157.20 | 157.45 | 157.39 |
| tre-s-92 I | 8.72 | 9.61 | 9.17 |
| uta-s-92 I | 3.64 | 4.13 | 3.90 |
| ute-s-92 I | 25.11 | 26.71 | 25.58 |
| yor-f-83 I | 38.80 | 41.21 | 40.05 |

| Dataset | This Work | IGA [38] | hMOEA [35] | MMAS [34] | Ersoy [39] | Best |
|---|---|---|---|---|---|---|
| car-f-92 | 4.47 | 4.2 | 4.2 | 4.8 | - | 3.74 [61] |
| car-s-91 | 5.24 | 4.9 | 5.4 | 5.7 | - | 4.42 [61] |
| ear-f-83 | 34.41 | 35.9 | 34.2 | 36.8 | - | 29.3 [30] |
| hec-s-92 | 10.39 | 11.5 | 10.4 | 11.3 | 11.6 | 9.2 [30] |
| kfu-s-93 | 13.77 | 14.4 | 14.3 | 15.0 | 15.8 | 12.81 [62] |
| lse-f-91 | 11.06 | 10.9 | 11.3 | 12.1 | 13.2 | 9.6 [30] |
| pur-s-93 | 5.25 | 4.7 | - | 5.4 | - | 3.7 [30] |
| rye-s-93 | 8.61 | 9.3 | 8.8 | 10.2 | - | 6.8 [30] |
| sta-f-83 | 157.05 | 157.8 | 157.0 | 157.2 | 157.7 | 134.70 [28] |
| tre-s-92 | 8.51 | 8.4 | 8.6 | 8.8 | - | 7.72 [62] |
| uta-s-92 | 3.63 | 3.4 | 3.5 | 3.8 | - | 3.06 [61] |
| ute-s-92 | 24.87 | 27.2 | 25.3 | 27.7 | 26.3 | 24.21 [63] |
| yor-f-83 | 37.15 | 39.3 | 36.4 | 39.6 | 40.7 | 34.78 [62] |
| normalized | 16.51% | 17.14% | 21.11% | 25.23% | - | - |
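
The surviving text does not define the "normalized" row. For the "This Work" column, however, the value is consistent with the mean percentage distance from the best reported cost, $100 \cdot (\text{cost} - \text{best}) / \text{best}$, averaged over the 13 datasets. The short check below uses only the numbers from the table above. Reading the row this way is an inference, not a definition taken from the paper.

```python
# Costs copied from the comparison table ("This Work" and "Best" columns).
this_work = {
    "car-f-92": 4.47, "car-s-91": 5.24, "ear-f-83": 34.41, "hec-s-92": 10.39,
    "kfu-s-93": 13.77, "lse-f-91": 11.06, "pur-s-93": 5.25, "rye-s-93": 8.61,
    "sta-f-83": 157.05, "tre-s-92": 8.51, "uta-s-92": 3.63, "ute-s-92": 24.87,
    "yor-f-83": 37.15,
}
best = {
    "car-f-92": 3.74, "car-s-91": 4.42, "ear-f-83": 29.3, "hec-s-92": 9.2,
    "kfu-s-93": 12.81, "lse-f-91": 9.6, "pur-s-93": 3.7, "rye-s-93": 6.8,
    "sta-f-83": 134.70, "tre-s-92": 7.72, "uta-s-92": 3.06, "ute-s-92": 24.21,
    "yor-f-83": 34.78,
}

# Percentage distance of each result from the best reported cost.
distances = {name: 100 * (this_work[name] - best[name]) / best[name]
             for name in this_work}
mean_distance = sum(distances.values()) / len(distances)
print(f"mean distance from best: {mean_distance:.2f}%")
```

The same formula would explain the percentage distances in the earlier tables, with the average cost over several runs in place of the single best cost.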

#### 6.6. Limitations and Advantages

## 7. Conclusions and Future Work

## Author Contributions

## Conflicts of Interest

