Figure 1.
The two architectures for word2vec (w2v): (a) skip-gram architecture, (b) continuous bag-of-words architecture.
Figure 2.
Pipeline steps for preparing the indexes and the runs.
Figure 3.
Pipeline steps for the word2vec runs.
Figure 4.
T1: box plots for P@10 of the different models for each index. (a) NoPorterNoStop index, (b) Porter index, (c) PorterStop index, (d) Stop index.
Figure 5.
T1: box plots for P@10 of the different indexes for each model: (a) precision for BM25 with different indexes, (b) precision for DirichletLM with different indexes, (c) precision for PL2 with different indexes, (d) precision for TF-IDF with different indexes.
Figure 6.
T2: box plots for P@10 of the different models for each index: (a) NoPorterNoStop index, (b) Porter index, (c) PorterStop index, (d) Stop index.
Figure 7.
T2: box plots for P@10 of the different indexes for each model: (a) precision for BM25 with different indexes, (b) precision for DirichletLM with different indexes, (c) precision for PL2 with different indexes, (d) precision for TF-IDF with different indexes.
Figure 8.
T1: box plots for P@10 of the different models for each index, with query expansion (QE) and relevance feedback (RF): (a) NoPorterNoStop index, (b) Porter index, (c) PorterStop index, (d) Stop index.
Figure 9.
T1: box plots for P@10 of the different indexes for each model, with QE+RF: (a) precision for BM25 with different indexes, (b) precision for DirichletLM with different indexes, (c) precision for PL2 with different indexes, (d) precision for TF-IDF with different indexes.
Figure 10.
T2: box plots for P@10 of the different models for each index, with QE+RF: (a) NoPorterNoStop index, (b) Porter index, (c) PorterStop index, (d) Stop index.
Figure 11.
T2: box plots for P@10 of the different indexes for each model, with QE+RF: (a) precision for BM25 with different indexes, (b) precision for DirichletLM with different indexes, (c) precision for PL2 with different indexes, (d) precision for TF-IDF with different indexes.
Figure 12.
Precision: box plots of the w2v runs: (a) T1: P@10 for w2v_avg and w2v_si, (b) T2: P@10 for w2v_avg and w2v_si, (c) T1: P@100 for w2v_avg and w2v_si, (d) T2: P@100 for w2v_avg and w2v_si, (e) T1: P@1000 for w2v_avg and w2v_si, (f) T2: P@1000 for w2v_avg and w2v_si.
Figure 13.
T1: box plots of P@10 of the fusion methods: (a) NoPorterNoStop fusions of the runs, (b) NoPorterNoStop+QE+RF fusions of the runs, (c) Porter fusions of the runs, (d) Porter+QE+RF fusions of the runs, (e) PorterStop fusions of the runs, (f) PorterStop+QE+RF fusions of the runs, (g) Stop fusions of the runs, (h) Stop+QE+RF fusions of the runs.
Figure 14.
T2: box plots of P@10 of the fusion methods: (a) NoPorterNoStop fusions of the runs, (b) NoPorterNoStop+QE+RF fusions of the runs, (c) Porter fusions of the runs, (d) Porter+QE+RF fusions of the runs, (e) PorterStop fusions of the runs, (f) PorterStop+QE+RF fusions of the runs, (g) Stop fusions of the runs, (h) Stop+QE+RF fusions of the runs.
Figure 15.
T1: scatter plots for Porter/Dirichlet vs. NoPorterNoStop/Dirichlet with QE+RF: (a) NDCG@10, (b) NDCG@100, (c) NDCG@1000, (d) Precision@10, (e) Precision@100, (f) Precision@1000.
Figure 16.
T2: scatter plots for Porter/Dirichlet vs. NoPorterNoStop/Dirichlet with QE+RF: (a) NDCG@10, (b) NDCG@100, (c) NDCG@1000, (d) Precision@10, (e) Precision@100, (f) Precision@1000.
Figure 17.
Scatter plots of Porter/Dirichlet with QE+RF vs. the same run without QE+RF: (a) P@10, (b) NDCG@10.
Figure 18.
Scatter plots of NoPorterNoStop/BM25 with QE+RF vs. the same run without QE+RF: (a) P@10, (b) NDCG@10.
Figure 19.
Scatter plots of NoPorterNoStop/TF-IDF vs. w2v-si: (a) P@10, (b) NDCG@10.
Figure 20.
T1: scatter plots of P@10 of the fusion of all the models using the same index: (a) CombSUM vs. RR fusion of the Terrier runs using NoPorterNoStop index, (b) CombSUM vs. RR fusion of the Terrier runs using Porter index, (c) CombSUM vs. RR fusion of the Terrier runs using PorterStop index, (d) CombSUM vs. RR fusion of the Terrier runs using Stop index.
Figure 21.
T1: scatter plots of P@10 of the fusion of the models with different indexes: (a) CombSUM vs. RR fusion of the Terrier runs with BM25 weighting scheme, (b) CombSUM vs. RR fusion of the Terrier runs with DirichletLM weighting scheme, (c) CombSUM vs. RR fusion of the Terrier runs with PL2 weighting scheme, (d) CombSUM vs. RR fusion of the Terrier runs with TF-IDF weighting scheme.
Figure 22.
T2: scatter plots of P@10 and NDCG@10 of Porter/DirichletLM vs. RR fusion of best runs per index with QE+RF: (a) P@10, (b) NDCG@10.
Figure 23.
T1: scatter plots of P@10 and NDCG@10 of Porter/DirichletLM vs. RR fusion of best runs per index with QE+RF: (a) P@10, (b) NDCG@10.
Figure 24.
T2: scatter plots of P@10 and NDCG@10 of NoPorterNoStop/DirichletLM vs. RR fusion of best runs per model with QE+RF: (a) P@10, (b) NDCG@10.
Figure 25.
T1: scatter plots of the best run vs. the fusion of the two best runs: (a) P@10 for Porter/Dirichlet vs. CombSUM, (b) P@10 for Porter/Dirichlet vs. RR, (c) NDCG@10 for Porter/Dirichlet vs. CombSUM, (d) NDCG@10 for Porter/Dirichlet vs. RR.
Figure 26.
T2: scatter plots of the best run vs. the fusion of the two best runs: (a) P@10 for Porter/Dirichlet vs. CombSUM, (b) P@10 for Porter/Dirichlet vs. RR, (c) NDCG@10 for Porter/Dirichlet vs. CombSUM, (d) NDCG@10 for Porter/Dirichlet vs. RR.
Table 1.
Summary of the datasets used.
Task | Tracks | Dataset | # Topics |
---|
T1 | 2018, 2019 | All articles of PubMed | 60 |
T2 | 2017, 2018, 2019 | Result of Boolean search on PubMed | 90 |
Table 2.
Summary of all the runs analyzed per research question.
RQ | Runs | Total per Task |
---|
RQ1 | BM25, Dirichlet, PL2, TF-IDF, w2v-avg, w2v-si | 18 |
RQ2 | QE+RF of BM25, Dirichlet, PL2, TF-IDF | 16 |
RQ3 | RQ1, RQ2, N, P, P+S, S | 18 |
Table 3.
T1: NDCG at various cutoffs and Recall@R for the different models. The highest values are in bold.
Index/Model | NDCG@10 | NDCG@100 | NDCG@1000 | R@R |
---|
NoPorterNoStop/BM25 | 0.0443 | 0.0389 | 0.082 | 0.0244 |
NoPorterNoStop/Dirichlet | 0.0167 | 0.0238 | 0.065 | 0.0169 |
NoPorterNoStop/PL2 | 0.0454 | 0.0387 | 0.0782 | 0.0243 |
NoPorterNoStop/TF_IDF | 0.0448 | 0.0393 | 0.0821 | 0.0248 |
Porter/BM25 | 0.1346 | 0.1538 | 0.2549 | 0.1045 |
Porter/Dirichlet | 0.1276 | 0.1652 | 0.2749 | 0.1088 |
Porter/PL2 | 0.1257 | 0.1495 | 0.2462 | 0.0985 |
Porter/TF_IDF | 0.1451 | 0.1682 | 0.2692 | 0.1077 |
PorterStop/BM25 | 0.1316 | 0.1559 | 0.2597 | 0.1004 |
PorterStop/Dirichlet | 0.1217 | 0.1612 | 0.2662 | 0.1003 |
PorterStop/PL2 | 0.1297 | 0.1461 | 0.2426 | 0.0928 |
PorterStop/TF_IDF | 0.1306 | 0.1551 | 0.258 | 0.1001 |
Stop/BM25 | 0.1186 | 0.1458 | 0.2374 | 0.0911 |
Stop/Dirichlet | 0.1243 | 0.1538 | 0.2496 | 0.1017 |
Stop/PL2 | 0.1209 | 0.1396 | 0.2238 | 0.0851 |
Stop/TF_IDF | 0.1172 | 0.1464 | 0.237 | 0.0911 |
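For readers who want to relate the columns of these tables to a ranked list, the following is a minimal sketch of P@k, NDCG@k and Recall@R. It assumes linear gains with a log2 rank discount for NDCG, which is one common definition and not necessarily the variant used to produce the reported numbers; the function names and the toy data are hypothetical.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def dcg_at_k(gains, k):
    """Discounted cumulative gain: gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ids, relevance, k):
    """DCG of the ranking divided by the DCG of an ideal ranking.

    `relevance` maps document id -> graded relevance (assumed >= 0).
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def recall_at_r(ranked_ids, relevant_ids):
    """Recall at cutoff R, where R is the number of relevant documents."""
    r = len(relevant_ids)
    if r == 0:
        return 0.0
    return sum(1 for doc_id in ranked_ids[:r] if doc_id in relevant_ids) / r


# Toy example (hypothetical data, one topic).
ranking = ["d1", "d2", "d3", "d4"]
qrels = {"d1": 1, "d3": 1}
p2 = precision_at_k(ranking, set(qrels), 2)
rr = recall_at_r(ranking, set(qrels))
n2 = ndcg_at_k(ranking, qrels, 2)
```

The values in the tables would then correspond to averages of such per-topic scores over all topics of a task.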
Table 4.
T2: NDCG at various cutoffs and Recall@R for the different models. The highest values are in bold.
Index/Model | NDCG@10 | NDCG@100 | NDCG@1000 | R@R |
---|
NoPorterNoStop/BM25 | 0.1861 | 0.2406 | 0.3737 | 0.166 |
NoPorterNoStop/Dirichlet | 0.1786 | 0.2526 | 0.3878 | 0.1707 |
NoPorterNoStop/PL2 | 0.2052 | 0.2525 | 0.3929 | 0.1803 |
NoPorterNoStop/TF_IDF | 0.2075 | 0.2665 | 0.4031 | 0.1835 |
Porter/BM25 | 0.2162 | 0.2654 | 0.4127 | 0.1814 |
Porter/Dirichlet | 0.2662 | 0.3165 | 0.459 | 0.2067 |
Porter/PL2 | 0.2565 | 0.2969 | 0.4464 | 0.2041 |
Porter/TF_IDF | 0.2761 | 0.3202 | 0.4716 | 0.2156 |
PorterStop/BM25 | 0.26 | 0.3144 | 0.4682 | 0.2062 |
PorterStop/Dirichlet | 0.2438 | 0.3032 | 0.4456 | 0.1959 |
PorterStop/PL2 | 0.2584 | 0.3058 | 0.4608 | 0.2037 |
PorterStop/TF_IDF | 0.2618 | 0.3148 | 0.4683 | 0.2072 |
Stop/BM25 | 0.2396 | 0.2967 | 0.4501 | 0.2004 |
Stop/Dirichlet | 0.2363 | 0.2977 | 0.4373 | 0.1981 |
Stop/PL2 | 0.2387 | 0.2933 | 0.4455 | 0.196 |
Stop/TF_IDF | 0.2391 | 0.2987 | 0.451 | 0.2015 |
Table 5.
T1: NDCG at various cutoffs and Recall@R for the different models with QE+RF. The highest values are in bold.
Index/Model | NDCG@10 | NDCG@100 | NDCG@1000 | R@R |
---|
NoPorterNoStop/BM25 | 0.2313 | 0.227 | 0.3382 | 0.1471 |
NoPorterNoStop/Dirichlet | 0.3792 | 0.339 | 0.4687 | 0.2332 |
NoPorterNoStop/PL2 | 0.1987 | 0.2003 | 0.3101 | 0.1282 |
NoPorterNoStop/TF_IDF | 0.2162 | 0.2182 | 0.3297 | 0.1385 |
Porter/BM25 | 0.2417 | 0.2603 | 0.3872 | 0.1728 |
Porter/Dirichlet | 0.3778 | 0.3397 | 0.4684 | 0.2329 |
Porter/PL2 | 0.2114 | 0.2433 | 0.372 | 0.1549 |
Porter/TF_IDF | 0.2404 | 0.2576 | 0.3855 | 0.1624 |
PorterStop/BM25 | 0.2397 | 0.2484 | 0.3763 | 0.1584 |
PorterStop/Dirichlet | 0.3498 | 0.3256 | 0.4558 | 0.2252 |
PorterStop/PL2 | 0.2237 | 0.2337 | 0.3609 | 0.1485 |
PorterStop/TF_IDF | 0.233 | 0.2453 | 0.3736 | 0.1571 |
Stop/BM25 | 0.2201 | 0.239 | 0.3619 | 0.1586 |
Stop/Dirichlet | 0.3436 | 0.3273 | 0.4551 | 0.2372 |
Stop/PL2 | 0.2112 | 0.2234 | 0.3443 | 0.1503 |
Stop/TF_IDF | 0.2139 | 0.2354 | 0.3575 | 0.1573 |
Table 6.
DirichletLM+QE+RF for T1: P@10, P@100 and P@1000. The highest values are in bold.
Index | P@10 | P@100 | P@1000 |
---|
NoPorterNoStop | 0.36 | 0.1782 | 0.0557 |
Porter | 0.3633 | 0.1842 | 0.0558 |
PorterStop | 0.345 | 0.1773 | 0.0549 |
Stop | 0.3333 | 0.1853 | 0.0553 |
Table 7.
T2: NDCG at various cutoffs and Recall@R for the different models with QE+RF. The highest values are in bold.
Index/Model | NDCG@10 | NDCG@100 | NDCG@1000 | R@R |
---|
NoPorterNoStop/BM25 | 0.4449 | 0.474 | 0.6113 | 0.3321 |
NoPorterNoStop/Dirichlet | 0.5231 | 0.5359 | 0.6644 | 0.3857 |
NoPorterNoStop/PL2 | 0.42 | 0.4581 | 0.6019 | 0.3239 |
NoPorterNoStop/TF_IDF | 0.4325 | 0.4681 | 0.6092 | 0.3175 |
Porter/BM25 | 0.3966 | 0.434 | 0.5815 | 0.3007 |
Porter/Dirichlet | 0.5213 | 0.5367 | 0.6648 | 0.3877 |
Porter/PL2 | 0.3939 | 0.4398 | 0.5878 | 0.3097 |
Porter/TF_IDF | 0.4182 | 0.4557 | 0.603 | 0.3125 |
PorterStop/BM25 | 0.4045 | 0.4484 | 0.5981 | 0.3066 |
PorterStop/Dirichlet | 0.52 | 0.5231 | 0.6525 | 0.3741 |
PorterStop/PL2 | 0.3801 | 0.4356 | 0.5864 | 0.3013 |
PorterStop/TF_IDF | 0.3958 | 0.443 | 0.5931 | 0.3017 |
Stop/BM25 | 0.3868 | 0.4329 | 0.5829 | 0.3046 |
Stop/Dirichlet | 0.5149 | 0.5231 | 0.6544 | 0.3691 |
Stop/PL2 | 0.3616 | 0.4225 | 0.574 | 0.2955 |
Stop/TF_IDF | 0.3776 | 0.4272 | 0.5781 | 0.2985 |
Table 8.
DirichletLM+QE+RF for T2: P@10, P@100 and P@1000. The highest values are in bold.
Index | P@10 | P@100 | P@1000 |
---|
NoPorterNoStop | 0.5056 | 0.2412 | 0.0602 |
Porter | 0.5 | 0.2412 | 0.0601 |
PorterStop | 0.5089 | 0.2337 | 0.0593 |
Stop | 0.5011 | 0.2351 | 0.0599 |
Table 9.
T1: NDCG and R@R scores of the w2v runs and of the Terrier runs with the NoPorterNoStop index.
Index/Model | NDCG@10 | NDCG@100 | NDCG@1000 | R@R |
---|
w2v_avg | 0.1144 | 0.1109 | 0.1692 | 0.0689 |
w2v_si | 0.1185 | 0.1123 | 0.1708 | 0.0688 |
NoPorterNoStop/BM25 | 0.0443 | 0.0389 | 0.082 | 0.0244 |
NoPorterNoStop/Dirichlet | 0.0167 | 0.0238 | 0.065 | 0.0169 |
NoPorterNoStop/PL2 | 0.0454 | 0.0387 | 0.0782 | 0.0243 |
NoPorterNoStop/TF_IDF | 0.0448 | 0.0393 | 0.0821 | 0.0248 |
Table 10.
T2: NDCG and R@R scores of the w2v runs and of the Terrier runs with the NoPorterNoStop index.
Index/Model | NDCG@10 | NDCG@100 | NDCG@1000 | R@R |
---|
w2v_avg | 0.2065 | 0.2228 | 0.3523 | 0.1328 |
w2v_si | 0.2104 | 0.2238 | 0.3539 | 0.1347 |
NoPorterNoStop/BM25 | 0.1861 | 0.2406 | 0.3737 | 0.166 |
NoPorterNoStop/Dirichlet | 0.1786 | 0.2526 | 0.3878 | 0.1707 |
NoPorterNoStop/PL2 | 0.2052 | 0.2525 | 0.3929 | 0.1803 |
NoPorterNoStop/TF_IDF | 0.2075 | 0.2665 | 0.4031 | 0.1835 |
Table 11.
T1: NDCG and Recall@R for the fusion runs. The highest values are in bold.
Index/Fusion | NDCG@10 | NDCG@100 | NDCG@1000 | R@R |
---|
N/CombSUM | 0.0474 | 0.0439 | 0.0836 | 0.0246 |
N/ProbFUSE | 0.0269 | 0.0378 | 0.0778 | 0.0249 |
N/RR | 0.0453 | 0.0405 | 0.0794 | 0.0241 |
P/CombSUM | 0.1438 | 0.1715 | 0.2718 | 0.1095 |
P/ProbFUSE | 0.0552 | 0.1498 | 0.2569 | 0.1013 |
P/RR | 0.1538 | 0.1701 | 0.2726 | 0.1096 |
P+S/CombSUM | 0.1383 | 0.1628 | 0.2671 | 0.105 |
P+S/ProbFUSE | 0.0543 | 0.1441 | 0.2499 | 0.0932 |
P+S/RR | 0.1447 | 0.1635 | 0.2649 | 0.1029 |
S/CombSUM | 0.1198 | 0.1551 | 0.242 | 0.0973 |
S/ProbFUSE | 0.0634 | 0.1427 | 0.2368 | 0.0875 |
S/RR | 0.1264 | 0.1529 | 0.2437 | 0.0973 |
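As an illustration of how two of the fusion methods compared in these tables combine several runs, here is a minimal sketch of CombSUM and reciprocal rank (RR) fusion. It rests on assumptions not stated in the captions: CombSUM is applied to min-max normalized scores, and RR fusion uses the common form 1/(k + rank) with a hypothetical constant k = 60. The function names and the toy runs are illustrative only.

```python
from collections import defaultdict


def comb_sum(runs):
    """CombSUM: sum each document's (min-max normalized) scores across runs.

    `runs` is a list of dicts mapping document id -> retrieval score.
    Normalization is an assumption; raw scores could be summed instead.
    """
    fused = defaultdict(float)
    for run in runs:
        if not run:
            continue
        lo, hi = min(run.values()), max(run.values())
        span = hi - lo
        for doc_id, score in run.items():
            fused[doc_id] += (score - lo) / span if span > 0 else 1.0
    # Highest fused score first; ties broken by document id.
    return sorted(fused, key=lambda d: (-fused[d], d))


def reciprocal_rank_fusion(runs, k=60):
    """RR fusion: sum 1 / (k + rank) over the runs that retrieve a document."""
    fused = defaultdict(float)
    for run in runs:
        ranking = sorted(run, key=lambda d: (-run[d], d))
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy example with two hypothetical runs for one topic.
run_a = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
run_b = {"d2": 10.0, "d3": 5.0, "d4": 0.0}
combsum_ranking = comb_sum([run_a, run_b])
rr_ranking = reciprocal_rank_fusion([run_a, run_b])
```

In this toy case the two methods agree on the top document but order the rest differently, because CombSUM depends on score magnitudes while RR fusion depends only on ranks.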
Table 12.
T1: NDCG and Recall@R for the fusion runs with QE+RF. The highest values are in bold.
Index/Fusion | NDCG@10 | NDCG@100 | NDCG@1000 | R@R |
---|
N/CombSUM | 0.2861 | 0.2756 | 0.3749 | 0.1813 |
N/ProbFUSE | 0.1995 | 0.2476 | 0.3918 | 0.1702 |
N/RR | 0.2796 | 0.2588 | 0.3984 | 0.1672 |
P/CombSUM | 0.3084 | 0.3154 | 0.4278 | 0.2101 |
P/ProbFUSE | 0.1798 | 0.269 | 0.4107 | 0.1912 |
P/RR | 0.2952 | 0.2964 | 0.4341 | 0.1942 |
P+S/CombSUM | 0.2889 | 0.2963 | 0.4135 | 0.1963 |
P+S/ProbFUSE | 0.1758 | 0.2499 | 0.3975 | 0.1788 |
P+S/RR | 0.2878 | 0.2801 | 0.4224 | 0.1822 |
S/CombSUM | 0.2824 | 0.2842 | 0.3956 | 0.1915 |
S/ProbFUSE | 0.1874 | 0.248 | 0.3898 | 0.1751 |
S/RR | 0.2661 | 0.2684 | 0.4039 | 0.186 |
Table 13.
T2: NDCG and Recall@R for the fusion runs. The highest values are in bold.
Index/Fusion | NDCG@10 | NDCG@100 | NDCG@1000 | R@R |
---|
N/CombSUM | 0.2061 | 0.2652 | 0.4028 | 0.1852 |
N/ProbFUSE | 0.1546 | 0.2446 | 0.3844 | 0.1621 |
N/RR | 0.2117 | 0.2673 | 0.4044 | 0.1815 |
P/CombSUM | 0.2746 | 0.3189 | 0.4694 | 0.213 |
P/ProbFUSE | 0.1831 | 0.2873 | 0.4377 | 0.1913 |
P/RR | 0.2752 | 0.3158 | 0.4667 | 0.2132 |
P+S/CombSUM | 0.2642 | 0.3175 | 0.47 | 0.2064 |
P+S/ProbFUSE | 0.172 | 0.281 | 0.4353 | 0.1789 |
P+S/RR | 0.2612 | 0.3152 | 0.4688 | 0.2063 |
S/CombSUM | 0.242 | 0.2993 | 0.4516 | 0.2018 |
S/ProbFUSE | 0.1674 | 0.2791 | 0.4293 | 0.1765 |
S/RR | 0.246 | 0.3007 | 0.4505 | 0.219 |
Table 14.
T2: NDCG and Recall@R for the fusion runs with QE+RF. The highest values are in bold.
Index/Fusion | NDCG@10 | NDCG@100 | NDCG@1000 | R@R |
---|
N/CombSUM | 0.4867 | 0.5098 | 0.6407 | 0.3664 |
N/ProbFUSE | 0.3264 | 0.4451 | 0.5814 | 0.2982 |
N/RR | 0.4708 | 0.4992 | 0.6392 | 0.3512 |
P/CombSUM | 0.4837 | 0.5056 | 0.6417 | 0.3574 |
P/ProbFUSE | 0.3378 | 0.4413 | 0.5821 | 0.2899 |
P/RR | 0.4628 | 0.4881 | 0.6332 | 0.3492 |
P+S/CombSUM | 0.4531 | 0.4852 | 0.6236 | 0.3442 |
P+S/ProbFUSE | 0.3139 | 0.4281 | 0.571 | 0.2868 |
P+S/RR | 0.4353 | 0.4737 | 0.6211 | 0.3346 |
S/CombSUM | 0.4462 | 0.4738 | 0.6103 | 0.339 |
S/ProbFUSE | 0.323 | 0.4266 | 0.5691 | 0.2797 |
S/RR | 0.4311 | 0.461 | 0.6103 | 0.319 |
Table 15.
T1: score difference between P/D and N/D, and number of topics in which P/D outscores N/D and vice versa.
Measure | Difference | # of Outscored Topics P/D vs. N/D |
---|
P@10 | 0.003 | 12 vs. 11 |
P@100 | 0.006 | 26 vs. 21 |
P@1000 | 0 | 23 vs. 17 |
NDCG@10 | −0.001 | 24 vs. 29 |
NDCG@100 | 0.001 | 32 vs. 27 |
NDCG@1000 | 0 | 32 vs. 27 |
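The pairwise comparison tables report, for each measure, a score difference between two runs and a count of topics on which each run outscores the other. A minimal sketch of such a comparison follows, assuming per-topic scores are available as dictionaries and that tied topics are left out of both counts (which would be consistent with counts that sum to fewer than the number of topics). The function name and the toy scores are hypothetical.

```python
def compare_runs(scores_a, scores_b):
    """Compare two runs topic by topic.

    `scores_a` and `scores_b` map topic id -> score for one measure.
    Returns (mean difference A - B, topics where A wins, topics where B wins).
    Only topics present in both runs are compared; ties count for neither run.
    """
    topics = sorted(set(scores_a) & set(scores_b))
    diffs = [scores_a[t] - scores_b[t] for t in topics]
    mean_diff = sum(diffs) / len(diffs) if diffs else 0.0
    a_wins = sum(1 for d in diffs if d > 0)
    b_wins = sum(1 for d in diffs if d < 0)
    return mean_diff, a_wins, b_wins


# Toy example with hypothetical per-topic P@10 scores.
run_a = {"t1": 0.5, "t2": 0.2, "t3": 0.4}
run_b = {"t1": 0.3, "t2": 0.2, "t3": 0.5}
mean_diff, a_wins, b_wins = compare_runs(run_a, run_b)
```

A small mean difference together with nearly balanced win counts, as in several rows of these tables, suggests that neither run is consistently better across topics.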
Table 16.
T2: score difference between P/D and N/D, and number of topics in which P/D outscores N/D and vice versa.
Measure | Difference | # of Outscored Topics P/D vs. N/D |
---|
P@10 | −0.006 | 21 vs. 22 |
P@100 | 0 | 26 vs. 21 |
P@1000 | 0 | 13 vs. 19 |
NDCG@10 | −0.002 | 42 vs. 39 |
NDCG@100 | 0.001 | 50 vs. 38 |
NDCG@1000 | 0 | 53 vs. 36 |
Table 17.
T2: Porter/Dirichlet+QE+RF run vs. the same run without QE+RF: score difference and number of outscored topics.
Measure | Difference | # of Outscored Topics P/D+QE+RF vs. P/D |
---|
P@10 | 0.234 | 73 vs. 5 |
P@100 | 0.087 | 74 vs. 3 |
P@1000 | 0.14 | 66 vs. 3 |
NDCG@10 | 0.255 | 79 vs. 10 |
NDCG@100 | 0.22 | 81 vs. 8 |
NDCG@1000 | 0.206 | 84 vs. 6 |
Table 18.
T2: NoPorterNoStop/BM25+QE+RF run vs. the same run without QE+RF: score difference and number of outscored topics.
Measure | Difference | # of Outscored Topics N/B+QE+RF vs. N/B |
---|
P@10 | 0.236 | 70 vs. 1 |
P@100 | 0.095 | 80 vs. 2 |
P@1000 | 0.21 | 69 vs. 1 |
NDCG@10 | 0.259 | 76 vs. 6 |
NDCG@100 | 0.233 | 85 vs. 4 |
NDCG@1000 | 0.238 | 86 vs. 3 |
Table 19.
T1: CombSUM vs. RR fusion: score difference and number of outscored topics for each set of fused runs.
Fused Runs | Measure | P@10 | P@100 | P@1000 | NDCG@10 | NDCG@1000 |
---|
NoPorterNoStop | Difference | 0 | 0.002 | 0.001 | 0.002 | 0.004 |
NoPorterNoStop | # outscore | 0 vs. 0 | 8 vs. 4 | 10 vs. 3 | 3 vs. 2 | 26 vs. 4 |
Porter | Difference | −0.008 | 0.003 | 0 | −0.01 | −0.001 |
Porter | # outscore | 4 vs. 7 | 17 vs. 4 | 14 vs. 15 | 12 vs. 20 | 30 vs. 29 |
PorterStop | Difference | −0.005 | 0 | 0 | −0.006 | 0.002 |
PorterStop | # outscore | 5 vs. 8 | 11 vs. 10 | 17 vs. 12 | 12 vs. 23 | 32 vs. 26 |
Stop | Difference | −0.012 | 0.002 | 0 | −0.007 | −0.002 |
Stop | # outscore | 3 vs. 9 | 13 vs. 3 | 17 vs. 16 | 5 vs. 23 | 26 vs. 31 |
BM25 | Difference | −0.01 | −0.03 | 0.02 | −0.007 | 0.008 |
BM25 | # outscore | 4 vs. 10 | 7 vs. 19 | 25 vs. 11 | 12 vs. 16 | 42 vs. 18 |
DirichletLM | Difference | −0.002 | −0.005 | −0.001 | −0.004 | −0.009 |
DirichletLM | # outscore | 3 vs. 5 | 13 vs. 20 | 16 vs. 19 | 11 vs. 11 | 35 vs. 24 |
PL2 | Difference | 0 | −0.004 | 0.001 | −0.002 | 0.001 |
PL2 | # outscore | 6 vs. 5 | 5 vs. 20 | 27 vs. 13 | 20 vs. 27 | 37 vs. 23 |
TF-IDF | Difference | −0.012 | −0.004 | 0.002 | −0.009 | 0.004 |
TF-IDF | # outscore | 4 vs. 10 | 4 vs. 22 | 27 vs. 9 | 15 vs. 15 | 41 vs. 19 |
Table 20.
T1: Porter/DirichletLM+QE+RF run vs. RR fusion of DirichletLM+QE+RF model with the four indexes.
Measure | Difference | # Outscore |
---|
P@10 | 0.012 | 14 vs. 11 |
P@100 | −0.004 | 20 vs. 20 |
P@1000 | −0.001 | 14 vs. 21 |
NDCG@10 | 0.009 | 31 vs. 19 |
NDCG@100 | −0.006 | 28 vs. 31 |
NDCG@1000 | −0.008 | 24 vs. 35 |
Table 21.
T2: Porter/DirichletLM+QE+RF run vs. RR fusion of DirichletLM+QE+RF model with the four indexes.
Measure | Difference | # Outscore |
---|
P@10 | −0.012 | 18 vs. 26 |
P@100 | −0.001 | 21 vs. 26 |
P@1000 | 0 | 10 vs. 20 |
NDCG@10 | −0.007 | 36 vs. 41 |
NDCG@100 | −0.002 | 40 vs. 48 |
NDCG@1000 | −0.001 | 41 vs. 48 |
Table 22.
T1: Porter/DirichletLM+QE+RF run vs. RR fusion of models using Porter index.
Measure | Difference | # Outscore |
---|
P@10 | 0.087 | 37 vs. 12 |
P@100 | 0.013 | 34 vs. 22 |
P@1000 | 0 | 26 vs. 26 |
NDCG@10 | 0.083 | 40 vs. 19 |
NDCG@100 | 0.043 | 45 vs. 15 |
NDCG@1000 | 0.034 | 37 vs. 23 |
Table 23.
T2: Porter/DirichletLM+QE+RF run vs. RR fusion of models using NoPorterNoStop index.
Measure | Difference | # Outscore |
---|
P@10 | 0.046 | 44 vs. 21 |
P@100 | 0.008 | 39 vs. 25 |
P@1000 | −0.001 | 20 vs. 32 |
NDCG@10 | 0.051 | 54 vs. 33 |
NDCG@100 | 0.037 | 54 vs. 34 |
NDCG@1000 | 0.026 | 47 vs. 42 |
Table 24.
T1: comparison of the best run vs. the CombSUM fusion of the two best runs.
Measure | Difference | # Outscore |
---|
P@10 | −0.012 | 10 vs. 13 |
P@100 | 0.001 | 19 vs. 16 |
P@1000 | 0 | 15 vs. 10 |
NDCG@10 | −0.011 | 24 vs. 25 |
NDCG@100 | −0.004 | 29 vs. 28 |
NDCG@1000 | −0.003 | 25 vs. 34 |
Table 25.
T1: comparison of the best run vs. the reciprocal ranking (RR) fusion of the two best runs.
Measure | Difference | # Outscore |
---|
P@10 | −0.012 | 9 vs. 14 |
P@100 | 0.001 | 20 vs. 18 |
P@1000 | 0 | 16 vs. 14 |
NDCG@10 | −0.016 | 18 vs. 30 |
NDCG@100 | −0.006 | 27 vs. 32 |
NDCG@1000 | −0.005 | 27 vs. 31 |
Table 26.
T2: comparison of the best run vs. the CombSUM fusion of the two best runs.
Measure | Difference | # Outscore |
---|
P@10 | −0.014 | 8 vs. 19 |
P@100 | −0.001 | 16 vs. 26 |
P@1000 | 0 | 7 vs. 8 |
NDCG@10 | 0.009 | 34 vs. 35 |
NDCG@100 | 0.001 | 48 vs. 40 |
NDCG@1000 | −0.002 | 45 vs. 42 |
Table 27.
T2: comparison of the best run vs. the RR fusion of the two best runs.
Measure | Difference | # Outscore |
---|
P@10 | −0.016 | 11 vs. 25 |
P@100 | −0.001 | 15 vs. 25 |
P@1000 | 0 | 10 vs. 13 |
NDCG@10 | −0.009 | 35 vs. 39 |
NDCG@100 | −0.004 | 41 vs. 45 |
NDCG@1000 | −0.004 | 46 vs. 42 |
Table 28.
T1: ANOVA test results for the different models and indexes for P@10.
Model | F Test | p-Value | Index | F Test | p-Value |
---|
BM25 | 4.321 | 0.0055 | N | 0.592 | 0.6211 |
BM25+QE+RF | 0.055 | 0.9829 | N+QE+RF | 7.927 | 4.65 |
Dirichlet | 9.343 | 7.35 | P | 0.198 | 0.8979 |
Dirichlet+QE+RF | 0.278 | 0.8413 | P+QE+RF | 8.864 | 1.37 |
PL2 | 4.083 | 0.0075 | P+S | 0.026 | 0.9943 |
PL2+QE+RF | 0.283 | 0.838 | P+S+QE+RF | 6.461 | 0.0003 |
TF-IDF | 4.584 | 0.0039 | S | 0.046 | 0.9868 |
TF-IDF+QE+RF | 0.244 | 0.8658 | S+QE+RF | 5.786 | 0.0008 |
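To make the F Test columns easier to interpret, the following is a minimal sketch of the F statistic of a one-way ANOVA, assuming each group holds the per-topic P@10 scores of one run (for example, one index for a fixed model). Whether the reported tests follow exactly this design is an assumption, and the function name is hypothetical. The p-value is not computed here; a statistics library such as SciPy (`scipy.stats.f_oneway`) returns both values.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA.

    `groups` is a list of non-empty lists of observations, one list per group.
    F = (between-group mean square) / (within-group mean square).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups and n > k")

    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within if ms_within > 0 else float("inf")


# Toy example with three hypothetical groups of scores.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F value indicates that the group means differ by more than the within-group spread would explain, which corresponds to a small p-value in the tables.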
Table 29.
T2: ANOVA test results for the different models and indexes for P@10.
Model | F Test | p-Value | Index | F Test | p-Value |
---|
BM25 | 1.015 | 0.386 | N | 0.352 | 0.7876 |
BM25+QE+RF | 0.743 | 0.527 | N+QE+RF | 2.404 | 0.0672 |
Dirichlet | 1.938 | 0.1231 | P | 0.781 | 0.5051 |
Dirichlet+QE+RF | 0.027 | 0.9939 | P+QE+RF | 4.066 | 0.0073 |
PL2 | 0.563 | 0.6398 | P+S | 0.094 | 0.9633 |
PL2+QE+RF | 0.712 | 0.5452 | P+S+QE+RF | 5.64 | 0.0009 |
TF-IDF | 0.933 | 0.4247 | S | 0.014 | 0.9977 |
TF-IDF+QE+RF | 0.566 | 0.6379 | S+QE+RF | 5.664 | 0.0008 |
Table 30.
T1: ANOVA test results for the indexes for P@10.
Index | F Test | p-Value |
---|
NoPorterNoStop+QE+RF | 0.091 | 0.7636 |
Porter+QE+RF | 0.26 | 0.6111 |
PorterStop+QE+RF | 0.165 | 0.6852 |
Stop+QE+RF | 0.47 | 0.4945 |
Table 31.
T2: ANOVA test results for the indexes for P@10.
Index | F Test | p-Value |
---|
NoPorterNoStop+QE+RF | 0.244 | 0.6221 |
Porter+QE+RF | 0.11 | 0.7404 |
PorterStop+QE+RF | 0.305 | 0.5812 |
Stop+QE+RF | 0.113 | 0.736714 |