Solving Multi-Document Summarization as an Orienteering Problem
Abstract
1. Introduction
2. Related Work
2.1. Statistical Approaches
2.2. Machine Learning Approaches
2.3. Clustering Approaches
2.4. Graph-Based Approaches
2.5. Semantic Approaches
2.6. Optimization-Based Approaches
2.7. Swarm-Intelligence-Based Approaches
3. Orienteering Problem
4. Ant Colony Optimization
5. The Proposed Solution
5.1. Preprocessing
5.2. Building an Intermediate Representation
5.3. Computing the Content Scores
5.4. Selecting Summary Sentences
5.4.1. Encoding of an MDS Instance into an OP Instance
Algorithm 1 Encoding of an MDS instance into an OP instance. 

5.4.2. Decoding a Solution to OP into a Solution to MDS
Algorithm 2 Decoding of a solution to OP into a solution to MDS. 

5.4.3. Correctness of the Reduction
The length of S is less than or equal to L, so the total travel time of P is less than or equal to $T_{max}$: $$\sum_{s_i \in S} l_i \le L \Rightarrow \sum_{v_i \in P} \sum_{v_j \in P} t_{ij} x_{ij} \le T_{max} \quad (\text{time budget constraint}).$$
Maximizing the overall content coverage score of S maximizes the total profit gained by P: $$\max \left( \sum_{s_i \in S} cov_i \right) \Rightarrow \max \left( \sum_{v_i \in P} p_i \right) \quad (\text{maximize the profit}).$$
If the total travel time of P is less than or equal to $T_{max}$, then the total length of S is less than or equal to L: $$\sum_{v_i \in P} \sum_{v_j \in P} t_{ij} x_{ij} \le T_{max} \Rightarrow \sum_{s_i \in S} l_i \le L \quad (\text{summary length}).$$
Maximizing the total profit gained by P maximizes the overall content coverage score of S: $$\max \left( \sum_{v_i \in P} p_i \right) \Rightarrow \max \left( \sum_{s_i \in S} cov_i \right) \quad (\text{maximize the coverage}).$$
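The four implications above can be checked on a toy instance. The following is a minimal sketch and not a reproduction of Algorithms 1 and 2, whose bodies are not shown here. It assumes that each sentence $s_i$ becomes a vertex $v_i$ with profit $p_i = cov_i$, that the travel time $t_{ij}$ into $v_j$ equals the length $l_j$ of the destination sentence, that $T_{max} = L$, and that a hypothetical zero-profit start vertex $v_0$ is added. All class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    length: int       # l_i, e.g., number of words
    coverage: float   # cov_i, content coverage score


def encode(sentences, max_length):
    """Map an MDS instance to an OP instance under the assumed encoding."""
    # Index 0 is the hypothetical start vertex: zero length, zero profit.
    lengths = [0] + [s.length for s in sentences]
    profits = [0.0] + [s.coverage for s in sentences]
    size = len(lengths)
    # Assumption: entering vertex j costs the length of sentence j.
    times = [[lengths[j] for j in range(size)] for _ in range(size)]
    return profits, times, max_length


def evaluate(path, profits, times):
    """Return (total travel time, total profit) of a path of vertex indices."""
    travel_time = sum(times[i][j] for i, j in zip(path, path[1:]))
    profit = sum(profits[v] for v in path)
    return travel_time, profit


def decode(path):
    """Map an OP path back to sentence indices, dropping the start vertex."""
    return [v - 1 for v in path if v != 0]


sentences = [Sentence(10, 0.5), Sentence(20, 0.75), Sentence(15, 0.25)]
profits, times, t_max = encode(sentences, max_length=30)

path = [0, 1, 2]  # start vertex, then sentences 0 and 1
travel_time, profit = evaluate(path, profits, times)

summary = decode(path)
summary_length = sum(sentences[i].length for i in summary)
summary_coverage = sum(sentences[i].coverage for i in summary)
```

Under these assumptions the travel time of the path equals the length of the decoded summary, and the profit of the path equals its coverage, so a path is feasible exactly when the summary fits the length limit, and the two objectives rank solutions identically.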
5.4.4. ACS for OP
Algorithm 3 Approximating an OP solution using ACS. 

6. Experiments
6.1. Corpora
6.2. Evaluation Metrics
6.3. Evaluation Results
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
ACO  Ant colony optimization 
ACS  Ant colony system 
ABC  Artificial bee colony 
AS  Ant system 
AutoSummENG  AUTOmatic SUMMary Evaluation based on N-gram Graphs 
CS  Cuckoo search 
DUC  Document understanding conference 
GA  Genetic algorithm 
hLDA  Hierarchical Latent Dirichlet Allocation 
HMM  Hidden Markov model 
LCS  Longest common subsequence 
MDS  Multi-document summarization 
MeMoG  Merged Model Graph 
MMS  Multilingual multi-document summarization 
NP  Noun phrase 
NPowER  N-gram graph Powered Evaluation via Regression 
OP  Orienteering problem 
PSO  Particle swarm optimization 
ROUGE  Recall-Oriented Understudy for Gisting Evaluation 
SI  Swarm intelligence 
TAC  Text analysis conference 
TSP  Traveling salesman problem 
TF-IDF  Term frequency times inverse document frequency 
TF-ISF  Term frequency times inverse sentence frequency 
Parameter  Value 

Number of ants (m)  Number of sentences in the text to be summarized. 
Initial pheromone value ($\tau_{0}$)  $n^{-1} \cdot L_{nn}$, where $L_{nn}$ is the overall coverage (i.e., total profit) of the summary generated by following the nearest neighbor heuristic, and $n$ is the number of sentences in this summary. 
Pheromone decay parameters ($\alpha$ and $\rho$)  0.1 
Heuristic exponent ($\beta $)  2 
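To show where the parameters listed above enter the computation, the following is a minimal sketch of the standard ACS update rules, not the paper's Algorithm 3. It assumes the initial pheromone value is read as $L_{nn} / n$, that the heuristic value `eta` of an edge and the amount `deposit` laid on best-path edges are supplied by the caller (neither is defined in this section), and that all function names are hypothetical.

```python
def initial_pheromone(n, l_nn):
    """tau_0, assuming the table entry is read as L_nn / n."""
    return l_nn / n


def local_update(tau, tau0, rho=0.1):
    """Standard ACS local rule, applied to an edge right after an ant crosses it."""
    return (1 - rho) * tau + rho * tau0


def global_update(tau, deposit, alpha=0.1):
    """Standard ACS global rule, applied to the edges of the best path found.

    `deposit` is left to the caller because this section does not define it.
    """
    return (1 - alpha) * tau + alpha * deposit


def attractiveness(tau, eta, beta=2):
    """Edge desirability tau * eta**beta used when an ant picks its next vertex."""
    return tau * eta ** beta


tau0 = initial_pheromone(n=4, l_nn=2.0)
tau_after_visit = local_update(tau=1.0, tau0=tau0)
tau_after_best = global_update(tau=tau0, deposit=2.0)
score = attractiveness(tau=tau0, eta=2.0)
```

With $\rho = \alpha = 0.1$, each update moves the pheromone only one tenth of the way toward its target value, so no single path dominates the search early, while $\beta = 2$ weights the heuristic quadratically relative to the pheromone.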
System ID  Research Group (Participant)  Reference 

CCSNSA04  NSA  [22] 
MEDLAB_Fudan  Fudan University  * 
CL  CL Research  [14] 
LARIS.2004  Laris Labs  [48] 
ULeth2004  University of Lethbridge  [43] 
columbia1  Columbia University  [28] 
CLaCDUCTape2  Concordia University  [44] 
webcl2004  ISI  * 
kul.2004  KU Leuven  [31] 
lcc.duc04  LCC  [15] 
uofo  University of Ottawa  * 
msrnlp.duc2004  Microsoft  [37] 
crl_nyu.duc04  CRL/NYU  [16] 
nttcslab.duc2004  NTT  [20] 
shef2004.saggion  University of Sheffield  [30] 
UofMMEAD  University of Michigan  [33] 
System ID  Participant  Reference 

MMS1  UJF-Grenoble  [19] 
MMS2  UWB  * 
MMS3  ExB  [41] 
MMS5  ESI-AllSummarizer  [29] 
MMS8  IDA-OCCAMS  [17] 
MMS9  GiauUngVan  * 
MMS11  SCE-Poly  [50] 
MMS12  BUPT-CIST  [21] 
MMS13  BGU-MUSE  [24] 
MMS15  NCSR/SCIFY-NewSumRerank  * 
System ID  R1  R2  R3  R4  RL  RW  Relative improvement of MDSOP (%): R1  R2  R3  R4  RL  RW 

MDSOP  0.386142  0.08799  0.031144  0.013086  0.33438  0.15             
2 (baseline)  0.3212  0.06402  0.02011  0.00694  0.2847  0.12639  +20.22  +37.44  +54.87  +88.56  +17.45  +17.39 
CCSNSA04  0.37938  0.09215  0.03589  0.01689  0.32803  0.14707  +1.78 ^{*}  −4.51 ^{*}  −13.22 ^{*}  −22.52 ^{*}  +1.94  +0.88 
MEDLAB_Fudan  0.37584  0.0839  0.02675  0.01068  0.3339  0.14853  +2.74  +4.87  +16.43  +22.53  +0.14 ^{*}  −0.11 ^{*} 
CL  0.3319  0.07652  0.02762  0.01278  0.28452  0.12568  +16.34  +14.99  +12.76  +2.39  +17.52  +18.05 
LARIS.2004  0.37422  0.08033  0.02555  0.01013  0.32356  0.14308  +3.19  +9.54  +21.89  +29.18  +3.34  +3.69 
ULeth2004  0.31238  0.0513  0.01364  0.00469  0.26949  0.11886  +23.61  +71.52  +128.33  +179.02  +24.08  +24.82 
columbia1  0.36282  0.07763  0.02637  0.01232  0.32299  0.14339  +6.43  +13.35  +18.10  +6.22  +3.53  +3.47 
CLaCDUCTape2  0.35387  0.07028  0.02047  0.00856  0.30801  0.13787  +9.12  +25.2  +52.14  +52.87  +8.56  +7.61 
webcl2004  0.3643  0.07987  0.02743  0.01253  0.31921  0.14298  +6  +10.17  +13.54  +4.44  +4.75  +3.77 
kul.2004  0.34142  0.07812  0.02599  0.01094  0.29622  0.13193  +13.1  +12.63  +19.83  +19.62  +12.88  +12.46 
lcc.duc04  0.37155  0.08528  0.02713  0.01073  0.32281  0.1441  +3.93  +3.18  +14.8  +21.96  +3.58  +2.96 
uofo  0.23412  0.01806  0.00265  0.00074  0.21411  0.09549  +64.93 ^{⋆}  +387.21 ^{⋆}  +1075.25 ^{⋆}  +1668.38 ^{⋆}  +56.17 ^{⋆}  +55.37 ^{⋆} 
msrnlp.duc2004  0.33918  0.05853  0.01338  0.00377  0.30147  0.13339  +13.85  +50.33  +132.77  +247.11  +10.92  +11.23 
crl_nyu.duc04  0.34644  0.08608  0.03442  0.01635  0.29838  0.13124  +11.46  +2.22  −9.52  −19.96  +12.07  +13.05 
nttcslab.duc2004  0.31263  0.05376  0.014  0.00547  0.27008  0.11745  +23.51  +63.67  +122.46  +139.23  +23.81  +26.32 
shef2004.saggion  0.36763  0.08255  0.02843  0.01212  0.31964  0.14306  +5.04  +6.59  +9.55  +7.97  +4.61  +3.71 
UofMMEAD  0.33962  0.07135  0.02342  0.01019  0.26726  0.12144  +13.7  +23.32  +32.98  +28.42  +25.11  +22.17 
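The relative-improvement columns are consistent with the usual definition, (MDSOP score minus system score) divided by the system score, times 100. This formula is inferred from the reported figures rather than stated in this section, and the function name below is illustrative. A minimal sketch using three values from the table above:

```python
def relative_improvement(mdsop_score, system_score):
    """Percentage by which MDSOP exceeds (or trails) another system's score."""
    return (mdsop_score - system_score) / system_score * 100.0


# R1: MDSOP (0.386142) against the baseline, system 2 (0.3212).
baseline_r1 = relative_improvement(0.386142, 0.3212)

# R2: MDSOP (0.08799) against CCSNSA04 (0.09215); negative means MDSOP trails.
ccsnsa04_r2 = relative_improvement(0.08799, 0.09215)

print(f"{baseline_r1:+.2f}")
print(f"{ccsnsa04_r2:+.2f}")
```

A positive value means MDSOP scores higher than the compared system, and a negative value means it scores lower.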
System ID  R1  R2  RSU4  Relative improvement of MDSOP (%): R1  R2  RSU4 

MDSOP  0.468276  0.173698  0.204328       
MMS1  0.42463  0.12593  0.16892  +10.28  +37.93  +20.96 
MMS2  0.45302  0.17452  0.20371  +3.37  −0.47 ^{*}  +0.30 ^{*} 
MMS3  0.43478  0.15572  0.19  +7.7  +11.55  +7.54 
MMS5  0.43857  0.1576  0.18962  +6.77  +10.21  +7.76 
MMS8  0.47035  0.1673  0.19989  −0.44 ^{*}  +3.82  +2.22 
MMS9  0.4281  0.14296  0.1844  +9.38  +21.50  +10.81 
MMS11  0.41515  0.12438  0.1665  +12.8  +39.65  +22.72 
MMS12  0.39243  0.10205  0.14846  +19.33 ^{⋆}  +70.21 ^{⋆}  +37.63 ^{⋆} 
MMS13  0.43376  0.15885  0.1914  +7.96  +9.35  +6.75 
MMS15  0.42514  0.15414  0.18308  +10.15  +12.69  +11.61 
System ID  AutoSummENG  MeMoG  NPowER  Relative improvement of MDSOP (%): AutoSummENG  MeMoG  NPowER 

MDSOP  0.2157  0.2521  1.9942       
MMS1  0.1751  0.1988  1.8441  +23.19  +26.81  +8.14 
MMS2  0.1909  0.222  1.9054  +12.99  +13.56 ^{*}  +4.66 ^{*} 
MMS3  0.164  0.1848  1.8039  +31.52  +36.42  +10.55 
MMS5  0.1778  0.1944  1.8436  +21.32  +29.68  +8.17 
MMS8  0.1925  0.2185  1.9046  +12.05 ^{*}  +15.38  +4.7 
MMS9  0.1657  0.1797  1.8013  +30.18  +40.29  +10.71 
MMS11  0.1688  0.1836  1.8125  +27.78  +37.31  +10.02 
MMS12  0.1475  0.1651  1.7453  +46.24 ^{⋆}  +52.7 ^{⋆}  +14.26 ^{⋆} 
MMS13  0.1607  0.1801  1.7911  +34.23  +39.98  +11.34 
MMS15  0.1744  0.2004  1.8446  +23.68  +25.8  +8.11 
