Solving Multi-Document Summarization as an Orienteering Problem

With advances in information technology, people face the problem of dealing with tremendous amounts of information and need ways to save time and effort by summarizing the most important and relevant information. Thus, automatic text summarization has become necessary to reduce the information overload. This article proposes a novel extractive graph-based approach to solve the multi-document summarization (MDS) problem. To optimize the coverage of information in the output summary, the problem is formulated as an orienteering problem and heuristically solved by an ant colony system algorithm. The performance of the implemented system (MDS-OP) was evaluated on the DUC 2004 (Task 2) and MultiLing 2015 (MMS task) benchmark corpora using several ROUGE metrics, as well as other evaluation methods. A comparison with the performances of 26 systems shows that MDS-OP achieved the best F-measure scores on both tasks in terms of ROUGE-1 and ROUGE-L (DUC 2004), and ROUGE-SU4 and three other evaluation methods (MultiLing 2015). Overall, MDS-OP ranked among the best 3 systems.


Introduction
Despite the availability of information today, users need tools that enable them to reach their desired content by automatically summarizing the important and relevant parts and discarding those that are similar. Text summarization, specifically multi-document text summarization (MDS), which creates a summary from a set of related documents, is a useful solution to the exponential growth of information on the Internet [1]. In more than half a century, several approaches have been used to automatically generate summaries, such as statistical and graph-based approaches. In addition, some summarization studies have followed a global selection approach, whereby summary sentences are selected in a way that optimizes the overall score of the resulting summary. Studies in both global selection and graph-based approaches have achieved promising results (for more details, see Sections 2.4 and 2.6). Motivated by the promising results of those studies, this paper proposes a novel graph-based MDS approach to produce extractive generic summaries that optimize an information coverage objective. The summarization problem is reduced to an orienteering problem (OP), a variant of the traveling salesman problem (TSP) [2]. Different problems were modeled as an OP and many heuristics were proposed to approximate their solution [3,4]. Reducing the MDS problem to an OP makes it possible to benefit from all these studies. The proposed approach is based on a swarm intelligence (SI) meta-heuristic, more specifically an ant colony system (ACS) [5], to find an approximate solution to MDS. ACS is a variant of ant colony optimization (ACO) algorithms, which are considered among the best SI algorithms applied to TSP [5]. ACS was applied for single-document summarization and short text summarization; however, to the best of our knowledge, it has not been studied for MDS.
The remainder of this paper is arranged as follows. Section 2 briefly presents some related studies. Sections 3 and 4 describe OP and ACS, respectively. Section 5 describes the main steps of the proposed solution. Section 6 presents the experimental results. Finally, Section 7 concludes this study and outlines some future research directions.

Related Work
For more than half a century, several approaches have been used to automatically generate summaries. Different ways of classifying these approaches have been presented in the literature. Based on linguistic space levels, Mani [10] divided the summarization approaches into shallow approaches, deeper approaches, and hybrid approaches. The shallow approaches commonly generate extractive summaries, and their representation level is, at most, at the syntactic level. The deeper approaches produce abstractive summaries, and their representation level is, at least, at the semantic level. The hybrid approaches combine the two aforementioned approaches. In addition, based on the dominant techniques used in the summarization process, Lloret and Palomar [11] differentiated among five kinds of approaches: statistical-based, topic-based, graph-based, discourse-based, and machine learning-based systems. This section presents seven approaches for text summarization, including those based on SI meta-heuristics.

Statistical Approaches
This approach has been followed since the first development of the field [12,13]. An example of a statistical feature that has been used frequently in the summarization studies is word frequency. Litkowski [14] proposed a frequency-based method for sentence ranking, whereby the summary is generated by extracting the top-ranked sentences while checking and eliminating complete duplicates. Lacatusu et al. [15] proposed a summarization system which exploits the highest ranked terms, called topic signature terms. Each term is given a weight based on its relative frequency in a relevant cluster of documents. In more detail, the system scores sentences based on these terms. Then, to deal with the redundancies, it chooses a sentence to be part of the summary if the number of overlapped topic signature terms between this sentence and the already selected ones is below a predefined threshold. Another statistical feature used in summarization is term frequency times inverse document frequency (TF-IDF), which is an information-retrieval word importance measure that is also used to calculate sentence scores. Nobata and Sekine [16] used this feature along with other features to give a score to each sentence. Nevertheless, to improve the performance, the position feature is only used with document clusters in which the key sentences occur at the beginning of each document. Finally, the redundancy problem is handled by computing the similarity between sentences.
Conroy et al. [17] examined the effects of four different term weighting approaches, including term frequency and nonnegative matrix factorization, as well as different sentence segmentation and tokenization methods, on the performance of multi-lingual single-document and multi-document summarizers. In the sentence extraction phase, they used the OCCAMS [18] algorithm. Balikas and Amini [19] proposed an MDS approach in which enhanced text representations are produced by a neural network and used to extract summary sentences based on the cosine similarity measure. Each sentence is compared to the most frequent terms and to the title of the document to which it belongs. Text mining has also been used: Hirao et al. [20] scored sentences based on sequential patterns (n-grams and gappy n-grams), which are extracted using a text mining algorithm [20].
Wan et al. [21] examined the hierarchical Latent Dirichlet Allocation (hLDA) model for text summarization. A hierarchical topic tree is built for each set of documents. Each node represents a latent topic and each sentence is assigned to a path which starts at the root and ends at a leaf. Summary sentences are extracted in a way that maximizes the coverage of the important subtopics and ensures that the redundancy rate among these sentences is less than 0.5.

Machine Learning Approaches
Machine learning algorithms have been used in text summarization systems for different purposes, such as to select summary sentences or to assign weights to text terms. The CLASSY model [22] uses a hidden Markov model (HMM) along with the pivoted QR algorithm [23] to score and select summary sentences. The classifier was trained using the DUC 2003 corpus. In addition, CLASSY uses only one feature: the number of signature tokens, which are the tokens that are more likely to be found in the document to be summarized than in the corpus. Genetic algorithms (GAs) have been used in summarization. Litvak et al. [24] proposed a multilingual text summarization tool, called MUSEEC. This tool is an extension of the MUSE [25] summarization algorithm that gives a score for each sentence calculated based on a weighted linear combination of different language-independent statistical features. MUSE follows a supervised learning approach that uses GA to find the best weights of these features. MUSEEC expands MUSE by adding a list of features based on Part-of-Speech tagging. Finally, deep learning has been recently introduced to text summarization. Zhong et al. [26] and Yousefi-Azar and Hamey [27] used different deep learning models with query-oriented single-document and multi-document summarization, respectively.

Clustering Approaches
Clustering has been used in text summarization to identify the topics in a set of documents. Blair-Goldensohn et al. [28] proposed a system that divides the sentences into clusters and identifies one representative sentence for each cluster. Then, the proposed system ranks these representative sentences based on the size of the cluster they belong to. Finally, it creates a summary by selecting the top-ranked representative sentences. Aries et al. [29] proposed a method that uses a fuzzy clustering algorithm to cluster the input text into topics. The summary sentences are scored based on their coverage of these topics.
The centroid of the document cluster has been included in the solution of many text summarizers, such as in the summarization approach proposed by Saggion and Gaizauskas [30]. It ranks the sentences based on three features: the similarity to the cluster centroid, the similarity to the lead part of the document, and the sentence position. In addition to identifying text topics, Angheluta et al. [31] used clustering to eliminate the redundancy in the summary sentences. The important sentences were chosen based on the number of keywords they contained, and keywords were detected using the authors' topic segmentation module.

Graph-Based Approaches
Representing texts as graphs has become a widely used approach in the application of text processing [32]. In the field of text summarization, many studies have followed the graph-based approach and used different graph representations. For example, graph nodes can represent different types of textual units, such as sentences [33] or words [34]. Moreover, the edges between the nodes can be represented using different types of relationships, such as using the cosine similarity measure [33]. Furthermore, different kinds of graphs have been used, such as the bipartite graph between document terms and sentences [35]. Several graph-based algorithms, such as random walk [36] and spreading activation [34], have been used to give a score for each node.
Vanderwende et al. [37] proposed a system that creates a semantic graph by connecting the nodes of the logical forms of text sentences with bidirectional edges that represent the semantic relationships and produces the summary by extracting and merging part of the logical forms. Wan et al. [38] proposed an iterative reinforcement solution that combines ideas similar to PageRank [39] and the HITS [40] graph-ranking algorithms. Erkan and Radev [33] proposed a new sentence centrality measure, LexPageRank, based on the concept of prestige or centrality in the social network field. This measure is similar to the PageRank [39] method, except that the edges in the sentence similarity graph are undirected edges that are added based on a predefined threshold.
Remus and Bordag [41] proposed an algorithm that starts with a clustering step where the input documents are ordered chronologically based on time references extracted from the text and then grouped based on their position on the time line. Then, it ranks the sentences in each cluster separately using the FairTextRank algorithm, which is an iterative extension of the PageRank [39] algorithm. The summary is constructed by selecting sentences from each cluster. Overall, graph-based summarization approaches have been shown to be competitive with the other state-of-the-art approaches. For example, the graph-based summarization approach proposed by Wan et al. [42] outperformed the best three participating systems in the DUC 2003 and DUC 2004 competitions.

Semantic Approaches
Lexical and co-reference chains have been investigated for text summarization. Chali and Kolla [43] extracted lexical chains from the text to give a score for each sentence, segment, and cluster (the documents to be summarized are divided into clusters). Then, summary sentences are selected by extracting the best sentences from the best segment of the best cluster. Bergler et al. [44] proposed a solution that consists in ranking the noun phrases (NPs) based on the NP cross-document co-reference chains and generating the summary by extracting the sentences with the top-ranked NPs.

Optimization-Based Approaches
Several summarization studies propose to solve text summarization as an optimization problem. The selection of summary sentences has been reduced to different optimization problems. The summary sentences are selected according to one of the following approaches: (1) the greedy selection approach, in which the best textual units are selected one item at a time, and (2) the global optimal selection approach, which searches for the best summary rather than the best sentences. The first approach rarely produces the best summaries [45] and, thus, most of the summarization studies are based on the second one. In the literature, several objectives have been studied and optimized using different optimization methods. Rautray and Balabantaray [46] described some of these objectives, including text coherence, which is the relatedness of summary contents (e.g., sentences), and significance, which is how relevant the summary content is to the documents to be summarized and to the user's needs (e.g., user query). Nevertheless, in all these formulations, searching for the optimal summary is an NP-hard problem [47], and it is therefore essential to approximate the solution to MDS. Meta-heuristics, such as GA [48] and a population-based method [49], can be used to find approximate solutions to NP-hard problems. In addition, Vanetik and Litvak [50] proposed a linear programming-based global optimization method to extract summary sentences.
Promising results have been produced from the global optimal selection approach. For example, the evaluation results of summaries produced by Shen and Li's summarization framework [47] are not far from the results of the best methods in various DUC competitions. In addition, all the state-of-the-art methods on corpora from DUC 2004 through DUC 2007 in both generic and query-driven summarization were outperformed by the proposed solution of Lin and Bilmes [51], a monotone, non-decreasing submodular function for summarizing documents. Finally, in addition to the fact that obtaining approximate solutions is much faster than obtaining the exact one, some studies have shown that the results of both solutions are comparable [52]. Nevertheless, these experiments have been conducted on a limited size problem.

Swarm-Intelligence-Based Approaches
SI has been introduced to text summarization during the last decade. It produced promising results in several studies on different NLP problems, including text summarization [53-57]. The majority of SI-based summarization studies used particle swarm optimization (PSO). In these studies, PSO algorithms were used to select summary sentences [53,58] or set the weight of each feature extracted from the text to be summarized [54]. Alguliev et al. [53] proposed an optimization model to solve the summarization problem. This model uses a discrete PSO algorithm to generate multi-document summaries by maximizing their coverage and diversity. Using the DUC 2001 and 2002 corpora, the model showed promising evaluation results. Binwahlan et al. [54] used a PSO algorithm as a machine learning technique and ROUGE-1 as a fitness function to investigate the best features' weights. Asgari et al. [58] proposed an extractive single-document summarization method based on a multi-agent PSO.
Other SI meta-heuristics have also been used with text summarization, including artificial bee colony (ABC) [55,56], ACO [57,59], and cuckoo search (CS) [60]. Peyrard and Eckle-Kohler [55] proposed a general optimization framework to summarize a set of input documents using the ABC algorithm. Sanchez-Gomez et al. [56] also proposed an ABC-based summarizer by formulating the summarization problem as a multi-objective one. ACO has also been used for single-document [61] and short text [57,59] summarization problems. Finally, Rautray and Balabantaray [60] proposed a multi-document summarizer using the CS meta-heuristic.

Orienteering Problem
The orienteering problem (OP) is an NP-hard problem which was introduced in 1987 by Golden et al. [62]. Its name came from the orienteering sport [63], which is the game where the competitors must find a path by visiting some of the control points within a limited amount of time. Each control point has a score or profit. Each competitor should start at a certain control point and return to another one. The competitors try to maximize the total collected profit gained from the visited control points without exceeding the time budget constraint. OP belongs to the family of problems called traveling salesman problems (TSPs) with profits [64]. These problems are variants of the TSP where each vertex has a profit and the solution can include a subset of the existing vertices. The objective function of OP is to maximize the collected scores while the travel cost (e.g., time) is a constraint to satisfy (e.g., time to not exceed) [3]. In other words, the OP asks to find a path starting from the first vertex and ending at the last one that maximizes the total collected scores while the total traveling time does not exceed a predefined time budget.
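More formally, OP can be described as follows [3]. Given a graph $G(V, A)$, where $A$ is the set of the graph arcs and $V = \{v_1, \ldots, v_N\}$ is the set of the $N$ graph vertices (i.e., the set of all control points in the problem), each vertex $v_i$ has a profit $p_i$ and each arc $a_{ij}$ has a travel time $t_{ij}$. A binary variable $x_{ij}$ equals 1 if the arc $a_{ij}$ from vertex $v_i$ to vertex $v_j$ is traversed and 0 otherwise. OP asks to maximize the objective:

$$\max \sum_{i=2}^{N-1} \sum_{j=2}^{N} p_i x_{ij} \qquad (1)$$

subject to the following constraints:

$$\sum_{i=1}^{N-1} \sum_{j=2}^{N} t_{ij} x_{ij} \le T_{max} \qquad (2)$$

$$\sum_{j=2}^{N} x_{1j} = \sum_{i=1}^{N-1} x_{iN} = 1 \qquad (3)$$

$$\sum_{i=1}^{N-1} x_{ik} = \sum_{j=2}^{N} x_{kj} \le 1; \quad \forall k = 2, \ldots, N-1 \qquad (4)$$

$$2 \le u_i \le N; \quad \forall i = 2, \ldots, N \qquad (5)$$

$$u_i - u_j + 1 \le (N-1)(1 - x_{ij}); \quad \forall i, j = 2, \ldots, N \qquad (6)$$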
The aforementioned objective function maximizes the total profit of the selected vertices. Equation (2) represents the time constraint by ensuring that the total traveled time does not exceed a pre-defined time budget $T_{max}$. Equation (3) guarantees that the vertices $v_1$ and $v_N$ are selected as the first and the last vertices of the solution path, respectively. Equation (4) ensures the connectivity of the solution path and the uniqueness of its vertices. Equations (5) and (6) guarantee that the solution path does not contain sub-tours, where $u_i$ stands for the position of the vertex $v_i$ in the path.
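To make the objective and the time constraint concrete, the following minimal Python sketch enumerates every path of a toy OP instance and keeps the most profitable feasible one. It is an illustration only, not the solver used in this work: vertices are numbered from 0 (so vertex 0 plays the role of $v_1$ and vertex N-1 the role of $v_N$), and the profits, travel times, and function names are hypothetical.

```python
from itertools import permutations

def path_time(path, t):
    """Total travel time along the consecutive arcs of the path."""
    return sum(t[a][b] for a, b in zip(path, path[1:]))

def path_profit(path, p):
    """Total profit of the visited vertices."""
    return sum(p[v] for v in path)

def solve_op_brute_force(p, t, t_max):
    """Enumerate every simple path from vertex 0 to vertex N-1.

    Exponential in N, so only usable on toy instances.
    """
    n = len(p)
    inner = range(1, n - 1)
    best_path, best_profit = None, -1
    for size in range(len(inner) + 1):
        for middle in permutations(inner, size):
            path = (0, *middle, n - 1)
            if path_time(path, t) <= t_max:      # time budget constraint
                profit = path_profit(path, p)
                if profit > best_profit:          # objective: maximize profit
                    best_path, best_profit = path, profit
    return best_path, best_profit

# Hypothetical toy instance: 5 vertices, symmetric travel times.
profits = [0, 10, 20, 15, 0]
times = [
    [0, 2, 4, 3, 6],
    [2, 0, 2, 4, 5],
    [4, 2, 0, 2, 3],
    [3, 4, 2, 0, 2],
    [6, 5, 3, 2, 0],
]
best_path, best_profit = solve_op_brute_force(profits, times, t_max=7)
```

Because the enumeration grows factorially with the number of vertices, it only serves to check small cases by hand; this is the motivation for approximating the solution with a meta-heuristic.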

Ant Colony Optimization
ACO is an SI meta-heuristic inspired by the collective behavior of real ant colonies. Ants use pheromone traces to communicate with each other to find the shortest path between their nest and food. There are several ACO algorithm variants for approximating solutions to optimization problems, such as the ant system (AS) and ACS [5].
ACS was proposed by Dorigo and Gambardella [5] as an improvement of the AS algorithm for solving large instances of the TSP. ACS modifies the three updating rules of AS: the state transition rule, the global updating rule, and the local updating rule. An ant $k$ chooses to move from city $r$ to city $s$ by using the following rule:

$$s = \begin{cases} \arg\max_{u \in J_k(r)} \{[\tau(r,u)] \cdot [\eta(r,u)]^{\beta}\} & \text{if } q \le q_0 \text{ (exploitation)} \\ S & \text{if } q > q_0 \text{ (biased exploration)} \end{cases} \qquad (7)$$

where $J_k(r)$ is the set of all cities that can be visited by the ant $k$, $\tau$ represents the desirability measure (the pheromone), $\eta$ stands for the heuristic value, $q$ is a random number uniformly distributed over $[0, 1]$, and $q_0$ is a parameter with a value between 0 and 1 (inclusive) to control the relative importance of exploration versus exploitation. The parameter $\beta$ has a value greater than zero and controls the relative weight of the pheromone with respect to the heuristic. $S$ is a randomly selected city chosen according to the following probability distribution:

$$p_k(r,s) = \begin{cases} \dfrac{[\tau(r,s)] \cdot [\eta(r,s)]^{\beta}}{\sum_{u \in J_k(r)} [\tau(r,u)] \cdot [\eta(r,u)]^{\beta}} & \text{if } s \in J_k(r) \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$

The state transition rule of ACS, formulated by Equations (7) and (8), is called pseudo-random proportional. The global updating rule is applied once all the ants complete their tours. In ACS, only the ant that produces the best tour so far is allowed to add an amount of pheromone according to the following equation:

$$\tau(r,s) \leftarrow (1 - \alpha) \cdot \tau(r,s) + \alpha \cdot \Delta\tau(r,s) \qquad (9)$$

where

$$\Delta\tau(r,s) = \begin{cases} (L_{gb})^{-1} & \text{if } (r,s) \in \text{global best tour} \\ 0 & \text{otherwise} \end{cases}$$

$\alpha$ is the pheromone decay parameter whose values range between 0 and 1, and $L_{gb}$ is the cost of the best solution generated from the beginning of the trial. The local updating rule is applied during the construction of the solutions. The amounts of pheromone of the visited edges are updated as follows:

$$\tau(r,s) \leftarrow (1 - \rho) \cdot \tau(r,s) + \rho \cdot \Delta\tau(r,s) \qquad (10)$$

where the value of the parameter $\rho$ is between 0 and 1 (exclusive). A possible value for $\Delta\tau(r,s)$ is the initial pheromone value $\tau_0$.
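The three ACS rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this work: pheromone and heuristic values are stored in nested lists indexed by city, the best tour is passed as a set of arcs, and all function and parameter names are hypothetical.

```python
import random

def choose_next(r, candidates, tau, eta, beta, q0, rng):
    """Pseudo-random proportional rule (Equations (7) and (8))."""
    scores = {u: tau[r][u] * eta[r][u] ** beta for u in candidates}
    if rng.random() <= q0:
        # Exploitation: take the best-scoring candidate.
        return max(scores, key=scores.get)
    # Biased exploration: sample proportionally to the scores.
    threshold = rng.random() * sum(scores.values())
    cumulative = 0.0
    for u, score in scores.items():
        cumulative += score
        if cumulative >= threshold:
            return u
    return u  # guard against floating-point round-off

def local_update(tau, r, s, rho, tau0):
    """Equation (10), taking delta-tau to be the initial pheromone tau0."""
    tau[r][s] = (1 - rho) * tau[r][s] + rho * tau0

def global_update(tau, best_arcs, best_cost, alpha):
    """Equation (9): arcs outside the best tour receive a zero deposit."""
    for r in range(len(tau)):
        for s in range(len(tau)):
            delta = 1.0 / best_cost if (r, s) in best_arcs else 0.0
            tau[r][s] = (1 - alpha) * tau[r][s] + alpha * delta
```

For instance, `choose_next(0, [1, 2, 3], tau, eta, 2.0, 0.9, random.Random(0))` would select the next city for an ant located at city 0, exploiting the best candidate with probability 0.9.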

The Proposed Solution
The contribution of this work consists of reducing an MDS instance into an OP instance and then optimizing the information coverage by using an ACS algorithm. Figure 1 illustrates the main components of the implemented system MDS-OP.

Preprocessing
Four preprocessing steps are applied: text segmentation, tokenization, stemming, and stop word removal. Text segmentation and tokenization divide the text into sentences and words, respectively, by using the Stanford CoreNLP tools [65]. Stop words are removed to filter out common words with low semantic weight [66]. Examples of these words are "and", "the", and "to". An English stop word list (http://jmlr.csail.mit.edu/papersvolume5/lewis04a/a11-smart-stop-list/english.stop) from the SMART information retrieval system is used. Word stemming is performed by using the Porter stemmer (https://tartarus.org/martin/PorterStemmer/). This step enables an equal treatment of the different variants of terms.
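The four steps can be illustrated with the short Python sketch below. It is only a stand-in for the actual pipeline: regular expressions replace the Stanford CoreNLP segmenter and tokenizer, a seven-word set replaces the SMART stop word list, and `naive_stem` is a crude, hypothetical suffix stripper used in place of the Porter stemmer.

```python
import re

# Tiny illustrative stop list; the full SMART list is much longer.
STOP_WORDS = {"and", "the", "to", "a", "of", "in", "is"}

def naive_stem(word):
    """Crude suffix stripper standing in for the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text, stem=naive_stem, stop_words=STOP_WORDS):
    """Segment, tokenize, drop stop words, and stem.

    Returns (original sentence, list of terms) pairs, so the original
    sentence stays available for length computation and for output.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    processed = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        terms = [stem(tok) for tok in tokens if tok not in stop_words]
        processed.append((sentence, terms))
    return processed

result = preprocess("The ants walked to the nest. Ants find paths.")
```

Keeping the original sentence next to its terms matters for the later stages, since arc weights are based on the length of the sentence before preprocessing.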

Building an Intermediate Representation
In this stage, the graph representation is built. The texts to be summarized are represented as a connected directed graph. Each sentence is added to the graph as a vertex with a weight representing its content score (i.e., its saliency). Regarding the graph arcs, two arcs in opposite directions are added between each pair of vertices (i.e., sentences). The weight of each vertex is calculated in the third stage (see Section 5.3) and used to optimize the information coverage of the output summaries. The weight of each arc stands for the length of the original sentence (i.e., before the preprocessing stage) that is represented by the vertex at the end of the arc. In other words, the weight of a graph arc from vertex $v_i$ to vertex $v_j$ is the length of the sentence $s_j$. See Figure 2 for an example of input text with four sentences.
Figure 2. An example of the intermediate representation of an input text with four sentences. The lengths of Sentence 1 ($s_1$), Sentence 2 ($s_2$), Sentence 3 ($s_3$), and Sentence 4 ($s_4$) are $l_1$, $l_2$, $l_3$, and $l_4$, respectively.
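As a small illustration of this representation, the Python sketch below builds the arc weights for four sentences. The function name and the length values are hypothetical, and sentences are indexed from 0.

```python
def build_sentence_graph(sentence_lengths):
    """Return arcs[(i, j)] = length of sentence j for every ordered pair i != j."""
    n = len(sentence_lengths)
    return {
        (i, j): sentence_lengths[j]
        for i in range(n)
        for j in range(n)
        if i != j
    }

lengths = [12, 7, 20, 9]  # hypothetical values for l_1, ..., l_4
arcs = build_sentence_graph(lengths)
```

With four sentences the graph has 12 arcs, and the two arcs between a pair of vertices generally carry different weights because each one is labeled with the length of its destination sentence.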

Computing the Content Scores
In this stage, the score of each word in the text is computed to get the content scores of the sentences. The content score for each sentence is based on the scores of the words it contains. The proposed algorithm to compute the scores of the words follows the iterative reinforcement approach proposed by Wan et al. [38]. It combines ideas similar to PageRank [39] and the HITS [40] graph-ranking algorithms. First, three graphs are built: (1) a sentence-to-sentence graph to represent the relationship among the sentences, (2) a word-to-word graph to represent the relationship among the words, and (3) a sentence-to-word bipartite graph that connects each sentence with the words it contains. To compute the scores of the words, the algorithm applies a PageRank-based method to the sentence-to-sentence and word-to-word graphs, and an HITS-based method to the sentence-to-word graph, where hubs represent the sentences, and authorities represent the words.
The proposed algorithm computes the arc weights of the sentence-to-sentence and the sentence-to-word graphs based on the TF-ISF scores and the cosine similarity measure. For the word-to-word graph, the arc weights are equal to the length of the longest common substring between the two connected words. The weights of the arcs in the sentence-to-sentence, sentence-to-word, and word-to-word graphs are represented by three matrices: $U$, $W$, and $V$, respectively. The scores of the words (represented by vector $v$) and sentences (represented by vector $u$) are computed by applying the following two equations, which are calculated repeatedly until a convergence state is reached.
where $\tilde{U}$, $\tilde{W}$, and $\tilde{V}$ are the normalized versions of the matrices $U$, $W$, and $V$, respectively, and $\hat{W}$ is the normalized transpose of the matrix $W$. The vectors $u^{(n)}$ and $u^{(n-1)}$ hold the sentence scores at iterations $n$ and $n-1$, respectively. Similarly, the vectors $v^{(n)}$ and $v^{(n-1)}$ hold the word scores at iterations $n$ and $n-1$, respectively. After each calculation of $u^{(n)}$ and $v^{(n)}$, the two vectors are normalized. In addition, to emphasize the importance of the first sentences, the proposed algorithm gives more weight to the words of these sentences.
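The iteration can be sketched in Python as shown below. The sketch assumes the update form of the reinforcement approach of Wan et al. [38], namely $u^{(n)} = \alpha \tilde{U}^T u^{(n-1)} + \beta \hat{W}^T v^{(n-1)}$ and $v^{(n)} = \alpha \tilde{V}^T v^{(n-1)} + \beta \tilde{W}^T u^{(n-1)}$; the mixing weights `alpha` and `beta`, the row normalization, the L1 normalization of the score vectors, and the stopping tolerance are all assumptions made for illustration, and the extra weight given to the words of the first sentences is not modeled.

```python
def row_normalize(m):
    """Scale each row to sum to 1 (all-zero rows are left as zeros)."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def mat_t_vec(m, x):
    """Compute m^T x without building the transpose."""
    return [sum(m[i][j] * x[i] for i in range(len(m))) for j in range(len(m[0]))]

def l1_normalize(x):
    total = sum(x)
    return [xi / total for xi in x] if total else x

def reinforce(U, V, W, alpha=0.5, beta=0.5, tol=1e-9, max_iter=1000):
    """Return (sentence scores u, word scores v) after mutual reinforcement."""
    U_n, V_n, W_n = row_normalize(U), row_normalize(V), row_normalize(W)
    W_hat = row_normalize(transpose(W))      # normalized transpose of W
    u = [1.0 / len(U)] * len(U)
    v = [1.0 / len(V)] * len(V)
    for _ in range(max_iter):
        su, sw = mat_t_vec(U_n, u), mat_t_vec(W_hat, v)
        vv, vw = mat_t_vec(V_n, v), mat_t_vec(W_n, u)
        new_u = l1_normalize([alpha * a + beta * b for a, b in zip(su, sw)])
        new_v = l1_normalize([alpha * a + beta * b for a, b in zip(vv, vw)])
        change = sum(abs(a - b) for a, b in zip(new_u + new_v, u + v))
        u, v = new_u, new_v
        if change < tol:                     # convergence state reached
            break
    return u, v
```

Here `U` is the sentence-by-sentence matrix, `V` the word-by-word matrix, and `W` the sentence-by-word matrix, all given as nested lists of non-negative weights.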
Several differences exist between the reinforcement approach of Wan et al. [38] and the proposed algorithm. The proposed algorithm generates multi-document summaries instead of single-document summaries. It uses the scores of the words to maximize the overall information coverage score of the resulting summary, whereas the reinforcement approach uses the scores of the sentences to generate the summaries. Moreover, it computes the similarities among the words in the word-to-word graph based on the longest common substring to keep the proposed solution language independent, instead of using knowledge-based or corpus-based measures.

Selecting Summary Sentences
In this stage, the MDS is formulated as an OP to maximize the content coverage of the produced summaries. ACS is then used to approximate a solution to OP. Consider an MDS instance. In this study, the textual unit chosen is the sentence. Therefore, each document is split into sentences. Let $D$ be a set of related documents to summarize, $D = \{s_1, \ldots, s_{|D|}\}$, where $s_k$ represents sentence $k$ ($1 \le k \le |D|$) and $|D|$ is the total number of sentences in $D$. The MDS problem asks to create a summary $S$, a sequence of sentences with a maximum length $L$, by extracting a subset of the sentences of $D$ such that the overall content coverage of $S$ is maximized. More formally, it asks to optimize the following objective:

$$\max \sum_{k=1}^{|D|} cov_k z_k \quad \text{subject to} \quad \sum_{k=1}^{|D|} l_k z_k \le L$$

where $cov_k$ is the content coverage score of sentence $s_k$, and $z_k$ is a binary variable which equals 1 if $s_k$ is a summary sentence and 0 otherwise. The length of sentence $s_k$ is $l_k$. In this study, the content coverage score of each sentence is expressed by the total weight of its words that have not been covered by other sentences already in $S$. In other words, regardless of the number of occurrences of a word $j$ covered by $S$, its weight $w_j$ is added only once to the total content coverage score. Therefore, instead of using the scores of sentences, the content coverage score of $S$ is expressed by the total weight of the words it covers as follows:

$$\sum_{j} w_j b_j$$

where $b_j$ is a binary variable defined as follows:

$$b_j = \begin{cases} 1 & \text{if } \sum_{k=1}^{|D|} d_{kj} z_k \ge 1 \\ 0 & \text{otherwise} \end{cases}$$

The constant $d_{kj}$ equals 1 if the sentence $s_k$ contains the word $j$ and 0 otherwise.
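The "count each word once" rule can be illustrated with a few lines of Python. This is a minimal sketch: the function names, the stemmed terms, and the integer word weights are hypothetical.

```python
def coverage(summary, word_weights):
    """Sum the weight of each distinct word covered by the summary once."""
    covered = set()
    for sentence_terms in summary:
        covered.update(sentence_terms)
    return sum(word_weights.get(w, 0) for w in covered)

def marginal_gain(sentence_terms, covered, word_weights):
    """Coverage added by a sentence given the words already covered."""
    return sum(word_weights.get(w, 0) for w in set(sentence_terms) - covered)

weights = {"ant": 5, "coloni": 3, "path": 2}   # hypothetical word scores
s1 = ["ant", "coloni"]
s2 = ["ant", "path", "ant"]
total = coverage([s1, s2], weights)
extra = marginal_gain(s2, set(s1), weights)
```

In this example the word "ant" occurs three times across the two sentences but contributes its weight only once, so adding `s2` after `s1` only gains the weight of "path".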

Encoding of an MDS Instance into an OP Instance
Algorithm 1 outlines the main steps to encode an MDS instance into an OP instance. The computational complexity of Algorithm 1 can be estimated as follows. The number of iterations of the first loop is $|D|$. The number of iterations of the second loop is $(|D| + 2) \times ((|D| + 2) - 1)$, which is the number of arcs in the graph that can be created by using the sentences as the vertices, where two arcs are added between each pair of these vertices. Thus, Algorithm 1 runs in $O(|D|^2)$ in the worst case.

Algorithm 1 Encoding of an MDS instance into an OP instance.
    [...]
    $T_{max}$: the time budget
    $V$: the set of graph vertices ($N = |V|$)
    [...]
    Create an arc $a_{rk}$ from vertex $v_r$ to vertex $v_k$
    $t_{rk} \leftarrow l_{k-1}$
    $A \leftarrow A \cup a_{rk}$
    Create an arc $a_{kr}$ from vertex $v_k$ to vertex $v_r$
    $t_{kr} \leftarrow l_{r-1}$
    $A \leftarrow A \cup a_{kr}$
    end for
    return $OP(T_{max}, V, A)$
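The encoding can be sketched in Python as follows. Only part of Algorithm 1 is recoverable from the listing, so this sketch fills the remainder with explicit assumptions: the two extra vertices $v_1$ and $v_N$ are treated as dummy endpoints with zero profit, entering a dummy endpoint costs no time, the time budget equals the maximum summary length, and the initial profit of a sentence vertex is its content score. The function and variable names are hypothetical.

```python
def encode_mds_as_op(lengths, scores, max_length):
    """Build OP(T_max, V, A) from sentence lengths and content scores.

    Vertices are 1-indexed as in the text: v_1 and v_N are dummy
    endpoints and v_k (2 <= k <= N-1) stands for sentence s_{k-1}.
    """
    n = len(lengths) + 2
    profit = {1: 0.0, n: 0.0}                 # assumption: endpoints carry no profit
    for k in range(2, n):
        profit[k] = scores[k - 2]

    def arrival_cost(k):
        # Assumption: entering a dummy endpoint adds no length.
        return 0 if k in (1, n) else lengths[k - 2]

    travel_time = {}
    for r in range(1, n + 1):
        for k in range(r + 1, n + 1):
            travel_time[(r, k)] = arrival_cost(k)   # t_rk <- l_{k-1}
            travel_time[(k, r)] = arrival_cost(r)   # t_kr <- l_{r-1}
    t_max = max_length                        # time budget = summary length L
    return t_max, profit, travel_time

t_max, profit, travel_time = encode_mds_as_op([12, 7, 20], [0.4, 0.1, 0.5], 25)
```

With three sentences the sketch produces $N = 5$ vertices and $5 \times 4 = 20$ arcs, in line with the arc count used in the complexity estimate.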

Decoding a Solution to OP into a Solution to MDS
Algorithm 2 presents the main steps to decode a solution to an instance of OP into a solution to an instance of MDS. In other words, this algorithm decodes a path of an OP instance into a summary of an MDS instance. The while loop in Algorithm 2 iterates at most $N - 2$ times, or in other words, $|D|$ times. Therefore, in the worst case, the algorithm runs in $O(|D|)$ time.

Algorithm 2 Decoding of a solution to OP into a solution to MDS.
    [...]
    $i \leftarrow next(i)$ ▷ Get the number of the next vertex in $P$
    end while
    return $S$
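A minimal Python sketch of the decoding step is given below, assuming the 1-indexed vertex numbering in which vertex $v_k$ stands for sentence $s_{k-1}$ and the first and last vertices are skipped. The function name and the example sentences are hypothetical.

```python
def decode_path(path, sentences):
    """Map an OP path back to the ordered list of summary sentences."""
    n = len(sentences) + 2
    # Skip the start and end vertices; v_k represents sentence s_{k-1},
    # which sits at index k - 2 of the 0-indexed list.
    return [sentences[v - 2] for v in path if v not in (1, n)]

sentences = ["First sentence.", "Second sentence.", "Third sentence."]
summary = decode_path([1, 4, 2, 5], sentences)
```

The order of the summary follows the order in which the vertices appear in the path, not the order of the sentences in the input documents.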

Correctness of the Reduction
The correctness of the reduction of an MDS instance into an OP instance is presented and proved as follows.
Theorem 1. Let $MDS(D, L)$ be an MDS instance where $D$ is a set of related documents to be summarized and $L$ is the maximum summary length. Let $OP(T_{max}, V, A)$ be an OP instance where $T_{max}$ represents its time budget, and $V$ and $A$ are the sets of vertices and arcs, respectively. An MDS instance has a solution summary $S$ with a length up to $L$ and its content coverage is maximized if and only if its corresponding OP instance given by Algorithm 1 has a solution path $P$ that maximizes the total gained profit, while the total traveled time is less than or equal to $T_{max}$.
Proof of Theorem 1. Suppose that an MDS instance has a solution summary $S$, which is a sequence of sentences $\langle s_{s_1}, \ldots, s_{s_q} \rangle$, where $s_{s_k}$ is the $k$th sentence in $S$ and $q$ is the number of summary sentences. The corresponding OP instance given by Algorithm 1 has a solution path $P = \langle v_1, v_{(s_1)+1}, \ldots, v_{(s_q)+1}, v_N \rangle$. Additionally, $v_{(s_k)+1}$ in $P$ represents the $k$th summary sentence (i.e., $s_{s_k}$) of the corresponding MDS instance. Based on Algorithm 1, the profit of $v_{(s_k)+1}$ in $P$ represents the coverage score of $s_{s_k}$ in $S$. Moreover, the weight of the arc from, for example, vertex $v_{s_r}$ to vertex $v_{s_k}$ in $P$ represents the length added by including $s_{(s_k)-1}$ in $S$. Furthermore, the time budget $T_{max}$ represents the summary length $L$. Thus, the following can be concluded:

• The length of $S$ is less than or equal to $L$, so the total traveled time of $P$ is less than or equal to $T_{max}$.
• Maximizing the overall content coverage score of $S$ will maximize the total gained profit of $P$: $\max(\sum_{s_i \in S} cov_i) \Rightarrow \max(\sum_{v_i \in P} p_i)$ (maximize the profit).
Conversely, suppose that the OP instance has a solution path $P$, which is a sequence of vertices $\langle v_1, v_{p_2}, \ldots, v_{p_y}, v_N \rangle$, where $p_k$ represents the $k$th vertex in $P$ and $y + 1$ is the total number of visited vertices. Therefore, based on Algorithm 2, the corresponding MDS solution summary $S = \langle s_{(p_2)-1}, \ldots, s_{(p_y)-1} \rangle$ is created by appending a sequence of sentences that are represented by the visited vertices in $P$, starting from the second vertex until reaching the vertex located before the last one (i.e., ignoring the starting and the ending vertices). In other words, the $k$th vertex in $P$ represents the sentence $s_{(p_k)-1}$ in $S$. As a result,

• If the traveled time of $P$ is less than or equal to $T_{max}$, then the total length of $S$ is less than or equal to $L$.
• Maximizing the gained profit of $P$ will maximize the score of the overall content coverage of $S$: $\max(\sum_{v_i \in P} p_i) \Rightarrow \max(\sum_{s_i \in S} cov_i)$ (maximize the coverage).

ACS for OP
An ACS algorithm is proposed to approximate a solution to an OP instance. The original ACS algorithm was proposed by Dorigo and Gambardella [5] for the TSP. The main steps of the proposed ACS algorithm for the OP are outlined in Algorithm 3. Because of the time constraint, each ant may terminate its path and become inactive at a different time, depending on the vertices (i.e., sentences) it includes in its solution (i.e., path). A set, called active_set, keeps track of the active ants, which are the ants whose traveled time has not yet reached the time budget (i.e., the maximum summary length) and which can therefore move to another vertex and continue building their solutions. In addition, to maximize the coverage objective, the content score of each sentence (i.e., the profit of each vertex) is dynamic: the coverage objective is updated at each cycle based on the last vertex that joined the path. Therefore, each ant has its own graph to keep track of the current profit values of the graph vertices.
To enforce the time constraint, each ant k has its own time $T_k$. The path of ant k is stored in $P_k$. Moreover, each ant k keeps track of the set of vertices $J_k$ that it has not yet visited.
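The per-ant bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the class `Ant` and the helpers `feasible_moves` and `move` are hypothetical names, travel times are assumed to be stored in a dictionary keyed by `(from_vertex, to_vertex)`, and a move is treated as feasible only if the ant can still reach the end vertex afterwards.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Arc = Tuple[int, int]


@dataclass
class Ant:
    remaining_time: float                                  # T_k: time budget still available
    path: List[int] = field(default_factory=list)          # P_k: vertices visited so far
    unvisited: Set[int] = field(default_factory=set)       # J_k: vertices not yet visited
    profit: Dict[int, float] = field(default_factory=dict)  # private copy of vertex profits


def feasible_moves(ant: Ant, current: int, end: int,
                   travel_time: Dict[Arc, float]) -> List[int]:
    """Unvisited vertices the ant can reach and still get to the end vertex in time."""
    return sorted(
        j for j in ant.unvisited
        if travel_time[(current, j)] + travel_time[(j, end)] <= ant.remaining_time
    )


def move(ant: Ant, current: int, nxt: int, travel_time: Dict[Arc, float]) -> None:
    """Advance the ant to `nxt` and update its time, path, and unvisited set."""
    ant.remaining_time -= travel_time[(current, nxt)]
    ant.path.append(nxt)
    ant.unvisited.discard(nxt)
```

An ant whose `feasible_moves` list is empty would be removed from the active set and sent to the end vertex.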

Algorithm 3. Approximating an OP solution using ACS.
...
for each arc $a_{ij}$ in A do
    ...
for each ant k do ▷ Initialize each ant
    ...
    $T_k \leftarrow T_{max}$
    if $(t_{1(k+1)} + t_{(k+1)N}) \le T_{max}$ then ▷ Check the time budget constraint
        ...
        $J_k(r_{k1}) \leftarrow \{1, \ldots, n\} - r_{k1} - 1 - N$ ▷ $J_k(r_{k1})$ is the set of vertices that can be visited by ant k at vertex $r_{k1}$ in addition to $v_N$
    ...
end for
▷ Building the paths of ants
...
$J_k(d_k) \leftarrow J_k(r_k) - d_k$
...
▷ Pheromone global updating using Equation (9)
for each $a_{ij}$ included in $P_{best}$ do
    ...
end for
▷ Check if the best current path $P_{best}$ is better than all the paths that have been discovered so far
if $(L_{best} > L_{best\_so\_far})$ then
    $L_{best\_so\_far} \leftarrow L_{best}$
    $P_{best\_so\_far} \leftarrow P_{best}$
end if
end while
return $P_{best\_so\_far}$

At the beginning, the first and last vertices are both removed from $J_k$. Then, after adding the first vertex $v_1$, each ant is moved to a different vertex. The number of ants is equal to the number of vertices minus 2 (i.e., excluding the first and last vertices). The heuristic value used by each ant to move from its current location to a new vertex is based on the profit gained from the current vertex. For the ACS parameters, the values recommended by Dorigo and Gambardella [5] were used, except for the number of ants, which was set to the number of sentences in the input text (see Table 1).
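The two per-step operations an ant performs, choosing the next vertex and locally updating the pheromone on the traversed arc, can be sketched as below. The selection function follows the standard ACS pseudo-random proportional rule of Dorigo and Gambardella [5], which is assumed here rather than taken from the equations of this paper; the local update uses the form $\tau \leftarrow (1 - \rho)\tau + \rho\tau_0$ that appears in Algorithm 3. The names `choose_next`, `local_update`, `beta`, and `q0` are hypothetical, and the heuristic value `eta[j]` is assumed to be the profit associated with vertex j.

```python
import random
from typing import Dict, Sequence, Tuple

Arc = Tuple[int, int]


def choose_next(current: int, candidates: Sequence[int],
                tau: Dict[Arc, float], eta: Dict[int, float],
                beta: float, q0: float, rng: random.Random) -> int:
    """Pick the next vertex from a non-empty list of feasible candidates."""
    scores = {j: tau[(current, j)] * (eta[j] ** beta) for j in candidates}
    if rng.random() <= q0:
        # Exploitation: take the candidate with the best pheromone-profit score.
        return max(scores, key=scores.get)
    # Biased exploration: sample proportionally to the scores.
    pick = rng.random() * sum(scores.values())
    acc = 0.0
    chosen = candidates[-1]  # fallback against floating-point round-off
    for j, score in scores.items():
        acc += score
        if pick <= acc:
            chosen = j
            break
    return chosen


def local_update(tau: Dict[Arc, float], arc: Arc, rho: float, tau0: float) -> None:
    """Local pheromone update on the arc the ant has just traversed."""
    tau[arc] = (1.0 - rho) * tau[arc] + rho * tau0
```

Passing the random generator explicitly keeps runs reproducible, which helps when averaging scores over several runs.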

Experiments
Several experiments were conducted on a PC equipped with an Intel(R) Core(TM) i7-6500U CPU (2.5 GHz) and 12 GB of RAM. MDS-OP was implemented in Java.

Corpora
Since 1998, a number of conferences and workshops have been organized to address automatic summarization problems. In this study, two corpora were selected to evaluate the performance of MDS-OP. The first one is DUC 2004 (Task 2). The DUCs (http://www.nlpir.nist.gov/projects/duc/index.html) were an important series of conferences that addressed issues of automatic summarization. They were held yearly between 2001 and 2007. Competitions were organized to compare summarization systems on different corpora related to different tasks. The corpus of Task 2 from DUC 2004 consists of 50 English clusters, each containing around 10 documents, and the main task is to create a short summary of up to 665 bytes for each cluster. The published results include the scores of eight human summarizers, a baseline, and 16 participating systems (see Table 2).
* The official DUC website does not contain any details on these participants.
The second corpus used to evaluate the algorithm is that of the MMS task at MultiLing 2015 (http://multiling.iit.demokritos.gr/pages/view/1516/multiling-2015) [67], which was a special session at SIGdial 2015 (http://www.sigdial.org/workshops/conference16/). It was built upon the corpus of the MultiLing 2013 (http://multiling.iit.demokritos.gr/pages/view/662/multiling-2013) workshop at ACL 2013, which in turn is based on the TAC 2011 MultiLing Pilot (http://www.nist.gov/tac/2011/Summarization/index.html) corpus. This corpus contains sets of documents written in 10 languages. MDS-OP was evaluated on the English version, which consists of 15 sets of 10 documents each. The participants were asked to provide partially or fully language-independent multi-document summarizers that produce summaries of at most 250 words. Each summarizer was applied to at least two different languages. Ten teams participated in the MMS task at MultiLing 2015 (see Table 3).

Evaluation Metrics
This study used ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [6] to evaluate the performance of MDS-OP. ROUGE is a set of metrics used to automatically evaluate summarization systems by counting the overlapping units (e.g., n-grams) between the automatically produced summary (i.e., the peer) and one or more human-created summaries (i.e., the references). ROUGE has been adopted by DUC since DUC 2004 [68] and it comprises different metrics, including ROUGE-N (N-gram Co-Occurrence Statistics), ROUGE-L (Longest Common Subsequence), ROUGE-S (Skip-Bigram Co-Occurrence Statistics), and ROUGE-W (Weighted Longest Common Subsequence). The recall version of the ROUGE-N measure, for example, evaluates a given summary by calculating the n-gram recall between the summary obtained and a set of reference (i.e., model) summaries as follows:

$$\text{ROUGE-N} = \frac{\sum_{S \in \{References\}} \sum_{gram_n \in S} Count_{match}(gram_n)}{\sum_{S \in \{References\}} \sum_{gram_n \in S} Count(gram_n)}$$

where $Count_{match}$ is the maximum number of n-grams shared between the summary to be evaluated and the reference summaries, $Count$ is the total number of n-grams in the reference summaries, and n is the n-gram length. ROUGE-L evaluates the summary based on the longest common subsequence (LCS) it shares with the references. ROUGE-W is similar to ROUGE-L except that it gives more weight to consecutive matches. Finally, ROUGE-S and ROUGE-SU evaluate the summary based on the shared skip-bigrams without and with the addition of unigrams as counting units, respectively. The most recent version of the ROUGE software package (i.e., ROUGE 1.5.5) calculates the recall, precision, and F-measure scores. In this paper, for the DUC 2004 corpus, all the results of the baseline, humans, and rival systems were re-evaluated using this version, so all the comparison results are based on it. The same values of the ROUGE parameters provided at the competition were used. The ROUGE metrics used at the competition were also used in this study, specifically ROUGE-1, ROUGE-2, ROUGE-3, ROUGE-4, ROUGE-L, and ROUGE-W. A similar evaluation process was applied to the MultiLing 2015 corpus. The evaluation results are based on the same ROUGE metrics (ROUGE-1, ROUGE-2, and ROUGE-SU4) and parameters used at the TAC 2011 MultiLing Pilot. The performance results of MDS-OP are reported in terms of the average F-measure scores of five runs.
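The recall version of ROUGE-N described above can be illustrated with a few lines of code. This is a simplified sketch, not a replacement for the ROUGE 1.5.5 package used in the experiments: it assumes lowercased, whitespace-separated tokens and omits the options of the official tool (such as stemming and stop-word removal). The function names `ngrams` and `rouge_n_recall` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(peer: str, references: List[str], n: int) -> float:
    """Clipped n-gram matches divided by the number of reference n-grams."""
    peer_counts = ngrams(peer.lower().split(), n)
    matched = 0
    total = 0
    for reference in references:
        ref_counts = ngrams(reference.lower().split(), n)
        total += sum(ref_counts.values())
        # Each reference n-gram is matched at most as often as it occurs in the peer.
        matched += sum(min(count, peer_counts[gram]) for gram, count in ref_counts.items())
    return matched / total if total else 0.0
```

For the peer "the cat sat" and the references "the cat ran" and "a cat sat", four of the six reference unigrams and two of the four reference bigrams are matched.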
In addition to the ROUGE metrics, the three official evaluation methods of MultiLing 2015 were used. The first one is AutoSummENG (AUTOmatic SUMMary Evaluation based on N-gram Graphs) [7], which is a language-independent method that creates an n-gram graph for each reference and peer summary. It calculates the performance of a summary by averaging the similarities between this summary and each reference summary. The second method is MeMoG (Merged Model Graph) [8], which is a variation of AutoSummENG where one merged graph represents all the reference summaries. Finally, the third method is NPowER (N-gram graph Powered Evaluation via Regression) [9], a machine-learning-based method where the features are the evaluation methods and the target is the human evaluation grade. It uses linear regression to combine the previous two evaluation methods.

Evaluation Results
Teams that participated in DUC 2004 were allowed up to three runs (i.e., three variants of each system). In the comparison results depicted in Table 4 and Figures 3-8, only the best results among the runs of these systems are presented. Similarly, several teams at MultiLing 2015 participated with several variants, so only the best variants are included in the evaluation. Moreover, three sets of documents (M001, M002, and M003) were not included in the evaluation since they were provided to the participants as a training set (see Tables 5 and 6, and Figures 9-14).
MDS-OP achieved the best ROUGE-1 and ROUGE-L scores in comparison to the 16 participating systems and a baseline system (Figures 3 and 7). It obtained the second best ROUGE-2 and ROUGE-W-1.2 scores (Figures 4 and 8), and the third best ROUGE-3 and ROUGE-4 scores (Figures 5 and 6). Figures 4-6 show that CCSNSA04 is the top ranked system based on the ROUGE-2, ROUGE-3, and ROUGE-4 metrics. Figure 8 shows that MEDLAB_Fudan is the top ranked system with regard to ROUGE-W. The relative improvements of MDS-OP over the systems CCSNSA04 and MEDLAB_Fudan are 1.78% (ROUGE-1) and 0.14% (ROUGE-L), respectively. The average improvements of MDS-OP over all the other systems are 14.06% (ROUGE-1) and 13.56% (ROUGE-L). An ANOVA test (p-value = 0.5) was performed on MDS-OP and the other participating systems, and MDS-OP significantly outperformed eight systems in terms of ROUGE-1, five systems in terms of ROUGE-2, five systems in terms of ROUGE-3, three systems in terms of ROUGE-4, nine systems in terms of ROUGE-L, and eight systems in terms of ROUGE-W. Finally, although MDS-OP was outperformed by CCSNSA04 (ROUGE-2, ROUGE-3, and ROUGE-4), MEDLAB_Fudan (ROUGE-W), and crl_nyu.duc04 (ROUGE-3 and ROUGE-4), there were no statistically significant differences between these systems and MDS-OP.
The F-measure scores achieved by MDS-OP and those of the 10 participating systems on MultiLing 2015 (MMS task) are presented in Table 5 in terms of ROUGE-1 (R-1), ROUGE-2 (R-2), and ROUGE-SU4 (R-SU4), and in Table 6 in terms of the evaluation methods AutoSummENG, MeMoG, and NPowER. MDS-OP produced the best ROUGE-SU4 scores (Figure 11), and the second best ROUGE-1 and ROUGE-2 scores (Figures 9 and 10). The systems MMS8 and MMS2 are the top ranked with regard to ROUGE-1 and ROUGE-2, respectively. The relative improvements of MDS-OP over MMS2 and MMS8 in terms of ROUGE-SU4 are 0.3% and 2.22%, respectively. Moreover, MDS-OP outperformed all the other systems based on the evaluation methods AutoSummENG (Figure 12), MeMoG (Figure 13), and NPowER (Figure 14). It outperformed MMS8 (an improvement of 12.05% in terms of AutoSummENG) and MMS2 (an improvement of 13.56% in terms of MeMoG and an improvement of 4.66% in terms of NPowER). The average improvements of MDS-OP over all the other systems are 12.83% (ROUGE-SU4), 26.32% (AutoSummENG), 31.79% (MeMoG), and 9.07% (NPowER). An ANOVA test was also conducted on the ROUGE results for this corpus, and it showed that MDS-OP significantly outperformed the systems MMS11 and MMS12 in terms of ROUGE-1 and the systems MMS1, MMS11, and MMS12 in terms of ROUGE-2 and ROUGE-SU4. Finally, with regard to the overall performance of MDS-OP on both corpora, the average ROUGE-1 and ROUGE-2 results are 0.42721 and 0.13084, respectively.
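The improvement percentages quoted above can be read with the following small sketch. It assumes the usual definition of relative improvement, (own score − other score) / other score × 100, which is not spelled out in the text, and it assumes that the average improvement is the mean of the per-system relative improvements. The function names are hypothetical and the scores in the usage note are made up for illustration; they are not results from the tables.

```python
from typing import Sequence


def relative_improvement(own: float, other: float) -> float:
    """Relative improvement (in percent) of `own` over `other`."""
    return (own - other) / other * 100.0


def average_improvement(own: float, others: Sequence[float]) -> float:
    """Mean relative improvement (in percent) of `own` over several systems."""
    return sum(relative_improvement(own, other) for other in others) / len(others)
```

With illustrative scores of 0.44 for one system and 0.40 for another, the relative improvement is about 10%.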

Conclusions
In this paper, we proposed to solve the extractive MDS problem by encoding it as an OP and approximating its solution with an ACS meta-heuristic. The implemented system, MDS-OP, was evaluated on two benchmark corpora, DUC 2004 (Task 2) and MultiLing 2015 (MMS task), using several ROUGE metrics and the three official evaluation methods adopted at MultiLing 2015 (AutoSummENG, MeMoG, and NPowER). Its performance was compared to that of 26 systems that participated in the DUC 2004 and MultiLing 2015 competitions. The F-measure scores show that MDS-OP outperformed the 16 systems that participated in DUC 2004 (Task 2) in terms of ROUGE-1 and ROUGE-L. It also outperformed the 10 systems that participated in MultiLing 2015 (MMS task) in terms of ROUGE-SU4, AutoSummENG, MeMoG, and NPowER. The performance of MDS-OP in terms of the other ROUGE metrics (ROUGE-2, ROUGE-3, ROUGE-4, and ROUGE-W) ranked it among the best three systems. These results demonstrate the effectiveness of the proposed approach for MDS. MDS-OP does not need a training phase, unlike machine-learning-based systems. It relies only on statistical and graph-based features. However, the robustness of its performance depends on the tuning of the ACS parameters.
In future work, we plan to study other semantic features and their impact on the performance of MDS-OP, as well as other SI metaheuristics that have been examined for solving the OP. In addition, to improve the readability of a summary, text coherence can be included as a second objective, which would motivate a bi-objective formulation and solution of the MDS problem. This would be performed by adding the coherence scores between pairs of sentences to the OP graph and optimizing the order of the summary sentences. Different methods would be examined to calculate the local coherence scores between sentence pairs.

Algorithm 1. Encoding of an MDS instance into an OP instance.
Input: MDS(D, L): MDS instance
    D: the set of related documents to be summarized
    L: maximum summary length
Output: OP($T_{max}$, V, A): OP instance
    ...
    A: the set of graph arcs
$T_{max} \leftarrow L$
$V \leftarrow \emptyset$
$A \leftarrow \emptyset$
Create $s_0$ ▷ Create an empty sentence $s_0$ to be represented by $v_1$
$cov_0 \leftarrow 0$
$l_0 \leftarrow 0$
Create $s_{|D|+1}$ ▷ Create an empty sentence $s_{|D|+1}$ to be represented by $v_N$
$l_{|D|+1} \leftarrow 0$
$cov_{|D|+1} \leftarrow 0$
$i \leftarrow 0$ ▷ Adding the sentences as vertices
while $i \le |D| + 1$ do
    ...

Algorithm 2.
Input: Path P: a sequence of vertices (starts at vertex $v_1$ and ends at vertex $v_N$)
Output: Summary S: a sequence of sentences
$i \leftarrow next(1)$ ▷ Get the number of the second vertex in P
while $i \ne N$ do
    $S \leftarrow S + s_{i-1}$ ▷ Get the sentence and append it to the end of S
    ...

Algorithm 3 (input, initialization, and path-building steps).
Input: OP($T_{max}$, V, A): an OP instance
    $T_{max}$: the time budget
    V: the set of graph vertices ($N = |V|$)
    ...
Output: $P_{best}$: a solution (i.e., path) to the input OP instance
$L_{best\_so\_far} \leftarrow 0$ ▷ Initialize the content score of the best path found so far
$P_{best\_so\_far} \leftarrow \emptyset$ ▷ Initialize the best path found so far
while $I \ne 0$ do
    ▷ Starting the initialization step
    ... ▷ the initial pheromone level of arc $a_{ij}$
    ...
    active_ant ← active_ant + $ant_k$
    $P_k \leftarrow P_k + v_{r_1}$ ▷ Append the first vertex to the path
    $r_{k1} \leftarrow k + 1$ ▷ $r_{k1}$ is the second vertex for ant k
    $P_k \leftarrow P_k + v_{r_{k1}}$
    $r_k \leftarrow r_{k1}$ ▷ The vertex $r_k$ is the current location of ant k
    $T_k \leftarrow T_k - t_{1 r_k}$
    ...
    while active_ant $\ne \emptyset$ do
        for each ant k in active_ant do
            ...
            if $v_{d_k}$ does not exist then ▷ Can't add any vertex and satisfy the time constraint $T_k$ for ant k
            ...
            ... ▷ the traveled time to reach $v_{d_k}$
            $\tau_{r_k d_k} \leftarrow (1 - \rho)\,\tau_{r_k d_k} + \rho\,\tau_0$ ▷ $a_{r_k d_k}$ is the arc from $r_k$ to $d_k$
            ...

Table 1. ACS parameter settings. $L_{nn}$ is the overall coverage (i.e., total profit) of the summary generated by following the nearest neighbor heuristic, and n is the number of sentences in this summary.

Table 3. Systems that participated in MultiLing 2015 (MMS task). * The official MultiLing 2015 website does not contain any details on these participants.

Table 4. F-measure scores of the ROUGE-1, ROUGE-2, ROUGE-3, ROUGE-4, ROUGE-L, and ROUGE-W-1.2 metrics of MDS-OP, the baseline, and the participating systems at DUC 2004 (Task 2). The highest values are written in bold. The highest and the lowest improvements (%) of MDS-OP are indicated by ⋆ and ∗, respectively.

Table 6. Scores of MDS-OP and the participating systems on MultiLing 2015 (MMS task) obtained with the evaluation methods AutoSummENG, MeMoG, and NPowER. The highest values are written in bold. The highest and the lowest improvements (%) of MDS-OP are indicated by ⋆ and ∗, respectively.