Evaluation of Diversification Techniques for Legal Information Retrieval †

“Public legal information from all countries and international institutions is part of the common heritage of humanity. Maximizing access to this information promotes justice and the rule of law.” In accordance with this declaration on free access to law by the legal information institutes of the world, a plethora of legal information is available through the Internet, and the provision of legal information has never been easier. Given that law is accessed by a much wider group of people, the majority of whom are not legally trained or qualified, diversification techniques should be employed in legal information retrieval so as to increase user satisfaction. We address the diversification of results in legal search by adopting several state-of-the-art methods from the web search, network analysis and text summarization domains. We provide an exhaustive evaluation of the methods, using a standard dataset from the common law domain that we objectively annotated with relevance judgments for this purpose. Our results: (i) reveal that users receive broader insights across the results they get from a legal information retrieval system; (ii) demonstrate that web search diversification techniques outperform other approaches (e.g., summarization-based, graph-based methods) in the context of legal diversification; and (iii) offer balance boundaries between reinforcing relevant documents or sampling the information space around the legal query.


Introduction
Nowadays, as a consequence of many open data initiatives, more and more publicly available portals and datasets provide legal resources to citizens, researchers and legislation stakeholders. Thus, legal data that was previously available only to a specialized audience and in "closed" format is now freely available on the internet. These portals serve as an endpoint to access millions of regulations, legislation, judicial cases, or administrative decisions. They offer multiple search facilities to assist users in finding the information they need. For instance, the user can perform simple search operations or use predefined classificatory criteria (e.g., year, legal basis, subject matter) to find legal documents relevant to her information needs.
At the same time, however, the amount of Open Legal Data makes it difficult for both legal professionals and citizens to find relevant and useful legal resources. For example, it is extremely difficult to search for relevant case law by using boolean queries or the references contained in a judgment. Consider, for example, a patent lawyer who wants to find patents as reference cases and submits a query to retrieve information. A diverse result, i.e., a result containing several claims, heterogeneous statutory requirements and conventions (varying in the number of inventors and other characteristics), is intuitively more informative than a set of homogeneous results that contain only patents with similar features. In this paper, we propose a novel way to efficiently and effectively handle similar challenges when seeking information in the legal domain.
Diversification is a method of improving user satisfaction by increasing the variety of information shown to the user. As a consequence, the number of redundant items in a search result list should decrease, while the likelihood that a user will be satisfied with any of the displayed results should increase. There has been extensive work on query results diversification (see Section 2), where the key idea is to select a small set of results that are sufficiently dissimilar, according to an appropriate similarity metric.
Diversification techniques in legal information systems can be helpful not only for citizens but also for law issuers and other legal stakeholders in companies and large organizations. Having a big picture of diversified results, issuers can choose or properly adapt the legal regime that better fits their firms and capital needs, thus helping them operate more efficiently. In addition, such techniques can also help lawmakers, since a deep understanding of legal diversification promotes the evolution of better and fairer legal regulations for society [1].
In this work, we address result diversification in legal IR. To this end, we adopt various methods from the literature that were introduced for text summarization (LexRank [2] and Biased LexRank [3]), graph-based ranking (DivRank [4] and Grasshopper [5]) and web search result diversification (MMR [6], Max-Sum [7], Max-Min [7] and MonoObjective [7]). We evaluate the performance of the above methods on a legal corpus objectively annotated with relevance judgments, using metrics employed in TREC Diversity Tasks. To the best of our knowledge, none of these methods has been employed in the context of diversification in legal IR and evaluated using diversity-aware evaluation metrics.
Our findings reveal that (i) diversification methods, employed in the context of legal IR, demonstrate notable improvements in terms of enriching search results with otherwise hidden aspects of the legal query space and (ii) web search diversification techniques outperform other approaches (e.g., summarization-based, graph-based methods) in the context of legal diversification. Furthermore, our accuracy analysis can provide helpful insights for legal IR systems wishing to balance between reinforcing relevant documents (result set similarity) and sampling the information space around the query (result set diversity).
The remainder of this paper is organized as follows: Section 2 reviews previous work in query result diversification, diversified ranking on graphs and legal text retrieval, and highlights the differentiation and contribution of this work. Section 3 introduces the concepts of search diversification and presents the diversification algorithms, while Section 4 describes our experimental results and discusses their significance. Finally, we draw our conclusions and outline future work in Section 5.

Related Work
In this section, we first present related work on query result diversification, afterwards on diversified ranking on graphs and then on legal text retrieval techniques.
Users of (Web) search engines typically employ keyword-based queries to express their information needs.
These queries are often underspecified or ambiguous to some extent [18]. Different users who pose exactly the same query may have very different query intents. At the same time, the documents retrieved by an IR system may reflect superfluous information. Search result diversification aims to solve this problem by returning diverse results that can fulfill as many different information needs as possible. Published literature on search result diversification is reviewed in [19,20].
The maximal marginal relevance criterion (MMR), presented in [6], is one of the earliest works on diversification and aims at maximizing relevance while minimizing similarity to higher ranked documents. Search results are re-ranked by a combination of two metrics, one measuring the similarity among documents and the other the similarity between documents and the query. In [7], a set of diversification axioms is introduced and it is proven that no diversification algorithm can satisfy all of them. Additionally, since there is no single objective function suitable for every application domain, the authors propose three diversification objectives, which we adopt in our work. These objectives differ in the level at which diversity is calculated, e.g., whether it is calculated per individual document or on the average of the currently selected documents.
In another approach, researchers utilized explicit knowledge so as to diversify search results. In [21], the authors proposed a diversification framework where the different aspects of a given query are represented in terms of sub-queries and documents are ranked based on their relevance to each sub-query, while in [22] the authors proposed a diversification objective that tries to maximize the likelihood of finding a relevant document in the top-k positions given the categorical information of the queries and documents. Finally, the work described in [23] organizes user intents in a hierarchical structure and proposes a diversification framework that explicitly leverages the hierarchical intent.
The key difference between these works and the methods utilized in this paper is that we do not rely on external knowledge (e.g., taxonomy, query logs) to generate diverse results. Queries are rarely known in advance, thus probabilistic methods to compute external information are not only expensive to compute, but also have a specialized domain of applicability. Instead, we evaluate methods that rely only on implicit knowledge of the legal corpus utilized and on computed values, using similarity (relevance) and diversity functions (e.g., tf-idf cosine similarity) in the data domain.

Diversified Ranking on Graphs
Many network-based ranking approaches have been proposed to rank objects according to different criteria [24], and recently the diversification of results has attracted attention. Research is currently focused on two directions: a greedy vertex selection procedure and a vertex-reinforced random walk. The greedy vertex selection procedure, at each iteration, selects and removes from the graph the vertex with the maximum random-walk-based ranking score. One of the earlier algorithms that address diversified ranking on graphs by vertex selection with absorbing random walks is Grasshopper [5]. A diversity-focused ranking methodology, based on reinforced random walks, was introduced in [4]. The proposed model, DivRank, incorporates the rich-gets-richer mechanism into PageRank [25] with reinforcements on the transition probabilities between vertices. We utilize these approaches in our diversification framework, considering the connectivity matrix of the citation network between documents that are relevant for a given user query.

Legal Text Retrieval
With respect to legal text retrieval, which traditionally relies on external knowledge sources such as thesauri and classification schemes, various techniques are presented in [26]. Several supervised learning methods have been proposed to classify sources of law according to legal concepts [27][28][29]. Ontologies and thesauri have been employed to facilitate information retrieval [30][31][32][33] or to enable the interchange of knowledge between existing legal knowledge systems [34]. Legal document summarization [35][36][37] has been used as a way to make the content of legal documents, notably cases, more easily accessible. We also utilize state-of-the-art summarization algorithms, but under a different objective: we aim to maximize the diversity of the result set for a given query.
Finally, an approach similar to our work is described in [38], where the authors utilize information retrieval approaches to determine which sections within a bill tend to be outliers. However, our work differs in the sense that we maximize the diversity of the result set, rather than detect section outliers within a specific bill.
In another line of work, citation analysis has been used in the field of law to construct case law citation networks [39] 6. Case law citation networks contain valuable information, capable of measuring legal authority [40], identifying authoritative precedent 7 [41], evaluating the relevance of court decisions [42] or even assisting in summarizing legal cases [43], thus showing the effectiveness of citation analysis in the case law domain. While the American legal system has undergone the widest series of studies in this direction, various researchers have recently applied network analysis in the civil law domain as well. The authors of [44] propose a network-based approach to model the law. Network analysis techniques were also employed in [45], demonstrating an online toolkit that allows legal scholars to apply network analysis and visual techniques to the entire corpus of EU case law. In this work, we also utilize citation analysis techniques and construct the Legislation Network, so as to cover a wide range of possible aspects of a query.

Legal Document Ranking Using Diversification
We first define the problem addressed in this paper and provide an overview of the diversification process. Afterwards, we introduce the features of legal documents that are relevant for our work and define the distance functions. Finally, we describe the diversification algorithms used in this work.

Diversification Overview
Result diversification is a trade-off between finding documents relevant to the user query and finding diverse documents in the result set. Given a set of legal documents and a query, our aim is to find a set of relevant and representative documents and to select these documents in such a way that the diversity of the set is maximized. More specifically, the problem is formalized as follows:
Definition 1 (Legal document diversification). Let q be a user query and N a set of documents relevant to the user query. Find a subset S ⊆ N of documents that maximizes an objective function f that quantifies the diversity of documents in S.
Figure 1 illustrates the overall workflow of the diversification process. At the highest level, the user submits his/her query as a way to express an information need and receives relevant documents.
6 Case documents usually cite previous cases, which in turn may have cited other cases, and thus a network is formed over time with these citations between cases.
7 Legal norm inherited from English common law that encourages judges to follow precedent by letting the past decision stand.
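Definition 1 can be restated compactly as an optimization problem. The cardinality constraint |S| = k is borrowed from the notation used for the diversification heuristics, and treating f as a function of the whole set S is an assumption of this restatement rather than something the definition spells out:

```latex
S^{*} = \operatorname*{arg\,max}_{S \subseteq N,\; |S| = k} f(S)
```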

Figure 1. Diversification Overview
From the relevance-oriented ranking of documents we derive a diversity-oriented ranking, produced by seeking to achieve both coverage and novelty at the same time. Significant components of the process include:
• Ranking Features: features of legal documents that will be used in the ranking process.
• Distance Measures: functions to measure the similarity between two legal documents and the relevance of a query to a given document.
• Diversification Heuristics: heuristics to produce a subset of diverse results.

Ranking Features/ Distance Measures
Typically, diversification techniques measure diversity in terms of content, where textual similarity between items is used in order to quantify information similarity. In the Vector Space model [46], each document u can be represented as a term vector $U = (is_{w_1 u}, is_{w_2 u}, \ldots, is_{w_m u})^T$, where $w_1, w_2, \ldots, w_m$ are all the available terms, and $is$ can be any popular indexing schema, e.g., tf, tf-idf or log tf-idf. Queries are represented in the same manner as documents.
In the following we define:
• Document Similarity. Various well-known functions from the literature (e.g., Jaccard, cosine similarity) can be employed for computing the similarity of legal documents. In this work, we choose cosine similarity as the similarity measure; thus the similarity between documents u and v, with term vectors U and V, is:
$$sim(u, v) = \cos(u, v) = \frac{U \cdot V}{\|U\| \, \|V\|} \qquad (2)$$
• Document Distance. The distance of two documents is:
$$d(u, v) = 1 - sim(u, v) \qquad (3)$$
• Query Document Similarity. The relevance of a query q to a given document u can be assigned as the initial ranking score obtained from the IR system, or calculated using a similarity measure (e.g., cosine similarity) on the corresponding term vectors:
$$r(q, u) = \cos(q, u) \qquad (4)$$
Peer-reviewed version available at Algorithms 2017, 10, 22; doi:10.3390/a10010022
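The three measures above can be illustrated with a minimal Python sketch. It assumes raw term-frequency weights (the paper uses a log-based tf-idf schema) and takes the distance as the complement of the cosine similarity; the function names `term_vector`, `cosine_similarity` and `distance` are illustrative and not taken from the paper.

```python
import math
from collections import Counter


def term_vector(text):
    """Sparse term vector (term -> raw count); raw tf weighting is an assumption."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """cos(U, V) = (U . V) / (|U| * |V|) for sparse term vectors."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def distance(u, v):
    """Distance taken as the complement of cosine similarity (assumed form)."""
    return 1.0 - cosine_similarity(u, v)


doc_a = term_vector("patent infringement claim")
doc_b = term_vector("patent claim construction")
query = term_vector("patent claim")
print(cosine_similarity(doc_a, doc_b))  # similarity between two documents
print(distance(doc_a, doc_b))           # their distance
print(cosine_similarity(query, doc_a))  # relevance r(q, u) of doc_a to the query
```

Because queries are represented in the same way as documents, the same function serves both for document similarity and for query-document relevance.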

Diversification Heuristics
Diversification methods usually retrieve a set of documents based on their relevance scores, and then re-rank the documents so that the top-ranked documents are diversified to cover more query subtopics. Since the problem of finding an optimum set of diversified documents is NP-hard, a greedy algorithm is often used to iteratively select the diversified set S.
Let N be the document set, u, v ∈ N, r(q, u) the relevance of u to the query q, d(u, v) the distance between u and v, S ⊆ N with |S| = k, where k is the number of documents to be collected, and λ ∈ [0..1] a parameter used for setting the trade-off between relevance and similarity. In this paper, we focus on the following representative diversification methods:
• MMR: Maximal Marginal Relevance [6], a greedy method to combine query relevance and information novelty, iteratively constructs the result set S by selecting the documents that maximize the objective function of Eq. 5.
Algorithm 1 Produce diverse set of results with MMR
  Set S = S ∪ {i} (initialize with the document most relevant to the query)
  while |S| < k do
    Set S = S ∪ {u} (iteratively select the document that maximizes Eq. 5)
    Set T = T \ {u}
  end while
MMR incrementally computes the standard relevance-ranked list when the parameter λ = 0, and computes a maximal diversity ranking among the documents in N when λ = 1. For intermediate values of λ ∈ [0..1], a linear combination of both criteria is optimized. In MMR Algorithm 1, the set S is initialized with the document that has the highest relevance to the query. Since the selection of the first element has a high impact on the quality of the result, MMR often fails to achieve optimum results.
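The greedy MMR procedure can be sketched in Python as follows. The exact form of Eq. 5 is not reproduced here, so the sketch assumes the classic MMR criterion rewritten in terms of distance: (1 − λ)·r(q, u) + λ·min over v ∈ S of d(u, v), which reduces to pure relevance ranking at λ = 0 and to pure diversity at λ = 1, as the text describes. The function name `mmr` and the toy data are hypothetical.

```python
def mmr(candidates, relevance, dist, k, lam):
    """Greedy MMR re-ranking (assumed objective, see the note above).

    candidates: list of document ids (the set N)
    relevance:  dict mapping a document id to r(q, u)
    dist:       function (u, v) -> d(u, v)
    """
    if k <= 0 or not candidates:
        return []
    remaining = list(candidates)
    # Initialize S with the document most relevant to the query.
    first = max(remaining, key=lambda u: relevance[u])
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        def score(u):
            novelty = min(dist(u, v) for v in selected)
            return (1.0 - lam) * relevance[u] + lam * novelty
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): four documents with relevance scores and pairwise distances.
RELEVANCE = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2}
PAIR_DIST = {
    frozenset("ab"): 0.1, frozenset("ac"): 0.9, frozenset("ad"): 0.8,
    frozenset("bc"): 0.7, frozenset("bd"): 0.6, frozenset("cd"): 0.2,
}


def toy_dist(u, v):
    return PAIR_DIST[frozenset((u, v))]


print(mmr(list("abcd"), RELEVANCE, toy_dist, k=3, lam=0.0))
print(mmr(list("abcd"), RELEVANCE, toy_dist, k=3, lam=1.0))
```

With λ = 0 the near-duplicate document "b" follows "a" directly; with λ = 1 it is pushed out in favor of documents far from those already selected.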
• MaxSum: The Max-sum diversification objective function [7] aims at maximizing the sum of the relevance and diversity in the final result set. This is achieved by a greedy approximation, Algorithm 2, that in each iteration selects the pair of documents (u, v) that maximizes Eq. 6; the objective is defined over pairs because documents are inserted into the result set two at a time.
When k is odd, in the final phase of the algorithm an arbitrary element in N is chosen to be inserted in the result set S.
At each step, MaxSum Algorithm 2 examines the pairwise distances of the candidate items in N and selects the pair with the maximum pairwise distance to insert into the set of diverse items S.
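A minimal Python sketch of this pairwise greedy selection is given below. Eq. 6 is not reproduced in the text, so the pair score (1 − λ)·(r(q, u) + r(q, v)) + 2λ·d(u, v) is an assumption modeled on the max-sum objective of [7]; the function name `max_sum` and the toy data are hypothetical.

```python
from itertools import combinations


def max_sum(candidates, relevance, dist, k, lam):
    """Greedy MaxSum: repeatedly insert the best-scoring pair of documents."""
    remaining = list(candidates)
    selected = []
    while len(selected) + 2 <= k and len(remaining) >= 2:
        u, v = max(
            combinations(remaining, 2),
            key=lambda pair: (1.0 - lam) * (relevance[pair[0]] + relevance[pair[1]])
            + 2.0 * lam * dist(pair[0], pair[1]),
        )
        selected += [u, v]
        remaining.remove(u)
        remaining.remove(v)
    if len(selected) < k and remaining:
        # k is odd: one arbitrary remaining element completes the set.
        selected.append(remaining[0])
    return selected


# Toy data (hypothetical): four documents with relevance scores and pairwise distances.
RELEVANCE = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2}
PAIR_DIST = {
    frozenset("ab"): 0.1, frozenset("ac"): 0.9, frozenset("ad"): 0.8,
    frozenset("bc"): 0.7, frozenset("bd"): 0.6, frozenset("cd"): 0.2,
}


def toy_dist(u, v):
    return PAIR_DIST[frozenset((u, v))]


print(max_sum(list("abcd"), RELEVANCE, toy_dist, k=2, lam=0.0))
print(max_sum(list("abcd"), RELEVANCE, toy_dist, k=3, lam=1.0))
```

Each iteration scans all remaining pairs, so the sketch is quadratic in the candidate set size per step; it is meant to show the selection rule, not an efficient implementation.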
Algorithm 2 Produce diverse set of results with MaxSum
  Select the pair of documents (u, v) that maximizes Eq. 6
  Set S = S ∪ {u, v}
• MaxMin: The Max-Min diversification objective function [7] aims at maximizing the minimum relevance and dissimilarity of the selected set. This is achieved by a greedy approximation, Algorithm 3, that initially selects a pair of documents that maximizes Eq. 7 and then, in each iteration, selects the document that maximizes Eq. 8. At each step, MaxMin Algorithm 3 finds, for each candidate document, its closest document belonging to S and calculates their pairwise distance $d_{MIN}$. The candidate document that has the maximum distance $d_{MIN}$ is inserted into S.
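The MaxMin selection rule can be sketched in Python as follows. Eqs. 7 and 8 are not reproduced in the text, so two assumptions are made: the initial pair is scored with (1 − λ)·(r(q, u) + r(q, v)) + λ·d(u, v), and each later step follows the prose description literally, picking the candidate whose distance to its closest member of S is largest. The function name `max_min`, the handling of k = 1, and the toy data are hypothetical.

```python
from itertools import combinations


def max_min(candidates, relevance, dist, k, lam):
    """Greedy MaxMin: seed with the best pair, then add the farthest-from-S candidate."""
    remaining = list(candidates)
    if k <= 0 or not remaining:
        return []
    if k == 1 or len(remaining) == 1:
        # Degenerate case (assumption): fall back to the most relevant document.
        return [max(remaining, key=lambda u: relevance[u])]
    u, v = max(
        combinations(remaining, 2),
        key=lambda pair: (1.0 - lam) * (relevance[pair[0]] + relevance[pair[1]])
        + lam * dist(pair[0], pair[1]),
    )
    selected = [u, v]
    remaining.remove(u)
    remaining.remove(v)
    while remaining and len(selected) < k:
        # d_MIN of a candidate is its distance to the closest document already in S.
        best = max(remaining, key=lambda x: min(dist(x, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): four documents with relevance scores and pairwise distances.
RELEVANCE = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2}
PAIR_DIST = {
    frozenset("ab"): 0.1, frozenset("ac"): 0.9, frozenset("ad"): 0.8,
    frozenset("bc"): 0.7, frozenset("bd"): 0.6, frozenset("cd"): 0.2,
}


def toy_dist(u, v):
    return PAIR_DIST[frozenset((u, v))]


print(max_min(list("abcd"), RELEVANCE, toy_dist, k=3, lam=1.0))
```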

Algorithm 3 Produce diverse set of results with MaxMin
  Input: Set of candidate results N, size of diverse set k
  Output: Set of diverse results S ⊆ N, |S| = k
  Set S = S ∪ {u, v} (initially select the documents that maximize Eq. 7)
  while |S| < k do
    Find u = argmax_{x ∈ N\S} f_MAXMIN(x, q) (select the document that maximizes Eq. 8)
    Set S = S ∪ {u}
  end while
• MonoObjective: MonoObjective [7] combines the relevance and the similarity values into a single value for each document, as defined in Eq. 9. Algorithm 4 approximates the Mono-Objective. At the initialization step, the algorithm calculates a distance score for each candidate document. The objective function weights each document's similarity to the query with the average distance of the document from the rest of the documents. The scores are calculated once, at the initialization step, and are not updated after each iteration of the algorithm. Each step therefore consists of selecting the document with the maximum score from the remaining candidate set and inserting it into S.
Algorithm 4 Produce diverse set of results with MonoObjective
  for each candidate document do
    Calculate scores based on Eq. 9
  end for
  while |S| < k do
    Select the remaining candidate with the maximum score and insert it into S
  end while
• LexRank: LexRank [2] is a stochastic graph-based method for computing the relative importance of textual units. A document is represented as a network of inter-related sentences, and a connectivity matrix based on intra-sentence similarity is used as the adjacency matrix of the graph representation of sentences.
In our setting, instead of sentences, we use the documents that are in the initial retrieval set N for a given query. In this way, instead of building a graph using the similarity relationships among the sentences of an input document, we utilize document similarity on the result set. If we consider documents as nodes, the result set document collection can be modeled as a graph by generating links between documents based on their similarity score as in Eq. 2. Typically, low values in this matrix can be eliminated by defining a threshold so that only significantly similar documents are connected to each other. However, as in all discretization operations, this means information loss. Instead, we choose to utilize the strength of the similarity links. This way we use the cosine values directly to construct the similarity graph, obtaining a much denser but weighted graph. Furthermore, we normalize our adjacency matrix B so as to make the sum of each row equal to 1.
Thus, in the LexRank scoring formula (Eq. 10), matrix B captures the pairwise similarities of the documents, and the square matrix A, which represents the probability of jumping to a random node in the graph, has all elements set to 1/M, where M is the number of documents.
The LexRank Algorithm 5 applies a variation of PageRank [25] over a document graph. A random walker on this Markov chain chooses one of the adjacent states of the current state with probability 1 − λ, or jumps to any state in the graph, including the current state, with probability λ. Note that we interchanged the 1 − λ and λ interpolation parameters of the original LexRank formula [2], so as to acquire comparable results across all tested algorithms.
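The random walk described above can be sketched as a power iteration in Python. This is a minimal sketch, assuming the scoring formula has the form p = [λA + (1 − λ)B]^T p implied by the description (A uniform with entries 1/M, B the row-normalized similarity matrix). The function names, the tolerance and the iteration cap are illustrative, and ranking documents by descending score to form the top-k set is an assumption about how the scores are used.

```python
def lexrank_scores(sim, lam, tol=1e-10, max_iter=1000):
    """Stationary scores of the walk; sim is a square list of lists of similarities."""
    m = len(sim)
    # Row-normalize the similarity matrix so that each row of B sums to 1.
    B = []
    for row in sim:
        total = sum(row)
        B.append([x / total for x in row] if total > 0 else [1.0 / m] * m)
    p = [1.0 / m] * m
    for _ in range(max_iter):
        # (A^T p)_j equals 1/m because p sums to 1; (B^T p)_j = sum_i B[i][j] * p[i].
        new_p = [
            lam / m + (1.0 - lam) * sum(B[i][j] * p[i] for i in range(m))
            for j in range(m)
        ]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


def top_k(scores, k):
    """Indices of the k highest-scoring documents (assumed selection rule)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


# Toy similarity matrix (hypothetical): documents 0 and 1 are near-duplicates.
SIM = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
scores = lexrank_scores(SIM, lam=0.15)
print(scores, top_k(scores, 2))
```

Note that documents similar to many others reinforce each other's scores, which is consistent with the later observation that this family of methods favors centrality over diversity.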

Algorithm 5 Produce diverse set of results with LexRank
Algorithm 5 is also used to produce a diversity-oriented ranking of results with the Biased LexRank method. In the Biased LexRank scoring formula (Eq. 11), we set matrix B as the connectivity matrix based on document similarity for all documents that are in the initial retrieval set N for a given query, and the elements of matrix A proportional to the query-document relevance.
• DivRank: DivRank [4] balances popularity and diversity in ranking, based on a time-variant random walk. In contrast to PageRank [25], which is based on stationary probabilities, DivRank assumes that transition probabilities change over time: they are reinforced by the number of previous visits to the target vertex. Let $p_T(u, v)$ be the transition probability from any vertex u to vertex v at time T, $p^*(d_j)$ the prior distribution that determines the preference of visiting vertex $d_j$, $p_0(u, v)$ the transition probability from u to v prior to any reinforcement, and $N_T(d_j)$ the number of times the walk has visited $d_j$ up to time T; the reinforced transition probabilities are then given by Eq. 12. DivRank was originally proposed in a query-independent context, thus it is not directly applicable to the diversification of search results. We introduce a query-dependent prior and thus utilize DivRank in a query-dependent ranking schema. In our setting, we use the documents that are in the initial retrieval set N for a given query q, create the citation network between those documents and apply the DivRank algorithm to select the top-k diverse documents in S.
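For reference, the vertex-reinforced random walk of the original DivRank formulation [4] is usually written as below, using the symbols defined above. This is a sketch of Eq. 12 and its normalization term under the assumption that the paper follows [4] unchanged; whether the roles of λ and 1 − λ are interchanged here, as they are for LexRank, is not stated.

```latex
p_T(u, v) = (1 - \lambda)\, p^{*}(v) + \lambda\, \frac{p_0(u, v)\, N_T(v)}{D_T(u)},
\qquad
D_T(u) = \sum_{v \in V} p_0(u, v)\, N_T(v)
```

The factor $N_T(v)$ is what produces the rich-gets-richer effect: vertices that have already been visited often attract even more of the probability mass from their neighbors.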

Algorithm 6 Produce diverse set of results with DivRank
  Input: Set of candidate results N, size of diverse set k
  Output: Set of diverse results S ⊆ N, |S| = k
  Build B for the documents in N (connectivity matrix is based on the citation network adjacency matrix)
  p = f_powermethod(B) (calculate the stationary distribution of Eq. 12; omitted for clarity)
• Grasshopper: A ranking algorithm similar to DivRank is described in [5]. This model starts with a regular time-homogeneous random walk, and in each step the vertex with the highest weight is set as an absorbing state. Since Grasshopper and DivRank utilize a similar approach and would ultimately present rather similar results, we utilized Grasshopper differently from DivRank. In particular, instead of creating the citation network of documents belonging to the initial result set, we form the adjacency matrix based on document similarity, as previously explained for LexRank Algorithm 5.

Experimental Setup
In this section, we describe the legal corpus we use, the set of query topics and the respective methodology for objectively annotating the corpus with relevance judgments for each query, as well as the metrics employed for the evaluation. Finally, we provide the results along with a short discussion.

Legal Corpus
Our corpus contains 3,890 Australian legal cases from the Federal Court of Australia 8. The cases were originally downloaded from AustLII 9 and were used in [47] to experiment with automatic summarization and citation analysis. The legal corpus contains all cases from the Federal Court of Australia spanning from 2006 up to 2009. From the cases, we extracted all the text and citation links needed for our diversification framework. Our index was built using standard stop word removal and Porter stemming, with a log-based tf-idf indexing technique, resulting in a total of 3,890 documents, 9,782,911 terms and 53,791 unique terms.
Table 1 summarizes the testing parameters and their corresponding ranges. To obtain the candidate set N, for each query sample we keep the top-n elements using cosine similarity and a log-based tf-idf indexing schema. Our experimental studies are performed in a two-fold strategy: (i) qualitative analysis in terms of diversification and precision of each employed method with respect to the optimal result set and (ii) scalability analysis of the diversification methods when increasing the query parameters.

Evaluation Metrics
As the authors of [48] claim that "there is no evaluation metric that seems to be universally accepted as the best for measuring the performance of algorithms that aim to obtain diverse rankings", we have chosen to evaluate diversification methods using various metrics employed in TREC Diversity Tasks 10. In particular, we report:
• a-nDCG: the a-Normalized Discounted Cumulative Gain [49] metric quantifies the amount of unique aspects of the query q that are covered by the top-k ranked documents. We use a = 0.5, as is typical in TREC evaluation.
• ERR-IA: Expected Reciprocal Rank - Intent Aware [50] is based on inter-dependent ranking. The contribution of each document is based on the relevance of the documents ranked above it. The discount function is therefore dependent not just on the rank but also on the relevance of previously ranked documents.
• S-Recall: Subtopic-Recall [51] is the number of unique aspects covered by the top-k results, divided by the total number of aspects. It measures the aspect coverage for a given result list at depth k.
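Two of these metrics can be sketched in a few lines of Python. The sketch assumes binary aspect judgments stored as a dict from document id to a set of aspect ids, uses the common log2(rank + 1) discount, and normalizes a-nDCG (commonly written α-nDCG) by a greedily built ideal ranking, since computing the exact ideal ranking is hard in general. Function names and toy data are hypothetical, and this is not the official TREC evaluation tool.

```python
import math


def subtopic_recall(ranking, aspects_of, total_aspects, k):
    """Fraction of all aspects covered by the top-k documents."""
    covered = set()
    for doc in ranking[:k]:
        covered |= aspects_of.get(doc, set())
    return len(covered) / total_aspects


def alpha_dcg(ranking, aspects_of, k, alpha=0.5):
    """Discounted gain where repeated aspects are penalized by (1 - alpha)^count."""
    seen = {}
    score = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        gain = 0.0
        for aspect in aspects_of.get(doc, set()):
            gain += (1.0 - alpha) ** seen.get(aspect, 0)
            seen[aspect] = seen.get(aspect, 0) + 1
        score += gain / math.log2(rank + 1)
    return score


def alpha_ndcg(ranking, aspects_of, k, alpha=0.5):
    """alpha_dcg normalized by a greedily constructed ideal ranking (approximation)."""
    pool, ideal, seen = list(aspects_of), [], {}
    while pool and len(ideal) < k:
        best = max(
            pool,
            key=lambda d: sum((1.0 - alpha) ** seen.get(a, 0) for a in aspects_of[d]),
        )
        ideal.append(best)
        pool.remove(best)
        for aspect in aspects_of[best]:
            seen[aspect] = seen.get(aspect, 0) + 1
    ideal_score = alpha_dcg(ideal, aspects_of, k, alpha)
    return alpha_dcg(ranking, aspects_of, k, alpha) / ideal_score if ideal_score else 0.0


# Toy judgments (hypothetical): d2 only repeats an aspect already covered by d1.
ASPECTS = {"d1": {1, 2}, "d2": {1}, "d3": {3}}
print(subtopic_recall(["d1", "d2", "d3"], ASPECTS, total_aspects=3, k=2))
print(alpha_ndcg(["d2", "d1", "d3"], ASPECTS, k=3))
```

In the toy data, ranking "d3" ahead of "d2" is rewarded by both metrics because "d3" contributes a new aspect while "d2" only repeats one.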

Relevance Judgments
Evaluation of diversification requires a data corpus, a set of query topics and a set of relevance judgments, preferably assessed by domain experts for each query. One of the difficulties in evaluating methods designed to introduce diversity in the legal document ranking process is the lack of standard testing data. While TREC added a diversity task to the Web track in 2009, this dataset was designed assuming a general web search, and so it is not possible to adapt it to our setting. Having only the document corpus, we need to define (a) the query topics, (b) a method to derive the subtopics for each topic, and (c) a method to objectively annotate the corpus for each topic. In the absence of a standard dataset specifically tailored for this purpose, we looked for an objective way to evaluate and assess the performance of various diversification methods on our corpus 11. To this end, we have employed an objective way to annotate our corpus with relevance judgments for each query:
User Profiles/Queries. We used the West Law Digest Topics 12 as candidate user queries. In other words, each topic was issued as a candidate query to our retrieval system. Outlier queries, whether too specific/rare or too general, were removed using the interquartile range (values below Q1 or above Q3), applied sequentially to the number of hits in the result set and to the score distribution of the hits, while also demanding a minimum cover of min|N| results. In total, we kept 289 queries. Table 2 provides a sample of the topics we further consider as user queries.
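The query filtering step can be illustrated with a short Python sketch. It reads the description literally (queries whose hit count falls below Q1 or above Q3 are dropped) and applies the filter to the number of hits only; the same filter would then be applied to the score distribution. The function name `filter_outlier_queries`, the `min_cover` parameter name and the toy hit counts are hypothetical, and the quartile estimation method is whatever `statistics.quantiles` uses by default, which may differ from the one used in the paper.

```python
import statistics


def filter_outlier_queries(hit_counts, min_cover):
    """Keep queries whose hit count lies within [Q1, Q3] and is at least min_cover.

    hit_counts: dict mapping a query string to its number of hits.
    """
    q1, _median, q3 = statistics.quantiles(hit_counts.values(), n=4)
    return {
        query: hits
        for query, hits in hit_counts.items()
        if q1 <= hits <= q3 and hits >= min_cover
    }


# Toy hit counts (hypothetical): "a" is too rare, "g" is too general.
HITS = {"a": 1, "b": 20, "c": 30, "d": 40, "e": 50, "f": 60, "g": 1000}
print(filter_outlier_queries(HITS, min_cover=25))
```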
Query assessments and ground-truth. For each topic/query we kept the top-n results. An LDA [52] topic model, using an open source implementation 13, was trained on the top-n results for each query. Topic modeling gives us a way to infer the latent structure behind a collection of documents. Based on the resulting topic distribution, with an acceptance threshold of 20%, we infer whether a document is relevant for a topic/aspect. Thus, using LDA we create our ground-truth data consisting of aspect assessments for each query.
We have made available our complete dataset, ground-truth data, queries and relevance assessments in standard qrel format, so as to enhance collaboration and contribution with respect to diversification issues in legal IR 14.
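The thresholding step can be sketched as follows, assuming the topic model has already produced a probability distribution over topics for each document. Treating the 20% threshold as inclusive is an assumption, and the function name `aspects_from_topics` and the toy distributions are hypothetical.

```python
def aspects_from_topics(doc_topics, threshold=0.2):
    """Map each document to the set of topic indices whose probability meets the threshold.

    doc_topics: dict mapping a document id to a list of topic probabilities.
    """
    return {
        doc: {topic for topic, prob in enumerate(probs) if prob >= threshold}
        for doc, probs in doc_topics.items()
    }


# Toy topic distributions (hypothetical) for two cases and three latent topics.
DOC_TOPICS = {"case1": [0.70, 0.25, 0.05], "case2": [0.10, 0.10, 0.80]}
print(aspects_from_topics(DOC_TOPICS))
```

A document can thus be judged relevant for several aspects at once, which is what allows the diversity-aware metrics to reward result lists that cover many aspects.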

Results
As a baseline to compare diversification methods, we consider the simple ranking produced by cosine similarity and a log-based tf-idf indexing schema. For each query, our initial set N contains the top-n query results. The interpolation parameter λ ∈ [0..1] is tuned in 0.1 steps separately for each method. We present the evaluation results for the methods employed, using the aforementioned evaluation metrics, at cut-off values of 5, 10, 20 and 30, as is typical in TREC evaluations. Results are presented with fixed parameter n = |N|. Note that each of the diversification variations is applied in combination with each of the diversification algorithms and for each user query. Figure 2 shows the a-Normalized Discounted Cumulative Gain (a-nDCG) of each method for different values of λ. Interestingly, web search result diversification methods (MMR, MaxSum, MaxMin and Mono) outperformed the baseline ranking, while text summarization methods (LexRank, Biased LexRank and Grasshopper, as it was utilized without a network citation graph) failed to improve the baseline ranking, performing lower than the baseline at all levels across all metrics. The results of graph-based methods (DivRank) vary across the different values of λ. We attribute this finding to the extremely sparse network of citations, since our dataset covers a short time period (3 years).
The trending behavior of MMR, MaxMin, and MaxSum is very similar, especially at levels @10 and @20, while at level @5 MaxMin and MaxSum presented nearly identical a-nDCG values for many λ values (e.g., 0.1, 0.2, 0.4, 0.6, 0.7). Finally, MMR consistently achieves better results than the rest of the methods, followed by MaxMin and MaxSum. Mono, despite performing better than the baseline for all λ values, always presents the lowest performance when compared to MMR, MaxMin, and MaxSum. It is clear that web search result diversification approaches (MMR, MaxSum, MaxMin and Mono) tend to perform better than the selected baseline ranking method. Moreover, as λ increases, preference for diversity as well as a-nDCG accuracy increases for all tested methods.
Figure 3 depicts the normalised Expected Reciprocal Rank -Intent Aware (nERR-IA) plots for each method in respect to different values of λ.It is clear that web search result diversification approaches (MMR, MaxSum, MaxMin and Mono) tend to perform better than the selected baseline ranking method.Moreover, as λ increases, preference to diversity as well as nERR-IA accuracy increases for all tested methods.Text summarization methods (LexRank, Biased LexRank) and GrassHopper, once again failed to improve the baseline ranking at all levels across all metrics, while as in a-nDCG plots DivRank results vary across the different values of λ.MMR constantly achieves better results in respect to the rest methods.We also observed that MaxMin tends to perform better than MaxSum.There were few cases where both methods presented nearly similar performance especially in lower recall levels (e.g., for nERR-IA@5 when λ equals to 0.1, 0.4, 0.6, 0.7).Once again, MONO presents the lower performance when compared to MMR, MaxMin, and MaxSum for nERR-IA metric for all λ values applied.
Figure 4 shows the Subtopic-Recall at levels @5, @10, @20 and @30 of each method for different values of λ. It is clear that the web search result diversification methods (MMR, MaxSum, MaxMin and Mono) tend to perform better than the baseline ranking. As λ increases, the preference for diversity increases, and with it the Subtopic-Recall accuracy of all methods except MMR. For lower levels (e.g., @5, @10) MMR clearly outperforms the other methods, while for upper levels (e.g., @20, @30) the MMR and MaxMin scores are comparable. We also observe that MaxMin tends to perform better than MaxSum, which in turn consistently achieves better results than Mono. Finally, the LexRank, Biased LexRank and GrassHopper approaches fail to improve on the baseline ranking at all levels across all metrics. Overall, we noticed trends similar to the ones discussed for Figure 2 and Figure 3.
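Subtopic-Recall at a cut-off k is the fraction of a query's judged subtopics covered by the top-k results. A minimal sketch follows, with a hypothetical function name and toy judgments that are not drawn from the dataset.

```python
def subtopic_recall(ranking, judgments, k):
    """Share of all judged subtopics covered by the first k ranked documents."""
    all_subtopics = set().union(*judgments.values())
    if not all_subtopics:
        return 0.0
    covered = set()
    for doc in ranking[:k]:
        covered |= judgments.get(doc, set())
    return len(covered) / len(all_subtopics)


# Toy judgments: three subtopics in total; "a" and "b" overlap.
judgments = {"a": {1}, "b": {1}, "c": {2, 3}}
print(subtopic_recall(["a", "b", "c"], judgments, k=2))  # only subtopic 1 is covered
print(subtopic_recall(["a", "c", "b"], judgments, k=2))  # all three subtopics are covered
```

The example shows why a diversified ordering can reach full recall at a lower cut-off than a relevance-only ordering that ranks near-duplicates first.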
In summary, across all the results, we note that the trends in the graphs look very similar. The web search diversification methods used (MMR, MaxSum, MaxMin, Mono) outperform the baseline method with statistical significance (paired two-sided t-test, p-value < 0.05 and p-value < 0.01), offering legislation stakeholders broader insights with respect to their information needs. Furthermore, the trends across the evaluation metric graphs highlight balance boundaries for legal IR systems between reinforcing relevant documents and sampling the information space around the legal query.
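The paired two-sided t-test compares per-query scores of a diversification method against the baseline on the same queries. The sketch below computes only the test statistic and its degrees of freedom with the standard library; the scores are invented for illustration and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired per-query scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Invented per-query scores for a method and the baseline (4 queries).
method = [0.5, 0.6, 0.7, 0.8]
baseline = [0.4, 0.4, 0.6, 0.6]
t, dof = paired_t_statistic(method, baseline)
print(round(t, 3), dof)
```

Turning the statistic into a p-value requires the Student t distribution with the returned degrees of freedom; a library routine such as `scipy.stats.ttest_rel` returns the statistic and the two-sided p-value directly.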
Table 3 summarizes the average results of the diversification methods. Statistically significant values, using the paired two-sided t-test, are denoted with • for p-value < 0.05 and with * for p-value < 0.01.

[Table 3. Header: Method, followed by cut-offs @5, @10, @20 and @30 for each of the three evaluation metrics; rows grouped by λ value.]

The effectiveness of the diversification methods is also illustrated in Table 4, which shows the result sets for three example queries, using our case law dataset (|S| = 30 and N = 100) with λ = 0 (no diversification), λ = 0.1 (light diversification), λ = 0.5 (moderate diversification) and λ = 0.9 (high diversification). Only MMR results are shown since, in almost all variations, it outperforms the other approaches. Due to space limitations, we show the case title for each entry, hyperlinked to the full text for that entry. When λ = 0 the result set contains the top-5 elements of S ranked with the sim scoring function. The result sets with no diversification contain several almost duplicate elements, as defined by the terms in the case title. As λ increases, fewer "duplicates" are found in the result set, and the elements in the result set "cover" many more subjects, again as defined by the terms in the case title. We note that the result set with high diversification contains elements that have almost all of the query terms, as well as other terms indicating that the case relates to subjects different from those of the other cases in the result set.

Table 4. Result sets (document titles) for three example queries, using the dataset (|S| = 30 and N = 100) with λ = 0 (no diversification), λ = 0.1 (light diversification), λ = 0.5 (moderate diversification) and λ = 0.9 (high diversification).

Conclusions
In this paper, we studied the problem of diversifying results in legal documents. We adopted and compared the performance of several state of the art methods from the web search, network analysis and text summarization domains in order to handle the problem's challenges. We evaluated all the methods using a real dataset from the Common Law domain that we subjectively annotated with relevance judgments for this purpose. Our findings reveal that diversification methods offer notable improvements and enrich search results around the legal query space. In parallel, we demonstrated that web search diversification techniques outperform other approaches (e.g., summarization-based and graph-based methods) in the context of legal diversification. Finally, we provide valuable insights for legislation stakeholders through diversification, as well as by offering balance boundaries between reinforcing relevant documents and sampling the information space around legal queries.
A challenge we faced in this work was the lack of ground truth. We hope for an increase in the size of the truth-labeled dataset in the future, which would enable us to draw further conclusions about the diversification techniques. To this end, our complete dataset is publicly available in an open and editable format, along with ground-truth data, queries and relevance assessments.
In future work, we plan to further study the interaction of relevance and redundancy in historical legal queries. While access to legislation generally retrieves the current legislation on a topic, point-in-time legislation systems address a different problem, namely that lawyers, judges and anyone else considering the legal implications of past events need to know what the legislation stated at some point in the past, when a transaction took place or when events occurred that have led to a dispute and perhaps to litigation [53].

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 23 November 2016 doi:10.20944/preprints201611.0116.v1
Peer-reviewed version available at Algorithms 2017, 10, 22; doi:10.3390/a10010022

[3] {u} Set N = N \ {u} end while

• Biased LexRank: Biased LexRank [3] provides a LexRank extension that takes into account a prior document probability distribution, e.g., the relevance of documents to a given query. The Biased LexRank scoring formula (Eq. 11) is analogous to the LexRank scoring formula (Eq. 10), with matrix A, which represents the probability of jumping to a random node in the graph, proportional to the query document relevance.

Table 1. Parameters tested in the experiments.

Table 2. West Law Digest Topics as user queries.

Table 3. Retrieval performance of the tested algorithms with interpolation parameter λ ∈ [0..1] tuned in 0.1 steps for N = 100 and k = 30. Highest scores are shown in bold. Statistically significant values, using the paired two-sided t-test, are denoted with • for p-value < 0.05 and with * for p-value < 0.01.
