Construction Disputes and Associated Contractual Knowledge Discovery Using Unstructured Text-Heavy Data: Legal Cases in the United Kingdom

Construction disputes are one of the main challenges to successful construction projects. Most construction parties experience claims, and even worse, disputes, which are costly and time-consuming to resolve. Lessons learned from past failure cases can help reduce potential future risk factors that are likely to lead to disputes. In particular, case law, which has been accumulated from the past, is valuable information, providing useful insights to prepare for future disputes. However, few efforts have been made to discover legal knowledge from case law at a large scale in the construction field. The aim of this paper is to enhance understanding of the multifaceted legal issues surrounding construction adjudication using large amounts of accumulated construction legal cases. This goal is achieved by exploring dispute-related contract terms and conditions that affect judicial decisions based on their verdicts. This study builds on text mining methods to examine what type of contract conditions are frequently referenced in the final decision of each dispute. Various text mining techniques are leveraged for knowledge discovery (i.e., analyzing frequent terms, discovering pairwise correlations, and identifying potential topics) in text-heavy data. The findings show that (1) similar patterns of disputes have occurred repeatedly in construction-related legal cases and (2) the discovered dispute topics indicate that mutually agreed upon contract terms and conditions are important in dispute resolution.
Contributions: Conceptualization, J.-S.Y., curation methodology, J.L.; writing (original draft preparation), J.L.; writing (review and editing), Y.H. and J.-S.Y., supervision,


Introduction
The construction industry has witnessed an increase in the number of litigation cases arising from disputes over the last few years [1][2][3][4]. According to the report by the National Bureau of Standards [5], 30% of construction firms were involved in at least one dispute over a 12-month period. If a construction dispute cannot be resolved through negotiation and compromise between the parties, it becomes necessary for an impartial third party to resolve the disagreement [6,7]. Litigation and arbitration are common dispute resolution procedures, but they are costly and time-consuming [8]. Recent reports including [9] show that an average of $42.4 million has been spent on each dispute resolution in the global construction market. This demonstrates that construction disputes are one of the main factors that prevent construction projects from being completed successfully [1]. To improve sustainability in construction, undesirable economic and social impacts of construction disputes should be avoided, taking into account uncertainty.
Many studies have found that one of the notable reasons for construction disputes is contractual issues that could have been avoided [10][11][12][13]. In a recent report by Arcadis [9], poor contract administration was identified as one of the main causes of construction disputes. Since contractual facts stated in contracts are binding agreements between the parties and are important pieces of evidence for legal judgments [14], clearly expressed contract conditions are essential to prevent conflicts of a contractual interpretation. Moreover, as the conditions of construction contracts have become increasingly complex and sophisticated [10], some parties often have hidden provisions in contracts that can be beneficial to them [3]. Hughes [15] pointed out that some owners tend to maliciously modify or even remove the terms of standard conditions, thereby shifting the risks to contractors. In practice, disputes caused by materially adverse contracts are often found in litigation [11]. Therefore, avoiding construction disputes requires understanding of the contractual terms and relationships that exist on a construction project [12].
Given the nature of the experience-oriented construction industry, knowledge and judgments of past experiences have been considered key factors in preventing and resolving disputes and associated problems that may occur (e.g., delays, cost overruns) [16]. In particular, accumulated case law is valuable information that provides useful lessons for future disputes. Case law, which plays a critical role in legal reasoning and decision making, is the corpus of decisions on cases that judges have made [17]. As every legal judgement is made on the basis of contractual evidence [18], contract conditions that are cited in the legal judgement can be useful to understand potential risk factors. In other words, frequently appearing contract conditions in legal cases can be critical contract terms that should be carefully reviewed at the contract level because courts seem to rely on contract interpretation whenever disputes result in litigation [18,19]. Therefore, by examining construction legal cases focusing on contract terms and provisions, there is the potential to obtain meaningful contractual knowledge that can provide a better understanding of avoiding uncertainties, reducing future disputes.
Prior works on construction legal cases, including [18,20,21,22], have mainly focused on manual content analysis of construction cases and review of specific reports that identified the disputes. Since large-scale manual analysis of text and content is time-consuming, error prone, and subject to data bias, little research exists on exploring large amounts of case law. Moreover, the case base comprises a large number of cases and grows every year; thus, legal professionals face difficulties in retrieving and interpreting information from the case base [17]. Advancements in text analysis technologies have made possible the discovery of previously unknown but potentially useful knowledge in large amounts of unstructured text data [23]. For example, text mining-based reviews of research articles have been performed in the construction domain [23,24] to identify research trends of specific issues from unstructured or semi-structured textual data. A major benefit of using text mining lies in the possibility of discovering document topics and detailed properties and relationships from a massive volume of text data. Given the nature of construction projects, which differ from project to project and cannot be replicated so that the same tasks and resources provide the same results, it is noted that as more cases are analyzed, more diverse and meaningful knowledge can be obtained.
This study aims to provide a better understanding of the multifaceted legal issues surrounding construction disputes through a bottom-up investigation starting from the legal cases. To achieve the goal, we explore knowledge about dispute-related contract terms and commonly occurring patterns that affect judicial decisions based on their verdicts by automatically identifying legal arguments, properties, and relationships in accumulated legal cases from a long period of time. This study builds on text analytics to identify the types of disputes in large amounts of legal cases without examining hundreds of cases manually. Given that construction practitioners may need to understand the dynamics of complex construction law in order to deal with construction conflicts and disputes, text analytics and natural language processing (NLP) can play a significant role in this field. Therefore, such text-based automatic analysis will provide a powerful and scalable approach and thus expand accessibility in legal information, particularly to those with little knowledge of contract management and legal information. This study aims to facilitate construction practitioners in actively engaging in understanding construction legal issues using text analytics.
The collected legal dispute cases were categorized by contractual topics, based on the text topic modeling method, latent Dirichlet allocation (LDA). The results of this study are expected to contribute to expanding a body of knowledge of sustainability in construction by providing (1) interpretable and visualized outcomes that can help to understand construction dispute issues while minimizing human effort; (2) accessibility of legal information and efficient use of such knowledge for contract management; and (3) advanced understanding of contractual knowledge that helps to avoid uncertainties and thus prevent future disputes proactively. The paper begins by providing a review of previous studies on construction disputes and knowledge discovery methods and discusses research gaps to be filled in this study. Then, the proposed framework is explained in detail. Finally, construction disputes and related contractual issues are examined based on the proposed framework and the experimental results are discussed.

Literature Review
Construction stakeholders are frequently faced with a variety of conflicts and disputes during a project. There are differences between conflicts and disputes; conflicts can be managed among construction parties, but disputes, resulting from conflicts, are associated with distinct justiciable issues and require third party interventions such as courts and arbiters [20,25].
Given the nature of the experience-driven construction project, learning from previous experiences and failures has been considered valuable; however, once a project is completed, the associated experiences are likely inaccessible, and the learning opportunity to the wider community wasted, only remaining as tacit or experiential knowledge [26]. Previous studies have put forth models and examined factors to enhance the understanding of what causes construction disputes. For example, Semple et al. [12] pointed out that if one party feels that the contractual obligations or expectations have not been met, and if there is disagreement between the parties, construction disputes occur. They presented the results of a pilot study of 24 construction claims in order to identify common causes of claims and problem areas within the construction process. In addition, Mitkus and Mitkus [27] investigated the causes of conflicts arising between clients and contractors and found that a contract allowing room for different interpretations by the parties is one of the main causes of conflicts in construction projects. Abdul-Malak and Senan [28] investigated factors and trends that have been governing the implementation of adjudication in construction practice. However, the authors explored a total of 28,000 references manually, thus reducing the opportunity for knowledge reuse. Information generated in construction projects is usually unstructured and text-heavy, which is disadvantageous in terms of data utilization efficiency compared to structured numeric data. Despite the importance of text data in construction projects, transferring tacit and experiential knowledge from construction documents to practitioners has been limited, thereby reducing knowledge reuse and sharing.
With the advancement of unstructured text data analytics, recent studies have attempted to analyze text information from vast amounts of documents using text mining techniques. As part of these efforts, Al Qady and Kandil [29] presented a knowledge discovery that extracts contractual knowledge in contract documents. The proposed methodology facilitates quick access and efficient use of such knowledge for project management and contract administration. In construction dispute management, historical dispute cases have been retrieved in order to easily obtain relevant information from unstructured text documents. For example, Liu et al. [16] proposed a dispute settlement model for international construction projects based on case-based reasoning (CBR), mining successful experiences in historical dispute settlement cases. Fan and Li [30] applied text mining to find similar construction accidents occurring in the past for resolving disputes through alternative dispute resolution. Jallan et al. [31] also built on natural language processing and text mining to identify frequently appearing patterns and themes associated with construction-defect litigation. Previous studies have indicated that discovering useful knowledge hidden in construction documents or historical cases is valuable for achieving a successful project. While several studies including Abdul-Malak and Senan [28] have shown the feasibility of knowledge discovery using historical dispute data (i.e., legal cases), a detailed analysis of dispute-related contract clauses that govern litigation outcomes has not been fully investigated with large corpora. Being aware of potential contractual risks ahead will help construction practitioners when drafting contracts, but the large scale of legal data has made it difficult to explore commonly occurring patterns and themes that affect judicial decisions from the legal cases. 
Given that having solutions for construction disputes and litigation begins with an understanding of critical contract terms and clauses, analysis of a large number of legal cases focusing on contract terms and provisions reveals meaningful contractual tacit knowledge to prevent future disputes.

Research Method: Construction Disputes and Related Contractual Knowledge Discovery
This study presents a data-driven framework to identify the causes of construction legal disputes, specifically focusing on contractual issues, with the aim of providing knowledge-inspired dispute causes related to contract issues using unstructured and text-heavy data. Since a case study is used to explain, describe, or explore events or phenomena in the everyday contexts in which they occur [32], this case study explores multifaceted and complex legal issues in construction using UK legal cases. According to Arcadis [9], the UK continues to experience most disputes being resolved after they have crystallized, rather than parties seeking to avoid or mitigate a potential dispute situation. Thus, investigating the legal cases of the UK construction industry as a case study will provide a better understanding of, and possible solutions to, complex legal issues. Figure 1 illustrates the research process used to address the research goal. The R programming language, with packages such as tidytext and LDAvis, was used to implement the entire text data processing.
The first step is pre-processing of the collected legal cases. This study used construction adjudication cases as the main data source for text analysis. A total of 624 dispute cases dated from 1999 to 2019 were collected from the British and Irish Legal Information Institute (BAILII) and an adjudication nomination body of the UK; all legal cases associated with construction contracts were crawled from the webpage [33]. Cases related to other types of contracts (e.g., facility management contracts, maintenance contracts) were excluded from the analysis. Most of the contract forms in the collected data were under the Joint Contracts Tribunal (JCT), one of the most common forms of construction contracts used in the UK.
On average, the legal cases collected were 10-11 pages in PDF format, and the documents usually consisted of four parts: (1) an introductory section that provides a preamble and a dispute background, (2) the relevant provisions of the contract, (3) issues that arise among parties containing the elements that describe the disputes, and (4) decisions and conclusions of the disputes.
The collected legal dispute cases in the unstructured text format were converted to a tidy data frame format, which is a table with one token (i.e., basic meaningful unit of text) per row. Using tidy data is a powerful way to make handling data more effective, and this is no less true when it comes to dealing with text [34]. A set of text documents is first segmented by sentence unit, and tokenization is performed, which is the process of splitting text into tokens. The initial text format is a linear sequence of words or phrases, which can be segmented into linguistic units by tokenization, making it easier to analyze. Several stop-words, including numbers, punctuation marks, and meaningless symbols are removed through pre-processing. Construction legal cases are a special kind of textual data with many legal terms; thus, some of the recurring legal terms (e.g., plaintiff, jurisdiction, adjudication) that are less important to the inference of dispute causes need to be cleaned to obtain robust results. Text pre-processing is performed to remove unnecessary information from text documents as well as to structure the data format so that the computer can easily understand it.
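The pre-processing described above can be illustrated with a minimal sketch. The study itself used R with tidytext; Python is used here only for illustration. The stop-word lists, the naive sentence splitter, and the function names (`split_sentences`, `tidy_tokens`) are assumptions made for this example and are not taken from the study, apart from the three legal terms named in the text.

```python
import re

# Illustrative stop-word lists; the study's actual lists are not given in the text.
GENERAL_STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "was", "that", "for", "by"}
LEGAL_STOP_WORDS = {"plaintiff", "jurisdiction", "adjudication"}  # examples named in the text


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tidy_tokens(doc_id, text):
    """Return one (doc_id, sentence_no, token) row per retained token."""
    rows = []
    for sentence_no, sentence in enumerate(split_sentences(text), start=1):
        # Keeping alphabetic runs only drops numbers, punctuation, and symbols.
        for token in re.findall(r"[a-z]+", sentence.lower()):
            if token in GENERAL_STOP_WORDS or token in LEGAL_STOP_WORDS:
                continue
            rows.append((doc_id, sentence_no, token))
    return rows


rows = tidy_tokens(
    "case_001",
    "The plaintiff issued a notice under clause 1.2. Payment was withheld!",
)
for row in rows:
    print(row)
```

Each printed row corresponds to one token, which mirrors the one-token-per-row tidy format; a real pipeline would likely need a more robust sentence splitter for legal citations and abbreviations.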
To better understand semantic relationships and hierarchy of words in legal cases, we built upon the contract domain lexicon [3,35] in the text pre-processing step. The lexicon was developed to identify vocabularies used in the construction contract domain, and it has seven sub-classes (environment, resource, actor, payment, action, process, and right and responsibility) that are used to categorize each term as relating to the contractual use. The lexicon allows for text pre-processing effectively by providing a domain-dependent lexical dictionary. For example, 'change order' and 'variation' can generally be used interchangeably when a mutually agreed scope of work in a construction contract is altered. The developed lexicon puts these two terms into a single class so that they can be within similar topics. Once pre-processing is complete, the collected data are ready for further text analysis. To discover dispute-related knowledge from the text data, frequently used terms and word correlations are discovered. The results of analyzing frequently used terms can provide a general understanding of issues that often arise in legal cases. In order to better understand potential contractual issues (i.e., topics) underlying legal cases, topic modeling is performed based on a document-term matrix whose rows represent the documents in the collection and columns represent terms.
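As a rough illustration of how a domain lexicon can merge interchangeable terms before a document-term matrix is built, consider the sketch below. Only the 'change order'/'variation' pairing comes from the text; the dictionary layout, the function names, and the toy documents are hypothetical, and the actual lexicon [3,35] is richer (seven sub-classes) than this flat mapping.

```python
from collections import Counter

# Hypothetical lexicon fragment mapping surface terms to one canonical term.
# Only the 'change order' / 'variation' pairing is taken from the text.
LEXICON = {"change order": "variation", "variation": "variation"}


def normalize(tokens, lexicon=LEXICON):
    """Replace two-word and one-word lexicon entries with their canonical term."""
    normalized = []
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in lexicon:
            normalized.append(lexicon[pair])
            i += 2
        else:
            normalized.append(lexicon.get(tokens[i], tokens[i]))
            i += 1
    return normalized


def document_term_matrix(docs):
    """docs maps doc_id -> token list; rows are documents, columns are terms."""
    counts = {doc_id: Counter(normalize(tokens)) for doc_id, tokens in docs.items()}
    vocabulary = sorted(set().union(*counts.values()))
    matrix = [[counts[doc_id][term] for term in vocabulary] for doc_id in counts]
    return list(counts), vocabulary, matrix


doc_ids, vocabulary, matrix = document_term_matrix({
    "case_a": ["change", "order", "issued"],
    "case_b": ["variation", "payment"],
})
print(vocabulary)
for doc_id, row in zip(doc_ids, matrix):
    print(doc_id, row)
```

After normalization both toy documents share the column 'variation', which is the effect the lexicon is meant to have on later topic assignments.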

Frequently Occurring Dispute Issues
As the first step of the text analysis of construction legal cases, we investigated the overall issues in construction disputes by analyzing frequently used terms. Examining frequently used terms in a given text dataset provides a general understanding of frequent issues listed in legal documents. Figure 2 shows the most frequently used words, which occurred more than 4000 times in a total of 624 documents from 1999 to 2019. The most frequently occurring word was 'clause', which occurred 11,803 times in the legal cases, followed by 'payment', and 'issue'. Despite only a simple frequency count, the results enable insightful interpretation. For example, a frequently used term such as 'notice' provides a clue to infer how important the obligation of notice is in communication among parties. It shows that the presence or absence of notice in the process of a construction project can be a significant part of the final legal decision of a construction dispute.

Figure 2. Frequent terms in the construction legal cases.
Although a single term gives straightforward information, additional supplemental words that explain the context in text data need to be studied. The keyword 'payment' is a good example that shows why additional analysis is needed; for example, 'payment due', 'withhold payment', and 'interim payment' each have different meanings, which depend on the combination of 'payment' with other words. To address this, we combined adjacent words together for tokenization, based on which more interpretable results can be obtained [23], in order to discover the relationships among adjacent terms in the given documents. Table 1 shows the top 10 frequent terms by unigrams and bigrams, respectively.
As shown in Table 1, the most frequent unigram 'clause' does not appear in the list of the top 10 frequent bigrams, because the adjacent words used with 'clause' are mostly numbers (e.g., clause 1.1, clause 1.2) that were removed in the text pre-processing. 'Extension time', which is the most frequently used bigram in the legal cases (a total of 2383 times in the cases), is a common issue in construction projects. Almost every construction project faces delays resulting in extension of time (EOT), and the terms of the contract bind which of the two contract parties absorbs the responsibility for the delay, in which there is room for interpretation of the contract [36]. Moreover, 'liquidated damages' caused by the extension of time was listed as a frequently occurring bigram in the legal cases, indicating that a construction delay-related issue is one of the biggest dispute issues in construction projects. Another frequent bigram, 'final account', denotes a contractual concept, which is the conclusion of the contract sum (i.e., including all necessary adjustments) and signifies the agreed amount that the employer will pay the contractor on completion of the work written in the contract [37]. Most standard contracts for construction projects include clauses dealing with final accounts in order to impose a consensus on the possible conflicts arising from quality of work, decisions on extension of time, and loss or expense. Therefore, the results of term frequency show how many legal cases there are in relation to the final account issues. However, reviewing frequent terms needs to be interpreted carefully because the number of occurrences of the bigram 'extension time (EOT)' does not necessarily mean that there are 2383 disputes related to EOT. Rather, it can be interpreted that EOT was discussed by the majority of verdicts in legal cases; therefore, further analysis was conducted in the following section.
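The unigram and bigram counting discussed above can be sketched as follows. This is an illustrative Python example rather than the study's R code; the stop-word list and the three sentences are invented for the example. It shows how removing stop words and numbers turns 'extension of time' into the bigram 'extension time' and leaves 'clause' without its numeric neighbor.

```python
import re
from collections import Counter

STOP_WORDS = {"of", "the", "for", "an", "a", "to"}  # illustrative list only


def ngram_counts(sentences, n):
    """Count n-grams of adjacent retained tokens within each sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = [t for t in re.findall(r"[a-z]+", sentence.lower())
                  if t not in STOP_WORDS]
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Invented sentences, used only to exercise the counting logic.
sentences = [
    "The contractor applied for an extension of time.",
    "Clause 1.2 governs the extension of time.",
    "Liquidated damages follow clause 2.1.",
]
unigrams = ngram_counts(sentences, 1)
bigrams = ngram_counts(sentences, 2)
print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

In this toy input 'clause' is a frequent unigram, yet no bigram pairs it with a clause number, because the numbers were dropped before the adjacent words were combined.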
Discovering terms that are not adjacent but concurrent can also help to give useful insights because several terms, which are key clues to understand the meaning of a sentence, can occur concurrently, but not adjacently in the same sentence [38]. The pairs of words that occur together most often were investigated through pairwise correlation focusing on the phi coefficient, a common measure for binary correlation considering how much more likely it is that either both words A and B appear, or that neither do, than that one appears without the other [34]. Figure 3 illustrates networks of co-occurring words in order to better understand relationships between words. As shown in Figure 3, several words are connected around the term 'payment'. In particular, weak clusters are connected with each other in the networks; as an example, contractual notice-related terms (e.g., notice, written, effective) are clustered with each other and are linked to another cluster such as contract clause-related terms (e.g., clause, condition, provision, term). This shows that the contractual notice-related terms and the contract clause-related terms frequently occurred together in the legal cases. Given that several notice obligations of contract parties are stated under a JCT building contract, this verifies that one of the parties' obligations, the obligation of notice, should be obeyed in accordance with the contract statement, which is an important factor when a legal decision is made. In addition, the link between the notice-related terms and payment-related terms (e.g., payment, pay, less) shows that the payment notices for construction contracts (e.g., pay less notice) frequently occur in the case of legal disputes. Figure 4 shows another perspective of the observed word correlations.
Positive correlations are displayed in blue and negative correlations in red circles. Color intensity and the size of the circle are proportional to the correlation coefficients [39]. Since the matrix is mirrored around its diagonal, correlation values of any two attributes of both sides are the same based on the diagonal. For example, Figure 4 indicates that 'submission' has a strong positive correlation with 'clear' because it has a large blue circle at the intersection between 'submission' and 'clear'. If we look at the correlation in terms of 'clear', strong positive correlation with 'submission' can be observed as well. The strong correlation between 'submission' and 'clear' means that clarity of information should be met when the contractor submits construction documents.
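For readers unfamiliar with the phi coefficient used for the pairwise correlations, a minimal sketch is given below. It computes phi from the 2x2 table of presence and absence of two words across text units, as (n11·n00 − n10·n01) divided by the square root of the product of the four marginal totals. The function name and the four toy units are hypothetical; the unit of co-occurrence (sentence, section, or document) is left open here because the text does not fix it in this passage.

```python
import math


def phi_coefficient(units, word_a, word_b):
    """Phi coefficient of two words over text units given as sets of words."""
    n11 = n10 = n01 = n00 = 0
    for words in units:
        has_a, has_b = word_a in words, word_b in words
        if has_a and has_b:
            n11 += 1      # both words present
        elif has_a:
            n10 += 1      # only word_a present
        elif has_b:
            n01 += 1      # only word_b present
        else:
            n00 += 1      # neither word present
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Return 0.0 when a word appears in every unit or in none (phi is undefined).
    return (n11 * n00 - n10 * n01) / denominator if denominator else 0.0


# Invented text units, each reduced to its set of retained words.
units = [
    {"notice", "written", "clause"},
    {"notice", "written"},
    {"payment", "clause"},
    {"payment", "due"},
]
print(phi_coefficient(units, "notice", "written"))
print(phi_coefficient(units, "notice", "payment"))
print(phi_coefficient(units, "notice", "clause"))
```

Because the measure is symmetric in the two words, swapping the arguments gives the same value, which is why a correlation matrix of this kind is mirrored around its diagonal.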

Construction Dispute Trends by Period
There have been findings that the economic circumstances surrounding a construction project can affect the number of construction claims and disputes [40,41]. Therefore, the trend analysis needs to examine construction disputes in the context of the economic situation of the construction industry. Considering that disputes may vary under different circumstances, the collected legal cases were grouped based on the construction industry trends in the UK from 1999 to 2019. A total of 624 legal cases were grouped into four different periods based on the growth of the construction market during 1999-2019 [42].
[...] demonstrates that mutually agreed contract terms and conditions play an important role in the process of legal dispute resolution. Understanding which contract conditions appear often in the legal dispute documents, especially in the final legal decisions, can help prevent future disputes. Therefore, in order to focus on the contractual issues that affect the legal decisions of disputes, we extracted only the "relevant provision of the contract" and "decisions and conclusions of the disputes" from each dispute case. Using the extracted data, we investigated dispute trends by period and identified potential topics in text data. Figure 5 depicts what aspects of disputes have been addressed in four different periods. The results show that similar issues occurred repeatedly in construction legal cases throughout the entire period, showing that there have been similar historical experiences that can be used to solve the present issues. As shown in Figure 5, delay-related disputes such as 'extension time' and 'liquidated damages' remain main issues throughout the entire period. In addition, 'practical completion', which is also referred to as 'substantial completion', appears equally in every group.
A construction project is considered to be practically complete when there are no outstanding defects (except for minor items or snagging) and the building or infrastructure can be put to its intended use [43]. Practical completion is an important milestone because it means that the contractor is entitled to the release of any retainage, excluding deductions for uncompleted works, and the contractor is no longer liable for liquidated damages beyond practical completion [44]. Given that almost every legal case collected in this study is under the JCT contract that has no definition of practical completion, it can be understood why the term 'practical completion' has been referenced often as one of the causes of disputes in the collected legal cases. In addition, the fact that 'commissioning program' appeared in the 3rd period (2010 to 2014) implies that commissioning-related issues have increased in construction legal cases as the practice of construction commissioning has matured recently. The period from 2015 to 2019 seems to show slightly different tendencies compared to other periods, since it has several strong issues associated with 'interim payment' and 'notice'. This shows that disputes caused by failing to issue an adequate and timely interim certificate (i.e., a payment notice) have occurred more frequently in recent years. Additionally, several common words (e.g., 'project network', 'network model'), which are not shown in Figure 5 but still appear often in the 4th group, indicate that there have been some disputes regarding a project's scheduling model. Figure 6 illustrates how the major issues (i.e., frequent terms) change over time.
As shown in Figure 6, several major issues, such as 'extension time' and 'practical completion', appear steadily over time, whereas payment-related issues (e.g., 'interim payment', 'payment due') became more likely to be referenced in the legal decisions. This means that payment-related contract conditions should be carefully examined during the contracting phase.
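The period-wise term frequency analysis described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names, the toy sentences, and the stop-word list are hypothetical, and the tokenizer is deliberately simple.

```python
from collections import Counter, defaultdict
import re


def bigrams(text, stop_words=frozenset()):
    """Lowercase the text, keep alphabetic tokens, drop stop words,
    and return the adjacent word pairs that remain."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_words]
    return list(zip(tokens, tokens[1:]))


def bigram_counts_by_period(cases, stop_words=frozenset()):
    """cases: iterable of (period_label, text) pairs.
    Returns a dict mapping each period to a Counter of bigrams."""
    counts = defaultdict(Counter)
    for period, text in cases:
        counts[period].update(bigrams(text, stop_words))
    return counts


# Made-up snippets standing in for the extracted case text (hypothetical data).
toy_cases = [
    ("2010-2014", "The extension of time was refused before practical completion."),
    ("2015-2019", "The interim payment notice was late; the interim payment fell due."),
]
stop = frozenset({"the", "of", "was", "before", "fell"})

by_period = bigram_counts_by_period(toy_cases, stop)
top_recent = by_period["2015-2019"].most_common(1)
```

Because stop words are removed before pairing, a phrase such as "extension of time" is counted as the bigram 'extension time', which matches the form of the terms reported in Figures 5 and 6.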

Topics Discovered
To discover the underlying themes or topics in the collected legal cases, text topic modeling, which organizes a collection of documents based on the discovered topics, was conducted. Unlike the term frequency analysis, text topic modeling allows text in a document to be classified to a particular topic so that we can better understand hidden themes from documents. To facilitate text topic modeling, we built on LDA, which is a generative probabilistic model of a corpus and is based on the idea that documents are mixtures of topics, where each topic is represented by a probability distribution over words [45]. Unlike naïve Bayes classifiers, which assume that each document is on a single topic, LDA lets each word be on a different topic. In the process of LDA, there is an assumption that the K latent topics are hidden in the given collection of text documents D, and each topic is represented as a multinomial distribution over all the M words. LDA assumes the following generative process for a corpus D consisting of N documents [45][46][47][48]:

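1. Randomly choose a distribution over topics for the document d: i.e., choose $\theta_n \sim \mathrm{Dirichlet}(\alpha)$.
2. For each word $w_{n,m}$ in the document d:
   (a) Randomly choose a topic from the distribution over topics: i.e., choose a topic $z_{n,m} \sim \mathrm{Multinomial}(\theta_n)$;
   (b) Randomly choose a word from the corresponding topic (distribution over the vocabulary): i.e., find the corresponding topic distribution $\phi_{z_{n,m}} \sim \mathrm{Dirichlet}(\beta)$, where the symmetric parameter $\beta$ typically is sparse; and
   (c) Sample a word $w_{n,m} \sim \mathrm{Multinomial}(\phi_{z_{n,m}})$.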
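The generative process above can be simulated in a few lines of Python. The sketch below is a minimal illustration using only the standard library, not the implementation used in the study. It follows the standard LDA formulation, in which each document first draws its topic proportions from a Dirichlet prior. The function names, the toy vocabulary, and the hyper-parameter values are assumptions chosen for readability.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, n_words, vocab, n_topics, alpha, beta, seed=0):
    """Generate a toy corpus by following the LDA generative process."""
    rng = random.Random(seed)
    # One topic-word distribution per topic: phi_k ~ Dirichlet(beta).
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # Document-topic proportions: theta_n ~ Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * n_topics, rng)
        doc = []
        for _ in range(n_words):
            # Choose a topic z ~ Multinomial(theta_n), then a word w ~ Multinomial(phi_z).
            z = rng.choices(range(n_topics), weights=theta)[0]
            doc.append(rng.choices(vocab, weights=phi[z])[0])
        corpus.append(doc)
    return corpus


# Hypothetical vocabulary and illustrative hyper-parameters (not the study's settings).
toy_vocab = ["payment", "notice", "delay", "completion", "letter", "intent"]
toy_corpus = generate_corpus(n_docs=3, n_words=5, vocab=toy_vocab,
                             n_topics=2, alpha=0.5, beta=0.5, seed=1)
```

Topic modeling runs this process in reverse: given only the observed words, it infers the topic proportions and topic-word distributions that most plausibly generated them.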
By repeating the above procedures N times, document d can be generated word by word and the whole corpus can be generated. Taking the product of the marginal probabilities of each document, the probability of a whole corpus is estimated:

$$P(D \mid \alpha, \beta) = \prod_{n=1}^{N} \iint P(\theta_n \mid \alpha)\, P(\phi \mid \beta)\, F(\theta, \phi)\, d\theta_n\, d\phi \tag{1}$$

where

$$F(\theta, \phi) = \prod_{m=1}^{M} \sum_{z_{n,m}} P(z_{n,m} \mid \theta_n)\, P(w_{n,m} \mid \phi_{z_{n,m}})$$

The topic-word distributions ϕ and document-topic distributions θ in Equation (1) can be learned through Gibbs sampling and the expectation-maximization (EM) algorithm by maximizing the probability P(D|α, β) [46].

Previous studies on text topic modeling have shown that overly long documents are likely to be problematic for topic models, since such a document includes an arbitrary mixture of too many topics, which makes it difficult to identify meaningful topics [49,50]. Therefore, we split the legal cases, which have four parts (i.e., introductory section, relevant provisions of the contract, issues arising among parties, and decisions and conclusions of the disputes), into thematically coherent parts. Only the relevant provisions of the contract and the decisions of the disputes were used for the topic modeling, to focus on final legal decisions and the corresponding contract evidence. For that purpose, additional stop words referring to the contract clauses (e.g., condition, clause, provision) were removed.

Figure 7 shows the topics discovered through text topic modeling. Topic modeling was performed using a unigram document-term matrix whose rows represent the documents in the collection and whose columns represent unigram terms [51]. Unigrams rather than bigrams were used because most bigrams, except for some frequent ones, were too sparse to yield meaningful results in the topic modeling analysis. A total of seven topics were finally obtained using the harmonic mean method to determine the optimal number of topics. 
The scalar value of the Dirichlet hyper-parameter for topic proportions, alpha, was set at 0.02 and the number of iterations was 2000 in our study. Several terms such as 'payment' were assigned to more than one group based on their pairwise correlations. For example, 'payment' showed a high correlation coefficient with the terms 'amount' and 'total', which are part of Topic 7. At the same time, 'payment' had a high correlation with 'notice' and 'date', which are part of Topic 6. Each topic has been labeled as shown in Table 2, and the seven key aspects of construction disputes are discussed in detail in the following subsections.
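The harmonic mean method for choosing the number of topics can be sketched as follows. This is a minimal illustration, assuming that log-likelihood samples from Gibbs sampling are already available for each candidate number of topics. It uses a commonly used form of the estimator, in which the model evidence is approximated by the harmonic mean of the sampled likelihoods. The function names and all numbers are made up and are not results from the study.

```python
import math


def log_harmonic_mean(log_likelihoods):
    """Return the log of the harmonic mean of a set of likelihoods, given
    their logs. Uses the log-sum-exp trick to avoid numerical underflow."""
    negated = [-ll for ll in log_likelihoods]
    peak = max(negated)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in negated))
    # log(S / sum_s exp(-ll_s)) = log(S) - logsumexp(-ll)
    return math.log(len(negated)) - log_sum


def pick_num_topics(samples_by_k):
    """samples_by_k: dict mapping a candidate K to its log-likelihood samples.
    Returns the K with the largest harmonic-mean estimate."""
    return max(samples_by_k, key=lambda k: log_harmonic_mean(samples_by_k[k]))


# Hypothetical log-likelihood samples for three candidate topic counts.
toy_samples = {
    5: [-1200.0, -1190.0],
    7: [-1100.0, -1105.0],
    9: [-1150.0, -1140.0],
}
best_k = pick_num_topics(toy_samples)
```

The harmonic mean is dominated by the lowest sampled likelihoods, so in practice the estimate is usually computed from many post-burn-in samples rather than the two per candidate shown here.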

Topic 1: Letter of Intent (LOI)
Topic 1 has been labeled as a 'letter of intent (LOI)' based on the most salient terms in this group. Topic 1 consists of terms such as 'letter', 'evidence', 'intent', and 'signed', which indicates that the legal dispute cases are associated with letter of intent issues. A letter of intent is a document expressing the employer's intention to enter into a formal written contract for works described in the letter, and asking the contractor to begin those works before the formal contract is executed [52]. The term 'letter of intent' itself has no legal meaning, but the legal effect of a letter of intent may differ depending on the particular meaning and purpose defined in each project. Therefore, disputes can arise if contractors are engaged in projects and on site carrying out work in advance of finalizing the formal contract. Topic 1 shows several past disputes related to letters of intent in construction projects, and it demonstrates that contract parties should make sure that a letter of intent is clearly written and defines its scope and purpose, including whether or not it imposes a binding obligation on the parties.

Topic 2

As shown in Figure 7, the most frequent terms of this group are 'loss', 'cost', 'expense', 'variation', 'design', and 'change'. These terms indicate that one of the main issues in the collected legal cases is linked to variation issues and corresponding loss and expense. A variation is a deviation from an agreed upon, well-defined original scope of works in a construction contract and variations are common in all types of construction projects [13]. 
Variations or change orders are likely to incur construction rework or extra work and they cause issues associated with cost and time, which can lead to conflicts and disputes between parties. Under a JCT standard building contract, the contractor must carry out instructions requiring variations unless it makes a reasonable objection, and valuation of variations must be carried out in accordance with rules in the JCT standard building contract [53]. However, several disputes were discovered in the collected cases in which the parties were not in mutual agreement about the valuation of variations carried out by contractors. As an example of the cases classified under this topic, some conflicts show that contractors have no right to claim extra payment for extra work unless such work is stated in the contract clause.

Topic 3: Document Submittals
As shown in Figure 7, a few words help to clarify the theme of the group in Topic 3. Frequently occurring terms such as 'submission', 'time', 'document', and 'submit' suggest that this group is related to 'document submittals', which are one of the contractor's obligations. Document submittals in construction projects should comply with the procedures written in contracts. In terms of document submittals, the major issues over which disputes arise among the parties are likely to be associated with the timing and clarity of documents. If contractors fail to submit construction documents and information in a timely manner, it could violate the terms of the contract and cause problems. In addition, any submissions that contain errors or incomplete information can be rejected by the employer's party; worse, they can be used to demonstrate the contractor's poor performance in a construction dispute. These types of dispute cases have been discovered in this analysis, demonstrating that accurate and timely submission of construction documents can prevent unnecessary related disputes.

Topic 4: Delay and Liquidated Damages
Topic 4 has been labeled as 'delay and liquidated damages'. It is common for construction projects to face delays that can be characterized as excusable, non-excusable, or compensable. Therefore, conflicts arising over who is responsible for the delay and whether the contractor is entitled to an extension of time can be considered major issues of construction projects. As shown in Figure 7, terms like 'completion', 'delay', 'time', 'extension', 'liquidate', and 'damage' show that legal cases related to construction delays and liquidated damages are one of the main topics in the given documents. In addition, 'practical completion' seems to be one of the most relevant terms in Topic 4. A few disputes have arisen regarding the criteria for judging whether the contractor has reached 'practical completion', which is an important and appropriate benchmark for the completion of the contractor's obligations.
Moreover, terms like 'possession' and 'access' indicate that 'possession of the site' is an issue in the legal cases of Topic 4. The JCT standard building contract requires the employer to give the contractor possession of the site of the work on the date stated in the contract. If the employer fails to give possession on the stated date, it is a serious breach of contract [53]. Some legal cases have been found to be related to delays caused by the employer's failure to give possession on a specified date. This issue demonstrates that if contract conditions intentionally exclude the terms associated with the contractor's possession of the site, contractors have no contractual protection against delays caused by the employer's failure to give possession.

Topic 5: Construction Operation
Topic 5 includes terms such as 'operation', 'order', and 'site' that invoke corresponding topics. However, it is not easy to infer a common theme from these words alone. We thus further investigated the words associated with some frequent terms to find additional clues. According to the word association results, the term 'operation' has strong connections with 'mean', 'definition', 'housing', and 'act'. This shows that the definition of construction operations in the Housing Grants, Construction and Regeneration Act 1996 (HGCRA, also known as the Construction Act) [54] has been referenced many times in the given legal cases. The Act defines several activities (e.g., construction, alteration, repair, maintenance, etc.) as construction operations; however, some activities (e.g., drilling for oil or natural gas, extraction of minerals, etc.) are expressly excluded. There are thus some disputes where a contract relates partly to construction operations and partly to excluded activities. Issues remain if the dispute relates partly to matters that are not construction operations and there is no express adjudication clause [55]. Therefore, given that terms indicating the scope of construction operations frequently appear in this group, we can learn that the issue of 'construction operation', particularly how to resolve disputes about activities excluded from construction operations, should be clearly defined in advance to prevent related disputes.
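The word association step used to interpret Topic 5 can be illustrated with a pairwise correlation between terms. The sketch below uses the phi coefficient over term presence per document, which is one common choice; the source does not state which association measure was applied, so this is an assumption, as are the function name and the toy documents.

```python
import math


def phi_coefficient(docs, term_a, term_b):
    """Phi coefficient between the presence of two terms across documents.
    docs: list of sets of tokens. Returns a value between -1 and 1."""
    n11 = sum(1 for d in docs if term_a in d and term_b in d)
    n10 = sum(1 for d in docs if term_a in d and term_b not in d)
    n01 = sum(1 for d in docs if term_a not in d and term_b in d)
    n00 = len(docs) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # If a term appears in every document or in none, the coefficient is undefined.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


# Made-up token sets standing in for processed legal cases (hypothetical data).
toy_docs = [
    {"operation", "definition", "act"},
    {"operation", "definition"},
    {"payment", "notice"},
    {"payment", "site"},
]
together = phi_coefficient(toy_docs, "operation", "definition")
apart = phi_coefficient(toy_docs, "operation", "payment")
```

In this toy data, 'operation' and 'definition' always co-occur, so their coefficient is at the maximum, while 'operation' and 'payment' never co-occur and sit at the minimum.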
Topic 6: Payment-Related Notice

Topic 6 has been labeled as 'payment-related notice' based on terms like 'payment', 'notice', 'pay', 'less', 'interim', and 'due'. If a construction contract has been made under a JCT contract and the employer fails to issue an adequate and timely interim certificate (i.e., a payment notice), the employer may become involved in an unnecessary dispute. Furthermore, where the employer's party has failed to serve a valid pay less notice, the employer is obliged to pay the entire amount applied for in the contractor's payment application (which will be deemed the default payment notice) [56]. In the collected legal dispute cases, there are a few cases in which the pay less notice was late and therefore invalid. The following contract condition is an example of the contract terms and conditions that are frequently cited in legal decisions: "If the Employer intends to pay less than the sum stated as due from him in the Payment Notice or Interim Application, as the case may be, he shall not later than five days before the final date for payment give the Contractor notice of that intention". Therefore, the employer should serve an adequate pay less notice in a timely manner if there is a clear reason to pay less; otherwise, a violation of the terms of the contract (i.e., breach of contract) can occur.
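The timing rule in the quoted condition can be made concrete with a small date calculation. The sketch below is illustrative only: the function names and dates are hypothetical, and it counts plain calendar days, which is an assumption, since a particular contract or statute may reckon days differently (for example, by excluding public holidays).

```python
from datetime import date, timedelta


def pay_less_notice_deadline(final_date_for_payment, days_before=5):
    """Latest date on which a pay less notice can be given, counting
    calendar days back from the final date for payment (assumption)."""
    return final_date_for_payment - timedelta(days=days_before)


def is_notice_timely(notice_date, final_date_for_payment, days_before=5):
    """True if the notice is given no later than the deadline."""
    return notice_date <= pay_less_notice_deadline(final_date_for_payment, days_before)


# Hypothetical dates for illustration.
final_date = date(2019, 3, 20)
deadline = pay_less_notice_deadline(final_date)
on_time = is_notice_timely(date(2019, 3, 15), final_date)
too_late = is_notice_timely(date(2019, 3, 16), final_date)
```

With a final date for payment of 20 March 2019, a notice given on 15 March meets the five-day requirement under this day count, while one given on 16 March does not, which corresponds to the late and therefore invalid notices found in the collected cases.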

Topic 7: Others
Topic 7 has been classified as 'Others' due to its various topics, including legal fees and expenses, which are relatively less important than the other major topics. The most distinctive terms in Topic 7 are 'cost', 'pay', 'issue', 'fee', and 'date'. More specifically, the following relevant contract provisions have been discovered:

• The parties shall each bear their own legal costs and other expenses incurred in the adjudication;
• Where the referring party is awarded in the aggregate a sum less than 50% of the amount claimed, he shall reimburse the other party the legal costs and other expenses which the non-referring party incurred in the adjudication.
This topic is not related to the cause of the dispute, but rather to the contractual outcome resulting from disputes. However, some disputes in the collected legal cases are associated with legal costs and expenses, showing that the allocation of legal fees and expenses should be clearly defined in advance.

Conclusions and Discussion
This study offers domain-specific insights by analyzing construction legal cases and investigating the corresponding contractual issues of real project cases to understand possible contract risks that can lead to legal disputes. This study analyzed legal dispute cases by frequency of terms, time periods, and topics. By analyzing the collected documents in terms of term frequency and word correlations, we identified that terms and conditions of the contract are frequently quoted in the relevant legal decisions. Thus, knowing which terms and conditions of construction contracts affect the final legal decision is essential to prevent future disputes. We focused on contract-related content and judicial decision-related content in the collected cases for contractual knowledge discovery.

The main findings of this study are as follows. (1) Similar patterns of disputes have occurred repeatedly in construction-related legal cases over the entire period (1999 to 2019), which shows that the factors that caused construction disputes in the past are being repeated today. Therefore, learning from past failures is meaningful in solving today's problems related to construction disputes. Additionally, the results demonstrate that a large number of accumulated legal cases is available to help solve similar cases, supporting the feasibility of knowledge discovery using text analytics methods in construction dispute management. (2) The discovered dispute topics indicate that mutually agreed upon contract conditions are important in dispute resolution, thereby demonstrating that an in-depth review of contract conditions should be performed in the contracting process. For example, if a contract was signed without an express term on the contractor's right to possession of the site, it could be a huge risk for contractors because it could cause construction delay and further conflicts. 
Such disputes related to possession of the site were observed in the topic modeling analysis. Being aware of contract terms and conditions that are frequently cited in legal judgements will provide practitioners knowledge of the contract terms that should be carefully reviewed at the contract stage. The findings of the study show that our proposed approach can be used to automatically uncover legal knowledge, thereby assisting decision support in contractual strategy formation.
A construction legal case collection is associated with multiple underlying contractual issues. Therefore, gaining a basic understanding of the frequently referenced contract terms and conditions in the legal cases provides useful insights into which contract conditions must be considered carefully when reviewing contracts. This understanding enables contract parties to prepare contracts that balance the roles and responsibilities of the contracting parties. From a methodological perspective, the main contribution of this paper is to present an automated framework to identify trends of disputes and the corresponding contract issues using massive unstructured data while minimizing human effort. In particular, considering that construction legal cases continue to accumulate, it is expected that the methodology presented in this study can be expanded in future analyses. Moreover, the findings of the study are expected to be widely used in issues associated with contract conflicts and disputes by enhancing the accessibility of legal information that requires professional knowledge. The advanced understanding of contractual legal knowledge will be helpful for avoiding uncertainties when drafting contracts.
However, the findings of this study need to be interpreted in light of its limitations. Due to the nature of case study research, conclusions were derived from the UK legal cases that were used as a source of data for understanding dispute causes and legal decisions. Given that the challenges facing the construction industry are universal, and the dispute issues dealt with throughout the current study are common issues witnessed in many countries, the study provides a useful methodology that can be replicated with data from various countries. Therefore, it is suggested that further case studies extend to other countries, which could provide broader perspectives and understanding.

Funding: This article was published with support from the Howard R. Hughes College of Engineering at the University of Nevada, Las Vegas.

Data Availability Statement: Some or all data and models generated or used during the study are available from the first author upon reasonable request.