Unlocking the Potential of Digital Twins in Construction: A Systematic and Quantitative Review Using Text Mining

Abstract: The construction industry has been trying to raise its level of digitalization and autonomy by adopting various information and communication technologies (ICT), e.g., augmented reality (AR), virtual reality (VR), robotics, drones, or building information modeling (BIM). However, improving safety and productivity in the industry remains a struggle. One of the main reasons digital transformation has failed to accelerate is the lack of a deep understanding of the digital twin concept, its usage, and its potential benefits in the construction industry. Therefore, this paper investigated the impacts and potential of digital twins in the construction industry through a quantitative systematic review assisted by text mining. The study presented the potential usability of digital twins, their leading and core technologies, and their applications, revealing their benefits and potential for optimizing project planning, execution, and management processes. Through this comprehensive literature review, this study elucidated the distinctive features, advantages, and potential that digital twins bring to the construction field. The findings highlight the transformative impact of digital twins, providing critical insights for their broader adoption and application in the industry. By addressing the challenges of adopting this technology, the article provided valuable insights for advancing research and the broad implementation of digital twins in the sector.


Introduction
The digital twin is a dynamic virtual representation of physical objects and processes that stays synchronized with its real-world counterparts. It operates on data, algorithms, and the Internet of Things (IoT) to bridge the gap between cyber systems and physical structures. The use of digital twins in construction marks a paradigm shift from traditional methods, offering a virtual environment for design, simulation, and collaboration throughout the project life cycle. Advanced technologies, such as artificial intelligence (AI), machine learning, and immersive technologies (e.g., virtual reality (VR), augmented reality (AR), or mixed reality (MR)), can augment digital twins by enabling predictive analysis and real-time monitoring [1]. Leveraging digital twins in construction promises improved stakeholder communication, seamless project management, and enhanced efficiency [2]. This advancement has contributed to significant reductions in time, costs, and errors in construction projects [3].
The construction sector, however, has historically been slow to embrace advanced digital technologies compared to other industries [4,5]. As global demands for sustainable, efficient, and innovative construction solutions rise, the industry seeks transformative approaches to address these challenges. Digital twins, with their variety of applications and benefits, have been identified as a game-changing technology with the potential to drive substantial improvements in project outcomes [6]. Some studies have reviewed case studies that applied digital twin modeling in particular countries and have surveyed current research issues related to the digital twin by reviewing recent studies in the domain [7,8].
However, there is a notable gap in the literature: no study comprehensively evaluates the real-world applications and inherent characteristics of digital twins within the construction domain through a meta-literature analysis. This lack of a consolidated knowledge base makes it challenging for industry stakeholders to discern the technology's capabilities, benefits, and implementation challenges. There is a pressing need for a robust problem statement that encapsulates these concerns, offers a roadmap for future investigations, and prompts stakeholders to delve deeper into understanding digital twins [7]. Given this research gap, this study aims to shed light on the current state of digital twin technology in the construction area. A systematic review of existing meta-literature was conducted to collect meaningful literature, and text-mining methodologies were employed to achieve the goals of this study as follows:
• To understand where the construction industry currently stands in terms of digital twin adoption.
• To highlight the significant benefits that can be derived from using digital twins in construction projects.
• To outline the potential applications of digital twins, from the design phase right through to maintenance.
• To discuss the challenges and concerns that the industry might face with the broader adoption of this technology.
Ultimately, the study provides information on the potential benefits, applications, and challenges of digital twins in construction, as well as the further implementations on which future work should focus.

Research Methodology
To conduct a comprehensive review, this study employed text mining, which can help report text-based information for systematic reviews and meta-analyses, document the knowledge presented, and suggest further directions [9]. The study consisted of three main steps: (1) determining the keywords and inclusion/exclusion criteria used to select the literature for text mining; (2) applying text mining techniques to identify the domains related to the digital twin; and (3) synthesizing the information based on the literature review and text mining analysis. By adopting text mining techniques, this study extracted pertinent studies on digital twins and focused on documenting the multifaceted applications, core technologies, and notable features of digital twins within the construction field. This study also scrutinized the curated literature to discern patterns and correlations among key themes. Finally, this paper presented a quantitative analysis to explore the tangible applications of digital twins in construction projects.

Keywords and Search Criteria Determination
The primary objective of this investigation is to provide an exhaustive overview of state-of-the-art digital twin technology in the construction sector. Furthermore, this exploration aims to elucidate the challenges, advantages, and prospective applications of the digital twin within the construction area. The literature search toward this aim comprised three steps: (1) designating keywords relevant to the target literature; (2) determining criteria for inclusion and exclusion to ensure the rigor and relevance of the selected studies; and (3) employing the Crossref web search system for literature identification and final selection.
To collate a comprehensive review of the current state of "digital twin" technology within the construction domain, a systematic search was initiated using the Crossref API. This API is a widely acknowledged repository for academic articles, offering researchers a platform to extract bibliographic data from academic works. The query specifically targeted articles with the keywords "digital twin" and "construction". The search parameters were designed to retrieve highly relevant and up-to-date articles. The primary bibliographic query was set to "digital twin AND construction" in an effort to narrow the results to those most pertinent to the research aim. The inclusion and exclusion criteria are defined in Table 1.

• Proceedings with the same contents extended to the journal articles.
• Excluded articles that cannot be downloaded through the publisher (not accessible).

The search was further constrained to retrieve a maximum of 500 articles to ensure a balance between the breadth and depth of the literature reviewed. Upon receiving a positive response from the Crossref API, this study parsed the JSON data structure to extract the title, authors, published date, and abstract.
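The query-and-parse step described above can be sketched as follows. This is a minimal illustration, not the authors' script: the helper names `build_query_url` and `parse_items` are hypothetical, the `from-pub-date` filter is an assumption about how the 2018 cut-off might be applied, and the sample payload is a made-up stand-in for a real Crossref response so that the example runs without network access.

```python
import json
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"

def build_query_url(query, rows=500, from_year=2018):
    """Build a Crossref works URL for a bibliographic query (hypothetical helper)."""
    params = {
        "query.bibliographic": query,             # free-text bibliographic search
        "rows": rows,                             # cap on the number of returned records
        "filter": f"from-pub-date:{from_year}",   # assumption: date limit applied as a filter
    }
    return f"{CROSSREF_WORKS}?{urlencode(params)}"

def parse_items(payload):
    """Extract title, authors, published date, and abstract from a decoded response."""
    records = []
    for item in payload.get("message", {}).get("items", []):
        titles = item.get("title") or [""]
        authors = [
            " ".join(p for p in (a.get("given"), a.get("family")) if p)
            for a in item.get("author", [])
        ]
        date_parts = (item.get("issued", {}).get("date-parts") or [[None]])[0]
        records.append({
            "title": titles[0],
            "authors": authors,
            "published": date_parts,
            "abstract": item.get("abstract", ""),  # many records carry no abstract
        })
    return records

# Made-up payload mimicking the shape of a Crossref response (illustration only).
sample = json.loads("""
{"message": {"items": [
  {"title": ["A digital twin framework for bridge construction"],
   "author": [{"given": "A.", "family": "Author"}],
   "issued": {"date-parts": [[2021, 5]]}}
]}}
""")

url = build_query_url("digital twin AND construction")
records = parse_items(sample)
```

In practice the URL would be fetched with an HTTP client and the decoded JSON passed to `parse_items`; records without an abstract simply yield an empty string.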

Text Mining for Feature Extraction
With the collected papers written about the "digital twin" post 2018, it is essential to categorize these papers based on their primary domain of focus to understand the breadth of "digital twin" application across various industries and its specific implications within the construction industry. The research primarily utilizes text mining techniques to discern the industrial domain of each paper, such as construction, education, health, manufacturing, and others.

Domain Categorization
First, titles of papers related to "digital twin" from 2018 onwards were extracted, reaching a total of 500 titles. These titles were stored in a text file, and each was converted to lowercase. This ensured uniformity and reduced discrepancies during the text mining analysis. A predefined keyword set for each domain acted as the basis for categorizing these papers. Each title was decomposed into individual words with the "word_tokenize" function from the Natural Language Toolkit (NLTK) [10]. The tokenized titles were then matched against the domain-specific keyword sets. If a word within a title matched a keyword from a domain, the paper was assigned to that domain. If a title contained keywords aligning with multiple domains, the paper was placed under the domain it matched first, according to a predefined sequence of domain checking.
In a narrower approach, titles of papers related to both "digital twin" and "construction" were further categorized to understand the "digital twin" application within the construction industry. Here, the domains were transportation, construction, design, and maintenance and operation. The same text mining techniques were employed, with a set of keywords specifically crafted for these construction-related domains. The tokenized titles were cross-referenced with this new set of keywords, and each title was categorized based on its match.
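The first-match rule can be illustrated with a short sketch. The keyword sets and their order below are hypothetical placeholders, not the sets used in the study, and a simple regular-expression tokenizer stands in for NLTK's `word_tokenize` so that the example runs without downloading NLTK data.

```python
import re

# Hypothetical keyword sets; dict order defines the domain-checking sequence.
DOMAIN_KEYWORDS = {
    "construction": {"construction", "building", "bridge"},
    "manufacturing": {"manufacturing", "factory", "machining"},
    "health": {"health", "healthcare", "patient"},
    "education": {"education", "teaching", "classroom"},
}

def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())

def categorize(title, domain_keywords=DOMAIN_KEYWORDS):
    """Return the first domain whose keyword set shares a token with the title."""
    tokens = set(tokenize(title))
    for domain, keywords in domain_keywords.items():
        if tokens & keywords:
            return domain
    return "others"

labels = [
    categorize("Digital Twin for Smart Factory Building Design"),
    categorize("A digital twin of the patient heart"),
    categorize("Digital twin: a survey"),
]
```

The first title contains keywords from two domains ("factory" and "building") and is assigned to "construction" only because that domain is checked first, which shows how sensitive the resulting shares are to the chosen checking order.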

Word Cloud Visualization
Beyond the quantitative categorization, a more visual approach is also employed to discern the prevalent themes within the collected papers. Word clouds serve as an illustrative method to showcase the frequency of terms and provide a snapshot of the main topics under discussion. For each set of collected titles (those from the broad "digital twin" search and those from the narrower "digital twin and construction" search), a separate word cloud is generated. NLTK's lemmatizer is employed to ensure uniformity and to reduce words to their base or root form. The lemmatizer considers both verbs ("v") and nouns ("n") to derive the root word. This step groups similar terms together, ensuring that different forms of the same word do not appear separately. The process also involves the removal of stop words, such as "and", "the", and "is", and the elimination of special characters like punctuation marks, which may introduce noise into the data. The text is converted to lowercase to ensure uniformity, and sentences are tokenized into individual words to facilitate detailed analysis. Processed words are optionally reassembled into sentences when necessary to preserve contextual integrity. This sequence of operations (stop word removal, special character elimination, case normalization, and tokenization) prepares the text for generating word clouds by reducing complexity and focusing on meaningful content.
With the preprocessed texts, the word cloud is created by the Python WordCloud library [11]. The size of each word within the word cloud is directly proportional to its frequency across the titles. This visual representation offers a more intuitive understanding of the primary topics and trends within the "digital twin" research area. By comparing the word clouds generated from the broad search with those from the construction-focused search, one can identify overarching themes and specific nuances that dominate the discourse in each category.
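The preprocessing and frequency counting that feed the word cloud can be sketched as below. This is a simplified illustration under stated assumptions: the stop word list is a small hypothetical subset, and lemmatization is omitted so that the example needs no NLTK data (in a full pipeline, each token could be passed through NLTK's `WordNetLemmatizer` with the "v" and "n" part-of-speech settings mentioned above).

```python
import re
from collections import Counter

# Small hypothetical stop word list; a full list would come from NLTK.
STOP_WORDS = {"and", "the", "is", "of", "for", "in", "a", "an", "on", "to", "with"}

def preprocess(title):
    """Lowercase, strip special characters, tokenize, and drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", title.lower())
    return [word for word in text.split() if word not in STOP_WORDS]

def term_frequencies(titles):
    """Count how often each remaining term occurs across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(preprocess(title))
    return counts

titles = [
    "Digital Twin for Bridge Management",
    "The digital twin: a framework for safety",
]
freqs = term_frequencies(titles)
```

If the `wordcloud` package is installed, the resulting mapping can be rendered with `WordCloud().generate_from_frequencies(freqs)`, which scales each word by its count.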

Key-Technology Correlation Analysis
This section describes a structured approach to quantifying the relationships between key technological terms within a corpus of text data. Initially, the corpus, comprising the titles of the collected documents, is preprocessed to ensure consistency in analysis. This involves converting all titles to lowercase and applying a custom function that replaces key terms with standardized identifiers, thereby facilitating accurate comparison. Subsequently, this study defined a focused array of key technologies as the basis for investigating their interconnections. Algorithm 1 depicts the process of the correlation analysis between key technological terms in the selected articles.
To systematically measure the co-occurrence of these technologies within the titles, this study initialized a two-dimensional matrix representing all possible pairs of the specified technologies. For every title in which a technology pair co-occurs, the corresponding matrix cell is incremented by one. This process yields a matrix capturing the frequency of each technology's co-occurrence with the others, effectively mapping their associative relationships. Following the compilation of co-occurrence data, the matrix is transformed into a data frame for analysis and visualization. With the seaborn library [12], a heatmap is generated to visually represent the intensity of these co-occurrences.
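A minimal sketch of the co-occurrence counting is given below. The alias table and technology list are hypothetical examples rather than the study's actual configuration, and counting a technology's own occurrences on the diagonal is an assumption about how single-technology titles are recorded.

```python
import re
from itertools import combinations

# Hypothetical phrase-to-identifier replacements applied before matching.
ALIASES = {
    "building information modeling": "bim",
    "artificial intelligence": "ai",
    "internet of things": "iot",
    "virtual reality": "vr",
}
TECHNOLOGIES = ["ai", "bim", "iot", "vr", "blockchain"]  # hypothetical selection

def normalize(title):
    """Lowercase a title, standardize key terms, and return its token set."""
    text = title.lower()
    for phrase, token in ALIASES.items():
        text = text.replace(phrase, token)
    return set(re.findall(r"[a-z0-9]+", text))

def cooccurrence(titles, techs=TECHNOLOGIES):
    """Build a symmetric matrix of per-title technology co-occurrence counts."""
    index = {tech: i for i, tech in enumerate(techs)}
    matrix = [[0] * len(techs) for _ in techs]
    for title in titles:
        tokens = normalize(title)
        present = [tech for tech in techs if tech in tokens]
        for tech in present:                      # assumption: diagonal = own count
            matrix[index[tech]][index[tech]] += 1
        for a, b in combinations(present, 2):     # each unordered pair once per title
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix

titles = [
    "BIM and AI for digital twin",
    "Artificial intelligence and Internet of Things in construction",
    "Building information modeling based digital twin",
]
matrix = cooccurrence(titles)
```

With pandas and seaborn available, the matrix can be wrapped as `pandas.DataFrame(matrix, index=TECHNOLOGIES, columns=TECHNOLOGIES)` and passed to `seaborn.heatmap(df, annot=True)` to obtain the kind of heatmap described above.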

Summary of Literature Search
Based on an exhaustive search through major academic databases, a total of 500 studies were collected when searching for "digital twin", and the same number of studies were returned for "digital twin and construction" between 2018 and 2023. Utilizing a set of inclusion and exclusion criteria tailored to the study's objectives, a total of 300 studies were chosen from each search category. They could be categorized into different domains that digital twins were applied to, such as manufacturing, construction, healthcare, or infrastructure.

Synthesis of Implications for Digital Twin in Construction
The main objective of this study was to synthesize the information and implications, including core technologies, potential impacts and benefits, and challenges for digital twins in the construction industry. By examining the co-occurrence of key terms in research titles, this study aimed to identify the synergistic relationships between the most significant technologies. For the analysis, several key technologies were identified based on their relevance and prominence in the field as below:

• Blockchain

Each title underwent a preprocessing step to standardize and simplify the terminology. Specific terminologies, like "building information modeling", were replaced with their popular acronyms, e.g., "BIM", to maintain consistency and reduce potential variations. For the co-occurrence analysis, a matrix was constructed to denote the frequency of two technologies being mentioned together in a single title. This matrix provided insights into the interrelationships between the technologies. For example, if "BIM" and "AI" appeared together in various titles, their co-occurrence count increased, signaling a potential synergistic relationship between the two in the context of digital twins in construction. Finally, the co-occurrence matrix was visualized using a heatmap. The heatmap provided a graphical representation of the interplay between the technologies, enabling easy identification of the most inter-related technologies in the dataset.
Using text mining techniques, the abstracts of the selected papers were analyzed to extract significant features and advantages associated with digital twins. Before analysis, the data underwent several preprocessing steps: (1) tokenization, (2) part-of-speech tagging, and (3) stopword removal. The developed system identified sentences containing terms like "digital twin" or its abbreviations and then extracted adjectives associated with "features", "characteristics", and "advantages". The extracted data, especially the adjectives, were then subjected to frequency analysis to identify the characteristics and advantages most often associated with digital twins. Further, network analysis was employed to present a visual representation of these characteristics, elucidating their interconnections and significance.
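The sentence filtering, adjective frequency count, and network edges can be approximated as in the sketch below. It rests on loud assumptions: a small fixed adjective lexicon stands in for part-of-speech tagging (which in a full pipeline would come from a tagger such as NLTK's), "DT" is assumed to be the abbreviation of interest, and two adjectives are linked whenever they appear in the same matching sentence.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical lexicon standing in for part-of-speech tagging.
ADJECTIVES = {"virtual", "physical", "dynamic", "intelligent", "smart",
              "systematic", "comprehensive"}
# Assumption: "DT"/"DTs" is the abbreviation used for "digital twin(s)".
DT_PATTERN = re.compile(r"\bdigital twins?\b|\bdts?\b")

def adjective_stats(abstracts):
    """Return adjective frequencies and adjective-pair edge weights."""
    freq, edges = Counter(), Counter()
    for abstract in abstracts:
        # Split into sentences at terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", abstract.lower()):
            if not DT_PATTERN.search(sentence):
                continue  # keep only sentences that mention digital twins
            found = sorted(set(re.findall(r"[a-z]+", sentence)) & ADJECTIVES)
            freq.update(found)
            edges.update(combinations(found, 2))  # edge keys are sorted pairs
    return freq, edges

abstracts = [
    "A digital twin links the virtual and physical worlds. The method is smart.",
    "The DT is a dynamic virtual model.",
]
freq, edges = adjective_stats(abstracts)
```

The edge counter can serve as a weighted edge list for a graph library when drawing a network graph; "smart" is not counted here because its sentence does not mention a digital twin.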

Classification Domains Implementing Digital Twin
This study identifies the various industry domains that implement digital twin technology. Figure 1 shows a word cloud generated from an analysis of 300 selected academic papers retrieved using the descriptor "digital twin". Dominant terms, such as "industry", "design", "application", and "manufacture", show the vital role digital twins play in contemporary industrial design and manufacturing. Words such as "data", "network", and "architecture" show the importance of interconnected systems and the rise of data-driven approaches in digital twin research. At the same time, the presence of "healthcare", "construction", and "management" shows how widely digital twin technologies are used in different fields. Overall, this word cloud offers a clear picture of the main topics in digital twin research.

Figure 2 illustrates the domain classification derived from the titles of the extracted papers. Domains were automatically categorized through keyword extraction, and the significance of the keywords representing each domain was weighted based on their frequency. The pie chart demonstrates that "construction" and "manufacturing" emerge as dominant domains, accounting for 44.3% and 38.2%, respectively. This underscores the heightened relevance and application of digital twin paradigms within these sectors. The "health care" domain constitutes 14.5%, indicating its growing intersection with digital twin technologies. In contrast, "education" represents the smallest segment at 3.1%, suggesting a nascent yet evolving exploration of digital twin applications in this sphere. Collectively, the data offer a clear insight into the sectors where digital twin applications are most extensively researched and applied.

Figure 3 depicts the construction industry's focus on "management", "framework", "application", and "method" in terms of the digital twin. Compared to the general "digital twin" search, this focused query emphasizes construction-specific terminologies such as "safety", "engineer", "bridge", and "build". While terms such as "industry", "data", and "design" remain prevalent in both searches, the attention to practical applications and methodologies becomes more pronounced in this context.

In Figure 4, the pie chart demonstrates the application areas of digital twins within the construction sector. For the categorization using keyword analysis, this study employed the following keywords and their derivative morphemes: for building construction, terms such as "building", "architecture", "residential", "commercial", "skyscraper", "housing", and "estate" were used; for infrastructure construction, keywords like "dam", "infrastructure", "bridge", "road", "highway", "tunnel", "railway", "metro", and "utility" were utilized; and for transportation, terms included "transport", "vehicle", "car", "bus", "train", "airplane", "aviation", "shipping", "marine", and "logistics". The results show that building construction dominates with 54.1%, emphasizing the important role of digital twin technology in this area. Meanwhile, infrastructure construction and transportation represent 37.7% and 8.2%, respectively, highlighting the expanding reach and potential of digital twin applications.

The chart in Figure 5 provides a breakdown of the application of digital twin technology across the stages of construction. The design phase leads, accounting for 45.5% of the research, closely followed by the maintenance and operation phase with 43.9%. Meanwhile, the actual construction phase has less representation, with only 10.6% of the papers focusing on it. This distribution underscores the prevailing interest in integrating digital twins during the design and operational stages of construction projects. Based on the systematic review, some studies have tried to develop a digital twin model of tunnel infrastructure in order to analyze its life cycle cost and performance [13]. In the domain of infrastructure, the review also revealed that a digital twin model of infrastructure (e.g., roads and bridges) can help overcome data fragmentation across project design, bidding, delivery, construction, and maintenance [14].

Identification of Core Technologies for Implementing the Digital Twin
This section presents the findings derived from the title analysis of papers extracted using the keywords "digital twin and construction". The objective of this analysis was to discern the core technologies for digital twin applications within the construction industry. Figure 6 presents the co-occurrence of the core technologies for digital twins in the construction industry. As shown in the matrix heatmap in Figure 6, AI emerges as a dominant technology, with 46 papers exclusively emphasizing its role. VR is the subject of 16 papers, while CPS is explored in 8 papers. The matrix also sheds light on interdisciplinary intersections. For instance, AI's convergence with blockchain and IoT is noticeable in nine papers, indicating the combined potential of these technologies in revolutionizing the construction industry via digital twin applications. Meanwhile, BIM seems to have a more isolated application, with 21 papers centered on it and minimal overlaps with other technology domains.

Key Features and Advantages of Digital Twins
This study aimed to analyze and discuss the characteristics and benefits of digital twins by leveraging text mining techniques on a large collection of abstracts from the collected research papers. Abstracts from selected research papers were aggregated and analyzed to gain a holistic view of the vital features and benefits of digital twins. The primary goal was to filter out adjectives that pertain directly to the features and characteristics of digital twins. The outcome was then visualized using a network graph to ascertain the connections and relationships between these features (see Figure 7).
The network graph shows five key groups of adjectives that define the nature of digital twins:
• Virtual and Physical: Digital twins seamlessly bridge the gap between the virtual and physical realms, enabling real-time monitoring and simulation.
• Dynamic and Real-time: These adjectives emphasize the capability of digital twins to adapt and respond in real time to changes, underscoring their dynamic nature.
• Data-driven and Systematic: Digital twins heavily rely on data, ensuring systematic and efficient operations.
• Intelligent and Smart: With the infusion of AI and advanced algorithms, digital twins can make intelligent decisions, optimize processes, and enhance user experience.
• Other adjectives such as "comprehensive", "three-dimensional", and "web-based" further accentuate the multifaceted nature of digital twins.
The extracted features emphasize the versatility and efficacy of digital twins. Their real-time, intelligent, and data-driven nature makes them invaluable assets for dynamic construction projects in which optimization, predictive maintenance, and enhanced user experiences are required.

Discussion
This study employed text mining techniques to carry out a quantitative analysis of the selected studies. While this approach offered valuable insights and highlighted key trends, numbers alone do not tell the whole story. A deeper qualitative examination is essential to fully understand the intricacies of the findings. Thus, to provide a more holistic and comprehensive perspective, this section presents a qualitative analysis of the core themes.

Applications of Digital Twins in the Construction Industry
Digital twins have started to revolutionize various segments of the construction industry by offering new opportunities for enhanced project planning, execution, and management. Digital twins are being employed across various sectors of the construction industry, from building construction and infrastructure development to transportation. Their utility extends beyond the design phase, finding widespread application during the construction and operational stages. For instance, in building construction, digital twins facilitate enhanced planning by providing detailed 3D models, which ensures that potential challenges are identified and addressed in the preliminary stages [15,16]. Such proactive measures can lead to significant cost savings and timely project completion. When it comes to infrastructure development, especially in large-scale projects such as highway construction and ultra-high voltage tower construction [17], the precision and real-time feedback offered by digital twins prove invaluable [18,19]. They enable engineers to monitor progress and ensure that the project adheres to safety standards and guidelines. Transportation projects, such as ground transportation, railways, or airports, benefit immensely from the predictive capabilities of digital twins [20,21]. A possible application of the digital twin is the analysis of the building environment and energy use to improve the energy performance of buildings as well as provide better conditions for their users [22]. Another potential application for digital twinning is modular construction integrating BIM for exchanging building data throughout the life cycle [21]. Predictive maintenance, optimized resource allocation, and real-time traffic simulations can lead to efficient project execution and eventual operation. In the phases following construction, particularly during the operational stage, the significance of digital twins remains undiminished [23,24]. Whether it pertains to the oversight of energy consumption patterns within buildings, the strategic enhancement of traffic dynamics in transportation infrastructures, or the monitoring of degradation in large-scale infrastructure projects, digital twins provide valuable insights for operational efficiency and sustainable practices.

Convergence of Core Technologies in Digital Twin Applications
In this study, the exploration of core technologies within digital twin applications in the construction industry yields several insights. In the context of AI's dominance, the authors of [25] emphasized that artificial intelligence is crucial in enhancing predictive maintenance and real-time monitoring of digital twin systems in construction. This observation extends beyond theoretical speculation; the substantial volume of papers emphasizing AI highlights its pragmatic applications and maturity in the discipline. In addition, several studies have emphasized that the intersection of AI with other technologies, such as blockchain and IoT, has immense potential [26-29]. They highlight that blockchain can provide secure data storage, while IoT offers real-time data collection, and when combined with AI's analytics capabilities, the trio promises to revolutionize digital twin applications. BIM is also one of the core technologies when considering digital twin applications in the construction domain. Its significance is clear, as it offers a combined 3D modeling tool and solid documentation, making it a key starting point for using digital twins in construction projects [22,30]. VR offers an immersive experience, enabling stakeholders to visualize and interact with construction projects in their digital twin form [31]. Several papers focusing on VR indicate its importance in enhancing user experience and collaboration in the digital world of construction [32]. To sum up, the landscape of core technologies in digital twin applications within the construction domain is both vast and varied. Some technologies integrate and create synergies, while others remain specialized, emphasizing their unique contribution. As the industry continues to evolve, these technologies can further intertwine, diverge, or converge, shaping the future of digital twin applications in construction.

Conclusions
Digital twins have emerged as a valuable technique in the construction industry, offering new opportunities for enhanced project planning, execution, and management. Digital twins are being employed across various construction industry sectors, from building construction and infrastructure development to transportation. Their utility extends beyond the design phase, finding widespread application during the construction and operational stages.
The review demonstrates the transformative impact of digital twins and their immense potential to revolutionize the construction field. By providing detailed 3D models, digital twins enable proactive measures to be taken in the preliminary stages, leading to significant cost savings and improved project outcomes. Text mining was adopted for quantitative analysis of the selected studies, while a qualitative examination was additionally conducted to provide a more holistic and comprehensive perspective. The categorization of papers based on their primary domain of focus helps to understand the breadth of digital twin applications across various industries and their specific implications within the construction industry.
Consequently, this study underlines the importance of digital twins in the construction industry and their potential to transform the way projects are planned, executed, and

Figure 1. Word cloud generated from selected papers when searching with "digital twin".

Figure 2. Distribution of industry domains based on keyword extraction.

Figure 3. Word cloud derived from titles of 300 papers searched with "digital twin and construction".

Figure 4. Distribution of specific domains within construction in papers searched with "digital twin and construction".

Figure 5. Distribution of construction stages applied with digital twin technologies.

Figure 6. Co-occurrence of core technologies in digital twin and construction research titles.

Figure 7. Network graph illustrating the features and characteristics of digital twins derived from aggregated abstract analysis.

Table 1. The inclusion and exclusion criteria for the search process.
Algorithm 1 (final steps):
Convert co-occurrence matrix to a DataFrame with key technologies as both columns and index
Generate a heatmap from the DataFrame