Article

Identification of Safety Risk Factors for Shield Construction in Urban Drainage Deep Tunnel Based on Text Mining

1
School of Architecture and Material Engineering, Hubei University of Education, Wuhan 430205, China
2
School of Civil Engineering and Architecture, Wuhan University of Technology, Wuhan 430070, China
*
Author to whom correspondence should be addressed.
Processes 2025, 13(9), 2782; https://doi.org/10.3390/pr13092782
Submission received: 28 July 2025 / Revised: 21 August 2025 / Accepted: 27 August 2025 / Published: 29 August 2025
(This article belongs to the Section Process Control and Monitoring)

Abstract

Shield construction of deep tunnels for urban drainage involves many risk factors, and potential safety hazards are difficult to monitor and identify directly. To improve risk management in the shield construction of urban drainage deep tunnels, this study proposes a method for identifying risk factors that combines text mining with the entropy weight method. Using this method, 34 safety risk factors were extracted from safety accident reports and related text data on urban drainage deep tunnel shield construction. The results show that text mining can play an important role in the risk management of urban drainage deep tunnel shield construction and that introducing the entropy weight method further improves the accuracy of risk factor identification. The results not only enrich research on risk management in urban drainage deep tunnel shield construction but also provide theoretical guidance for managers formulating risk management measures and optimizing risk management procedures.

1. Introduction

The pipe network system is an important lifeline of a city, and it is also a crucial infrastructure for ensuring the normal operation of the city [1]. In 2024, 154 cities in China suffered from urban flooding due to heavy rainstorms, causing extensive damage to urban infrastructure. The number of affected people reached 2.55 million, resulting in significant casualties and economic losses [2]. This indicates that the traditional construction mode of the pipe network system can no longer meet the current urban development needs. Deep tunnel drainage is an important means to solve urban waterlogging and improve urban drainage capacity [3], and many cities have begun to build deep tunnel drainage systems to ensure their smooth operation. The shield construction method has been widely applied in the construction of urban drainage deep tunnels due to its characteristics of high efficiency, safety, and environmental protection [4]. The diameter of drainage tunnels is small, and the small-diameter shield construction method is generally adopted [5]. Compared with the construction of large- and medium-diameter shield machines, small-diameter shield machines are smaller in size and prone to tunneling deviations. Small-diameter shield machines have insufficient adaptability to complex geological conditions, and there are considerable difficulties in equipment selection and control. Due to the narrow space of the tunnel, the transportation and installation of building materials in the construction of small-diameter shield machines are relatively difficult, and the ventilation and heat dissipation conditions are poor [1,3,5]. Moreover, urban drainage deep tunnel projects are characterized by large burial depth, small tunnel diameter, long route, complex geological conditions, and high environmental protection requirements [6], which further increase the difficulty and risk of small-diameter shield construction. 
Managing project risk effectively and reducing the probability of safety accidents is therefore a key issue that urgently needs to be addressed.
Identification of risk factors is an important step in risk management and an important basis for risk assessment and the formulation of risk-control measures [7]. Many experts and scholars have studied the identification of safety risk factors in shield construction from different perspectives. He et al. [8] combined the methods of literature review and expert group evaluation to identify the safety risk factors in the tunnel shield construction process under the condition of crossing the bridge. Xu et al. [9] adopted the work breakdown structure–risk breakdown structure method to identify the safety risk factors across different stages of tunnel shield construction and then established a risk evaluation index system. Liu et al. [10] proposed an automatic identification method for tunnel shield construction safety risks based on ontology. Li et al. [11] used knowledge graph analysis to examine the connections between the safety risk factors in tunnel shield construction, thereby improving the efficiency of risk factor identification. Wu et al. [12] used text mining techniques to identify the risk factors in tunnel shield construction and then used association rule mining to further analyze the interrelationships among these factors. Tang et al. [13] used text mining techniques to identify the risk factors and reuse knowledge in tunnel shield construction and constructed a corpus for risk identification in subway shield construction, thereby improving the efficiency of word segmentation.
The above analysis shows that research on identifying risk factors in shield construction has shifted from traditional empirical and literature analysis to the application of big data analysis technology. Existing results show that big data technologies, such as text mining, knowledge graphs, and knowledge base construction, can enhance the efficiency and accuracy of identifying risk factors in shield tunneling. However, how well these technologies work depends heavily on how standardized the basic data are. The safety management materials for the shield construction of urban drainage deep tunnel projects are poorly standardized, and some accident reports lack a unified format and standardized wording, which increases the difficulty of processing the basic data and the amount of manual work required. Constructing a dedicated vocabulary database and enhancing the effectiveness of feature item selection are effective means of improving the applicability of traditional text mining techniques [13]. Therefore, this study improved traditional text mining techniques in these two respects. First, based on the risk characteristics of shield construction in urban drainage deep tunnel projects and the relevant norms and standards, a safety risk dictionary was constructed to improve the efficiency of word segmentation. Second, entropy-weighted frequency, which offers dynamic weight adjustment, sensitivity to document content, and resistance to interference, was adopted to extract feature items from the risk management data. Based on this improved text mining technology, a method for identifying the safety risks in the shield construction of urban drainage deep tunnels is proposed, which provides theoretical guidance for identifying key risk factors and formulating safety management plans.

2. Literature Review

2.1. Identification of Safety Risk Factors in Tunnel Shield Construction

The accuracy of risk factor identification directly affects the overall effect of risk management [14], and it is a hot issue in the risk management research field. Xu et al. [9] and Fu et al. [15] adopted the WBS–RBS method to identify the potential risks of tunnel shield construction. Zhang et al. [16] established a risk assessment index system for shield construction from the aspects of technology, geology, equipment, management, and safety accidents, drawing on their engineering experience. Liu et al. [10] combined ontology and rule reasoning techniques and utilized semantic reasoning to achieve risk identification in shield construction. Isah and Kim [7] built a system that could automatically identify and respond to risks in tunnel construction based on knowledge graphs and generative AI. Wu et al. [17] identified the risk of water gushing in the tunnel through theoretical and practical analysis of engineering cases. Shelake and Gogate [18] analyzed the risk factors affecting the progress of tunnel construction and developed a risk identification framework. Zhou et al. [19] proposed a framework based on BIM for automatically identifying and classifying environmental risks in the metro design stage.

2.2. Text Mining in the Risk Management of Tunnel Shield Construction

Text mining refers to the process of extracting valuable information from a large amount of text data by using technologies such as computer science, statistics, and machine learning [20,21]. Tang et al. [13] adopted the text mining method, developed a professional vocabulary bank, and identified the safety risk factors of metro shield construction by analyzing risk reports. Vagnoli and Remenyte-Prescott [22] proposed a data mining method for identifying risks in tunnel construction; the key areas and occurrence times where risks exist in tunnel construction were identified via data analysis. Compared with traditional data mining techniques, text mining has several major advantages:
(1) Understanding semantics. With the aid of natural language processing technology, text mining does not merely count the key words and their frequencies in the text content but can also provide a deeper understanding of the connotations in the text data.
(2) Revealing complex relationships among risk factors. Traditional data mining techniques have difficulty capturing the correlations between words in text data and the similarities among multiple documents, while text mining can further extract the connections between words.
(3) Handling unstructured data. Most traditional data mining techniques can only handle structured data, whereas most engineering materials, such as accident investigation reports and construction plans, are unstructured. Text mining technology can handle these text data, which not only greatly saves data processing time but also increases the objectivity of text analysis to a certain extent.

2.3. Gap in the Existing Relevant Research

Existing studies have adopted methods such as expert investigation, case analysis, literature analysis, work decomposition, and laboratory decision-making to identify the risk factors of shield construction in tunnel engineering. However, most shield construction data for urban drainage deep tunnels are unstructured text, and the shield construction process itself is dynamic, whereas many existing studies analyzed structured data and found it difficult to achieve dynamic data updates [12,23]. Some studies have applied the concept of big data analysis and data mining techniques to safety risk identification in the shield construction of tunnel engineering, which improved the accuracy and objectivity of risk factor identification to a certain extent. These studies provided methodological references and guidance for this research. However, compared with general tunnel projects, urban drainage deep tunnels have specific particularities, such as the adoption of small-diameter shield construction methods, difficult material transportation, a narrow working surface for personnel, and a location at the very bottom of the underground space where the surrounding geological environment is difficult to monitor. The current lists of tunnel safety risk factors and index systems therefore do not meet the needs of risk control for shield construction in urban drainage deep tunnels.
In terms of the risk factor identification method, traditional text mining technology requires standardized basic data, and the selection of feature items has a significant impact on the accuracy of risk factor identification. At present, no lexicon is available for the safety risks associated with the shield construction of urban drainage deep tunnel projects. Meanwhile, text mining offers numerous feature extraction methods, such as the LDA topic model [24], BERT text embedding [25], and inverse document frequency [12], each with its own advantages and application scenarios. However, urban drainage deep tunnel projects span different cities, regions, and countries [5], and the safety of their shield construction involves numerous and diverse risk factors, so the risk identification method must be highly adaptable. Compared with other methods, entropy-weighted frequency can dynamically adjust the weights based on the content of a single document, which enables it to better reflect the importance of words within the document [26]. This method also pays more attention to the distribution of words in a single document and can reflect the specific content of the document [27]. Furthermore, it can assign different weights to the words in each document based on their distribution, making the weight allocation more diverse and detailed. Finally, because it takes the entropy of the word distribution into account, it is more resistant to interference from noise [28]. Few studies have applied entropy-weighted frequency analysis to extract the safety risk characteristics of the shield construction of urban drainage deep tunnels.

3. Methodology

3.1. Research Framework for Risk Factor Identification Based on Text Mining

In order to identify the key risk factors in shield construction of urban deep tunnel drainage projects, this study used text mining technology. The text mining process mainly includes data collection, text preprocessing, data mining, and result visualization [21]. First, the identification of risk factors based on text mining required basic text data. We obtained the safety management materials for the shield construction of the urban drainage deep tunnel project through various means, such as online searches and on-site investigations. Since these safety management materials did not have a unified format or expression, a corpus was established by screening the risk statements. Then, the Jieba toolkit in Python 3.12 was used to construct the word bank, perform word segmentation, and develop a professional word database to improve the accuracy of word segmentation. Then, entropy-weighted frequency was used to conduct parameterized extraction of frequent words, thereby enhancing the accuracy and efficiency of feature item extraction. Finally, from the most frequently used words, words with risk-related meanings were selected to establish a set of safety risk factors. The key risk factors were classified according to the 4M1E theory. The research framework for risk factor identification is shown in Figure 1.

3.2. Feature Selection Model of Risk Factors

Feature selection of the initial feature items was carried out by combining the word frequency analysis method and manual screening. First, high-frequency words were screened out from the initial feature items according to the feature value threshold. Then, through manual identification, words denoting safety risk factors were extracted from the high-frequency words.
The characteristic parameters of the case texts of safety accidents in shield construction of urban drainage deep tunnels mainly included Term Frequency (TF), Document Frequency (DF), and Term Frequency–Inverse Document Frequency (TF–IDF). Furthermore, to improve the accuracy of the research results, information entropy was introduced into the text mining method, and Term Frequency–Information Entropy (TF–H) was also used as a feature selection index.
The high-frequency vocabulary definition formula was adopted to set the threshold of high-frequency vocabulary. $T$ represents the threshold of high-frequency vocabulary. $I_1$ represents the number of words related to risk factors that appeared only once in all safety accident investigation reports. The threshold calculation formula was as follows:
$$T = \frac{-1 + \sqrt{1 + 8 \times I_1}}{2} \tag{1}$$
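As a minimal sketch, the threshold can be computed as below, assuming it takes the form T = (−1 + √(1 + 8·I1)) / 2. The function name and the sample count are hypothetical and are not taken from the study.

```python
import math


def high_frequency_threshold(n_once: int) -> float:
    """Return T = (-1 + sqrt(1 + 8 * I1)) / 2.

    n_once is I1, the number of risk-related words that appear exactly
    once across all accident investigation reports.
    """
    if n_once < 0:
        raise ValueError("n_once must be non-negative")
    return (-1 + math.sqrt(1 + 8 * n_once)) / 2


# Hypothetical count chosen so the square root is exact: sqrt(81) = 9.
print(high_frequency_threshold(10))  # 4.0
```

Words whose frequency exceeds the returned value would be treated as high-frequency under this rule. Because T grows with the number of single-occurrence words, a corpus with many rare words yields a high threshold and therefore few high-frequency words.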
Term frequency represents how often a given word occurs in a given case text and is expressed by the following calculation formula:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \tag{2}$$
where $n_{i,j}$ represents the occurrence frequency of the i-th word in the j-th case text, and $\sum_{k} n_{k,j}$ represents the total number of occurrences of all feature items in the j-th case text.
$n_i$ represents the occurrence frequency, in all safety accident investigation reports, of the feature item representing the i-th safety risk factor of shield construction in urban drainage deep tunnels, and $TF_i = n_i$. A higher TF value indicates that the risk factor represented by this feature item contributes more to the occurrence of safety accidents.
$DF_i$ represents the document frequency of the feature item representing the i-th safety risk factor of shield construction in urban drainage deep tunnels. $DF_i = \left|\{ j : t_i \in d_j \}\right|$ is the number of documents containing the i-th feature item among all safety accident investigation reports.
TF–IDF was used to assess the significance of a risk factor in the investigation reports of safety accidents in the shield construction of urban drainage deep tunnels. If a risk factor appears more frequently across all accident investigation reports, its discrimination is poorer and its importance is lower. $|D|$ represents the number of documents in the document collection. The calculation formula of TF–IDF is as follows:
$$IDF_i = \log \frac{|D|}{\left|\{ j : t_i \in d_j \}\right|} \tag{3}$$
$$TF_i IDF_i = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \times \log \frac{|D|}{\left|\{ j : t_i \in d_j \}\right|} \tag{4}$$
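The TF, DF, and TF–IDF definitions can be illustrated with a short sketch over already-segmented texts. This assumes the natural logarithm (the base is not stated in the text) and uses hypothetical toy tokens; the function name is illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: tf-idf} mapping per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = sum(counts.values())  # all term occurrences in this document
        scores.append({
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in counts.items()
        })
    return scores


# Hypothetical segmented case texts, not data from the study.
toy_docs = [
    ["collapse", "grouting", "collapse"],
    ["grouting", "water inrush"],
]
for doc_scores in tf_idf(toy_docs):
    print(doc_scores)
```

In this toy corpus, "grouting" occurs in every document, so its inverse document frequency is log(1) = 0 and its TF–IDF is zero. This is the low-discrimination case described above.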
TF–H reflects the distribution of the feature items representing risk factors across all safety accident investigation reports. If the words representing a risk factor are distributed more evenly across the accident investigation reports, it indicates that this risk factor occurs frequently. $p_i$ represents the probability distribution of the occurrence of the feature item across all safety accident investigation reports. Its calculation formula is as follows:
$$p_i = \frac{TF_j^i}{\sum_{j=1}^{m} TF_j^i} \tag{5}$$
where $TF_j^i$ represents the frequency with which the i-th feature item of safety risk factors in urban drainage deep tunnel shield construction appears in the j-th safety accident investigation report, and $m$ is the number of safety accident investigation reports.
Information entropy, denoted by $H_i$, represents the degree to which the i-th feature item of safety risk factors in urban drainage deep tunnel shield construction is distributed across the safety accident investigation reports. A larger value indicates greater uncertainty in the occurrence of the risk factor. Its calculation formula is as follows:
$$H_i = -\sum_{j=1}^{m} p_i \log p_i \tag{6}$$
By integrating information entropy and term frequency, the functional equation of TF–H is obtained:
$$TFH = n_i \times \left( -\sum_{j=1}^{m} p_i \log p_i \right) \tag{7}$$
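A minimal sketch of the TF–H score for a single feature item follows. It assumes the entropy is the usual Shannon form (a negated sum over reports), uses the natural logarithm, and skips reports where the item does not occur. The function name and counts are hypothetical.

```python
import math


def tf_h(counts_per_report: list[int]) -> float:
    """Return n_i * H_i for one feature item.

    counts_per_report[j] is the number of times the item occurs in report j.
    """
    n_i = sum(counts_per_report)  # total occurrences across all reports
    if n_i == 0:
        return 0.0
    entropy = 0.0
    for count in counts_per_report:
        if count > 0:  # a zero count contributes nothing to the entropy
            p = count / n_i
            entropy -= p * math.log(p)
    return n_i * entropy


# Same total frequency (4), different spread across reports.
print(tf_h([4, 0]))  # concentrated in one report
print(tf_h([3, 1]))
print(tf_h([2, 2]))  # evenly spread
```

For a fixed total frequency, the score rises as the item is spread more evenly across reports, so a term confined to a single report scores zero however often it is repeated there.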
Drawing on previous research results, the initial feature items with cumulative TFH values within the range of 0% to 90% were recorded as high-frequency words, and the rest were regarded as low-frequency words.
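The cumulative 90% rule can be sketched as follows. Whether the term that crosses the 90% mark is itself kept is not stated in the text; this sketch keeps it, which is an assumption. The scores are hypothetical.

```python
def select_high_frequency(scores: dict[str, float], share: float = 0.90) -> list[str]:
    """Return terms, in descending score order, until the cumulative share is reached."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    total = sum(scores.values())
    selected: list[str] = []
    running = 0.0
    for term, value in ranked:
        # Stop once the terms already selected cover the requested share.
        if total <= 0 or running / total >= share:
            break
        selected.append(term)
        running += value
    return selected


# Hypothetical TF-H scores, not values from the study.
toy_scores = {"geology": 50, "grouting": 30, "training": 15, "lighting": 5}
print(select_high_frequency(toy_scores))  # ['geology', 'grouting', 'training']
```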

4. Data Analysis

4.1. Sample

To improve the generalizability of the method and the external validity of the results, this study collected shield construction safety accident cases and accident investigation reports for urban drainage tunnel projects in different countries and regions over the past ten years through channels such as news reports, announcements, safety management websites, and on-site investigations. A total of 176 accident cases were collected. The accident types include object strike, collapse, water inrush, explosion, poisoning and asphyxiation, fall from height, fire, electric shock, mechanical injury, and other injury accidents. To ensure the authenticity and authority of the samples, the collected cases were verified by contacting the parties involved and relevant personnel from the safety management department. At the same time, industry experts were invited to review the time, completeness, and relevance of the samples, eliminating those with incomplete or missing information or that could not fully reflect the characteristics of the accidents. For example, samples with duplicate data, excessive subjective descriptions, abnormal data, or no typical significance were excluded. Finally, 93 valid samples were obtained: 14 object strike accidents, 35 collapse accidents, 18 mechanical injury accidents, and 26 water inrush accidents.
In addition, most of the investigation reports on safety accidents in the shield construction of urban drainage tunnel projects were unstructured data. Because of writing errors and inaccurate or incomplete records, some incorrect words could be mixed into the report text, which would degrade the text mining results. Text preprocessing was therefore used to handle meaningless, duplicate, and defective data; for example, typos were corrected and terminology was unified, such as changing “shield excavator” and “shield drilling rig” to “shield machine” and “accident” to “safety accident”.
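The term unification step can be sketched as a simple variant-to-canonical mapping. The two pairs below follow the examples in the text; the English strings stand in for the original report wording, and the function and mapping names are hypothetical. The “accident” to “safety accident” rule is left out because a naive replacement would also rewrite text that already reads “safety accident”.

```python
import re

# Variant -> canonical term; pairs follow the examples given in the text.
CANONICAL_TERMS = {
    "shield excavator": "shield machine",
    "shield drilling rig": "shield machine",
}


def normalize_terms(text: str, mapping: dict[str, str] = CANONICAL_TERMS) -> str:
    """Replace each known variant with its canonical term (case-insensitive)."""
    # Longest variants first so a short variant never splits a longer one.
    for variant in sorted(mapping, key=len, reverse=True):
        pattern = r"\b" + re.escape(variant) + r"\b"
        text = re.sub(pattern, mapping[variant], text, flags=re.IGNORECASE)
    return text


print(normalize_terms("The shield excavator stopped after the shield drilling rig alarm."))
```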

4.2. Word Bank and Word Segmentation

4.2.1. Establishment of the Word Bank

This study adopted a dictionary-based word segmentation method to construct a safety risk word bank for shield construction in urban drainage deep tunnels. The word bank for the safety risks of urban drainage deep tunnel shield construction was mainly divided into special word banks and stop word banks, as shown in Figure 2.
To establish the safety risk lexicon, the general dictionary “Civil Construction” was downloaded from the Sogou input method, and “Safety Engineering”, “Engineering Construction”, and “Safety Management” were downloaded from the Baidu lexicon. Furthermore, a text file, userdict.txt, was created to store the entries of the safety risk lexicon, and Jieba's load_userdict() method was called to load this custom dictionary file before segmentation.
In terms of establishing the custom vocabulary database, since the general vocabulary database cannot meet the requirements for the text mining of safety factors in the shield construction of urban drainage deep tunnels, relevant terms extracted from industry standards and specifications, such as “Urban Rail Transit Engineering Basic Terminology Standard GB/T 50833-2012” [29] and “Shield Tunnel Construction Technical Specifications” [30], were used to form a custom vocabulary database, for example, “Shield Tail Brush Wear” and “Synchronous Grouting Pressure”.
In terms of establishing the stop word library, words with no mining value can be excluded to further enhance the efficiency of text mining. The stop word library mainly consists of three categories. First, some descriptive words were included. Second, punctuation marks and numerical forms of words were included. Third, the HIT Stop Words and Baidu Stop Words lists were selected as supplementary dictionaries. In addition, to reduce the influence of construction sites, the “interval name” and “engineering name” were extracted, and the project names in the custom word bank were added to the stop word library to improve the efficiency of text mining.
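The stop word categories can be illustrated with a small filter applied to tokens that have already been segmented. The segmentation itself is not shown, and the stop word set, project name, and function name here are hypothetical placeholders rather than the lists used in the study.

```python
def filter_tokens(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Drop stop words, pure punctuation, and plain numbers from a token list."""
    kept = []
    for token in tokens:
        token = token.strip()
        if not token or token in stop_words:
            continue  # empty token, listed stop word, or project name
        if all(not ch.isalnum() for ch in token):
            continue  # punctuation only
        if token.replace(".", "", 1).isdigit():
            continue  # integer or simple decimal number
        kept.append(token)
    return kept


# Hypothetical stop words: one descriptive word and one project name.
stop_words = {"the", "Line 2 tunnel"}
tokens = ["the", "shield machine", ",", "3.5", "collapse", "Line 2 tunnel"]
print(filter_tokens(tokens, stop_words))  # ['shield machine', 'collapse']
```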

4.2.2. Text Word Segmentation

Based on the collected samples, a corpus was established. Using the established safety risk word library, custom word library, and stop word library, the Jieba segmentation package was employed to segment the safety risk statements of the shield construction for the urban drainage deep tunnel project into individual words. Partial word segmentation results are shown in Table 1.

4.3. Risk Factor Identification

The concept of cumulative word frequency was combined with the ABC classification method as the criterion for defining the threshold. The threshold for high-frequency words was determined based on the curve distribution of word frequency, word quantity, and the proportion of cumulative word frequency. The result of threshold definition is shown in Figure 3.
To extract high-frequency words from the initial feature items, three methods were adopted: the high-frequency word definition formula, the cumulative TF value, and the cumulative TF–H value. The threshold of high-frequency vocabulary was calculated by Formula (1), and the TF value and the TF–H value were calculated by Formulas (2)–(7). The calculation results are shown in Table 2.
When the high-frequency vocabulary definition formula was adopted, the number of high-frequency words was only 28 because the number of words that appeared only once was relatively large, so the safety risk factors selected with this method could be incomplete. When the cumulative TF value reached 90%, the number of high-frequency words was 1216, and there was lexical redundancy. When the cumulative TF–H value reached 90%, there were 204 high-frequency words, which basically conformed to the proportion of 90% of the important factors in the ABC classification method. Compared with the other methods, the high-frequency words extracted by the TF–H method were more accurate and effective. Table 3 shows the words with TF–H values greater than 80.
Among the extracted high-frequency words, some were similar in meaning, such as geological conditions, construction environment, and environmental impact. There were also some terms that had little correlation with risk factors, such as construction progress, construction noise, laws and regulations, energy consumption, and lighting conditions. Thus, when identifying risk factors, it is also necessary to select the high-frequency words that truly carry the meaning of risk factors based on context and semantics. This study adopted manual screening, comprehensively considering the characteristics of shield construction in urban drainage deep tunnels, and further screened out 34 high-frequency words containing the semantics of safety risk factors. The initial set of risk factors is shown in Table 4.

4.4. Risk Factor List

Through the application of text mining technology, 34 safety risk factors for shield construction in urban deep tunnel drainage projects were identified and screened from the collected valid samples. This study further used the principal component analysis method to reduce the dimensionality of these risk factors. The scatter plot is shown in Figure 4.
Combining the results of the cluster analysis with the 4M1E theory, these risk factors were classified into four aspects: personnel management, mechanical equipment and materials, construction techniques, and surrounding environment, forming a list of safety risk factors for shield construction in urban drainage deep tunnel projects, as shown in Table 5.

5. Verification and Implication

5.1. Verification of Risk Factor Identification Methods

Based on the principles of text mining technology, this study established a risk word bank for the shield construction of urban drainage deep tunnels, extracted the feature items of risk factors using entropy-weighted frequency, and identified the safety risk factors for shield construction in urban drainage deep tunnels. To verify the effectiveness of the proposed method, the risk factor identification results were compared with the analysis results of multiple shield construction safety accident reports for urban drainage deep tunnel projects collected from different regions. The results showed that most of the risk factors identified in this study appeared in the cause analysis of the safety accident reports.
In addition, to further verify the advantages of the proposed method, ROC curves were used to compare the risk identification performance of entropy-weighted frequency, inverse document frequency, the LDA topic model, and the BERT text embedding method. The ROC curves of these four methods are shown in Figure 5. According to the application principle of the ROC curve method, the evaluation indicators for assessing the effectiveness of risk factor identification mainly included accuracy, precision, recall, and F1-score. Based on the risk factor identification results of the four methods, these four evaluation indicators were calculated. The results are shown in Table 6.
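For reference, the four evaluation indicators can be computed from confusion counts as in the sketch below. The counts are hypothetical and do not reproduce Table 6; the sketch also assumes that each candidate word is labeled as a true or false risk factor against a reference list, which the text does not spell out.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall, and F1-score from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 8 risk words found, 2 false alarms, 2 missed, 8 correctly rejected.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```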
Compared with the other methods, the entropy-weighted frequency method performed best (AUC = 0.82) in extracting the safety risk factors for shield construction in urban drainage deep tunnel projects. This indicates that the method is well suited to identifying the safety risk factors of shield construction in urban drainage deep tunnel projects.

5.2. Management Implication

The TF–H value of the risk feature items indicated the significance of the corresponding risk factor in the text. The importance of these risk factors could be ranked according to the TF–H value, thereby providing a basis for the formulation of safety management measures for the shield construction of the urban drainage deep tunnel project.
Based on the list of safety risk factors for the shield construction of the urban drainage deep tunnel project, the importance of the risk types and risk factors could be ranked. In order to rank the importance of risk types, the TF–H values of risk factors in each risk type were added up, and the calculation results are as follows: The score for personnel management is 1689.56; the score for mechanical equipment and materials is 1060.51; the score for construction techniques is 1834.23; and the score for the surrounding environment is 1941.28. According to the calculation results, in the safety risk management of the shield construction of the urban drainage deep tunnel project, the risk of the surrounding environment > the risk of the construction technology > the risk of personnel management > the risk of the mechanical equipment and materials. Table 5 shows the importance ranking of risk factors in each of the different risk types.
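The ranking of risk types can be reproduced with a short sketch. The four totals are the scores reported above; the per-factor aggregation helper is shown only for illustration, with hypothetical inputs, since the individual TF–H values appear in Table 5 rather than here.

```python
from collections import defaultdict


def sum_by_type(factor_scores: dict[str, float], factor_types: dict[str, str]) -> dict[str, float]:
    """Add up the TF-H values of the risk factors within each risk type."""
    totals: dict[str, float] = defaultdict(float)
    for factor, score in factor_scores.items():
        totals[factor_types[factor]] += score
    return dict(totals)


def rank_types(type_totals: dict[str, float]) -> list[str]:
    """Return risk types ordered from highest to lowest total score."""
    return sorted(type_totals, key=type_totals.get, reverse=True)


# Totals as reported in the text.
REPORTED_TOTALS = {
    "personnel management": 1689.56,
    "mechanical equipment and materials": 1060.51,
    "construction techniques": 1834.23,
    "surrounding environment": 1941.28,
}
print(rank_types(REPORTED_TOTALS))
```

Because the totals are sums, a risk type containing more factors can rank higher even when its individual factors score lower; the summation follows the procedure described above.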
Based on the above analysis, the importance ranking of the safety risk factors and their types for the shield construction of the urban drainage deep tunnel project was obtained. Based on this, in the safety management process of shield construction for urban drainage deep tunnels, the surrounding environmental risk management should be given priority. Before the project planning and construction, a comprehensive assessment of the surrounding environment is necessary, including geological conditions, underground pipelines, and surface buildings, in order to minimize the impact of environmental risks on the construction. Although the construction process risks are lower than those of the surrounding environment, they still remain a significant factor affecting construction safety. It is necessary to continuously optimize the construction process, adopt advanced construction technologies and methods, improve construction quality and efficiency, and reduce uncertainties during the construction. At the same time, the risk of personnel management cannot be ignored, as the operations and behaviors of the personnel directly affect construction safety. Strengthening the training and management of construction personnel, enhancing their safety awareness and skills, and ensuring that they can carry out construction in accordance with standard operating procedures is crucial. Furthermore, although the risks associated with machinery and materials are relatively low, they form the foundation for the smooth progress of the construction process. Ensuring the maintenance and proper quality control of machinery and materials is an important measure to reduce construction risks.

6. Conclusions

The shield construction of deep tunnels for urban drainage involves many risk factors that are difficult to manage. This study proposed a text mining-based method for identifying the safety risk factors in urban drainage deep tunnel shield construction. In a practical application, 34 key risk factors were extracted from 93 safety accident reports and classified into risk types through cluster analysis, forming a safety risk list for the shield construction of the urban drainage deep tunnel project. The ROC curve method was then used to compare the risk identification performance of the entropy-weighted term frequency (TF–H), term frequency–inverse document frequency (TF–IDF), LDA topic model, and BERT text embedding methods. The results showed that the entropy-weighted term frequency method proposed in this study performed better in identifying the safety risk factors of shield construction in urban drainage deep tunnel projects. Finally, by analyzing the importance of the risk factors and risk types, a priority ranking of the safety risk factors and their types was proposed, providing theoretical guidance for the formulation of safety risk management measures.
This study has several limitations that should be addressed in future research. Firstly, the amount of data was relatively small, which may limit the generalizability of the results. Secondly, the safety risk lexicon constructed for text preprocessing had insufficient coverage, which reduced the effectiveness of risk factor identification; in addition, some words or phrases useful for risk identification were removed during preprocessing, which could affect the comprehensiveness of the mining results. Thirdly, the risk identification results were affected to some extent by factors such as the language patterns of the texts, the report writing style, the quantity and quality of the texts, and time, so the generalization ability of the model still needs to be strengthened and verified. Finally, because different data sources may contain valuable information, future research should consider combining data from multiple sources to supplement the list of accident attributes.

Author Contributions

Conceptualization, K.H.; methodology, K.H. and J.W.; validation, J.W.; formal analysis, K.H.; data curation, K.H. and X.H.; writing—original draft preparation, K.H.; writing—review and editing, Z.C. and K.H.; project administration, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the 2022 Annual Research Plan Project of the Education Department of Hubei Province (B2022215).

Data Availability Statement

Data generated or analyzed during the study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hu, K.; Wang, J.W.; Wu, H. Construction safety risk assessment of large-sized deep drainage tunnel projects. Math. Probl. Eng. 2021, 2021, 7380555.
  2. Sun, J.Y.; Wu, X.W.; Wang, G.H.; He, J.G.; Li, W.T. The governance and optimization of urban flooding in dense urban areas utilizing deep tunnel drainage systems: A case study of Guangzhou, China. Water 2024, 16, 2429.
  3. Hu, K.; Wang, J.W.; Wu, D.H.; Wang, Y.A. Risk assessment of small-diameter shield construction in a deep drainage tunnel based on an ISM-CRITIC-cloud model. Buildings 2024, 14, 3920.
  4. Hyun, K.C.; Min, S.; Choi, H.; Park, J.; Lee, I.M. Risk analysis using fault-tree analysis (FTA) and analytic hierarchy process (AHP) applicable to shield TBM tunnels. Tunn. Undergr. Space Technol. 2015, 49, 121–129.
  5. Ding, L.; Sun, Y.J.; Zhang, W.Z.; Bi, G.; Xu, H.Z. Stress monitoring of segment structure during the construction of the small-diameter shield tunnel. Sensors 2023, 23, 8023.
  6. Jiang, J.; Liu, G.Y.; Ou, X.D. Risk coupling analysis of deep foundation pits adjacent to existing underpass tunnels based on dynamic Bayesian network and N-K model. Appl. Sci. 2022, 12, 10467.
  7. Isah, M.A.; Kim, B.S. Question-answering system powered by knowledge graph and generative pretrained transformer to support risk identification in tunnel projects. J. Constr. Eng. Manag. 2025, 151, 04024193.
  8. He, K.; Zhu, J.; Wang, H.; Huang, Y.L.; Li, H.J.; Dai, Z.S.; Zhang, J.X.; Stathopoulos, T. Safety risk evaluation of metro shield construction when undercrossing a bridge. Buildings 2023, 13, 2540.
  9. Xu, N.; Guo, C.R.; Wang, L.; Zhou, X.Q.; Xie, Y. A three-stage dynamic risk model for metro shield tunnel construction. KSCE J. Civ. Eng. 2024, 28, 503–516.
  10. Liu, P.; Jin, X.Q.; Shang, Y.T. Ontology-based automated knowledge identification of safety risks in metro shield construction. J. Asian Archit. Build. Eng. 2025, 1–30.
  11. Li, X.W.; Li, S.C.; Yuan, J.F.; Wan, Z.; Liu, X. A data-driven and knowledge graph-based research on safety risk-coupled evolution analysis and assessment in shield tunneling. Tunn. Undergr. Space Technol. 2025, 162, 106657.
  12. Wu, K.P.; Zhang, J.S.; Huang, Y.L.; Wang, H.; Li, H.J.; Chen, H.H. Research on safety risk transfer in subway shield construction based on text mining and complex networks. Buildings 2023, 13, 2700.
  13. Tang, C.; Shen, C.X.; Zhang, J.J.; Guo, Z. Identification of safety risk factors in metro shield construction. Buildings 2024, 14, 492.
  14. Koc, K.; Gurgun, A.P. Stakeholder-associated life cycle risks in construction supply chain. J. Manag. Eng. 2021, 37, 4020107.
  15. Fu, T.; Shi, K.B.; Shi, R.Y.; Lu, Z.P.; Zhang, J.M. Risk assessment of TBM construction based on a matter-element extension model with optimized weight distribution. Appl. Sci. 2024, 14, 5911.
  16. Zhang, Z.X.; Wang, B.; Wang, X.F.; He, Y.T.; Wang, H.X.; Zhao, S.B. Safety-risk assessment for TBM construction of hydraulic tunnel based on fuzzy evidence reasoning. Processes 2022, 10, 2597.
  17. Wu, B.; Chen, H.H.; Huang, W.; Meng, G.W. Dynamic evaluation method of the EW-AHP attribute identification model for the tunnel gushing water disaster under interval conditions and applications. Math. Probl. Eng. 2021, 2021, 6661609.
  18. Shelake, A.G.; Gogate, N.G. An integrated risk prioritization and determination of activity-wise delay (IRPAD) framework for enhancing schedule management in tunnel projects. Eng. Constr. Archit. Manag. 2024, ahead-of-print.
  19. Zhou, M.K.; Tang, Y.G.; Jin, H.; Zhang, B.; Sang, X.W. A BIM-based identification and classification method of environmental risks in the design of Beijing subway. J. Civ. Eng. Manag. 2021, 27, 500–514.
  20. Lu, L.Y.; Ji, M.L.; Wen, X.; Xiang, Y. An empirical study on construction emergency disaster management and risk assessment in shield tunnel construction project with big data analysis. Int. J. Data Min. Bioinform. 2024, 28, 406–425.
  21. Brown, C.K.; Cameron, B.G. Assessing changes in reliability methods over time: An unsupervised text mining approach. Qual. Reliab. Eng. Int. 2024, 40, 3597–3619.
  22. Vagnoli, M.; Remenyte-Prescott, R. An ensemble-based change-point detection method for identifying unexpected behaviour of railway tunnel infrastructures. Tunn. Undergr. Space Technol. 2018, 81, 68–82.
  23. Luan, T.T.; Zhang, X.; Li, H.R.; Wang, K.; Li, X.Y. Dynamic risk analysis of hazardous materials highway tunnel transportation based on fuzzy Bayesian network. J. Loss Prev. Process Ind. 2024, 92, 105443.
  24. Zhou, Z.Y.; Guo, J.H.; Huang, J.H. Chemical safety risk identification and analysis based on improved LDA topic model and Bayesian networks. Appl. Sci. 2025, 15, 6197.
  25. Macêdo, J.B.; Moura, M.D.; Aichele, D.; Lins, I.D. Identification of risk features using text mining and BERT-based models: Application to an oil refinery. Process Saf. Environ. Prot. 2022, 158, 382–398.
  26. Xu, N.; Ma, L.; Liu, Q.; Wang, L.; Deng, Y.L. An improved text mining approach to extract safety risk factors from construction accident reports. Saf. Sci. 2021, 138, 105216.
  27. Liu, Y.P.; Wang, J.W.; Tang, S.R.; Zhang, J.J.; Wan, J.Y.J. Integrating information entropy and latent Dirichlet allocation models for analysis of safety accidents in the construction industry. Buildings 2023, 13, 1831.
  28. Song, W.Y.; Rong, W.; Tang, Y.Q. Quantifying risk of service failure in customer complaints: A textual analysis-based approach. Adv. Eng. Inform. 2024, 60, 102377.
  29. GB/T 50833-2012; Basic Terminology Standard for Urban Rail Transit Engineering. General Administration of Quality Supervision, Inspection and Quarantine: Beijing, China, 2012.
  30. GB 50446-2008; Specifications for Construction and Acceptance of Shield Tunneling. Institute of Standards and Quantities of Ministry of Housing and Urban-Rural Development: Beijing, China, 2008.
Figure 1. Research framework for identifying safety risk factors in shield construction of deep-buried urban drainage tunnels.
Figure 2. The composition of the word bank.
Figure 3. High-frequency threshold determination based on TF–H.
Figure 4. Cluster scatter plot of safety risk factors for shield construction in the urban drainage deep tunnel project.
Figure 5. The ROC curves of four risk-identifying methods.
Table 1. The results of partial word segmentation.
| No. | Segmentation result |
|---|---|
| 1 | The geological conditions/of/the construction route/were/not adequately estimated/, and/the presence/of/hard strata/was/not promptly/identified/./ |
| 2 | The design/of/the lining structure/was/unreasonable/, and/during/the construction process/, the defects/in/the lining/were/not promptly detected/./ |
| 3 | The maintenance/of/mechanical equipment/was/inadequate/, failing/to/promptly detect/and/eliminate/potential faults/./ |
Table 2. The calculation result of feature items selection.
| No. | Method | Threshold | Number of high-frequency words | Assessment |
|---|---|---|---|---|
| 1 | High-frequency word definition formula | T = 36 | 28 | There might be omissions. |
| 2 | TF | Cumulative TF value ≥ 90% | 1216 | There might be redundant items. |
| 3 | TF–H | Cumulative TF–H value ≥ 90% | 204 | Relatively reasonable |
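The cumulative-share thresholds in Table 2 can be illustrated with a short sketch. It assumes that terms are sorted by score in descending order and kept until their running share of the total first reaches 90%; the exact cut-off rule used in the study may differ. The function name `select_high_frequency` and the toy scores are hypothetical, not data from the paper.

```python
def select_high_frequency(scores, coverage=0.90):
    """Keep the top-scoring terms until their cumulative share reaches `coverage`.

    `scores` maps each term to a non-negative score such as TF or TF-H.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must lie in (0, 1]")
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")

    # Rank terms from the highest score to the lowest.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

    selected, running = [], 0.0
    for term, value in ranked:
        selected.append(term)
        running += value
        if running / total >= coverage:
            break  # the cumulative share has reached the threshold
    return selected


# Hypothetical toy scores, chosen only to show the mechanics.
toy_scores = {"geology": 50.0, "gushing": 30.0, "collapse": 15.0,
              "noise": 3.0, "dust": 2.0}
selected = select_high_frequency(toy_scores)
print(selected)
```

With these toy scores the first three terms already account for 95% of the total, so the two lowest-scoring terms are dropped.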
Table 3. TF–H values greater than 80, based on the high-frequency vocabulary list.
| No. | Characteristic item | TF–H | No. | Characteristic item | TF–H |
|---|---|---|---|---|---|
| 1 | Geological conditions | 996.21 | 21 | Safety investment | 194.92 |
| 2 | Risk of water gushing | 835.69 | 22 | Safety awareness | 184.56 |
| 3 | Collapse accident | 596.75 | 23 | Protective measures | 177.32 |
| 4 | Shield machine failure | 553.33 | 24 | Environmental impact | 171.15 |
| 5 | Construction plan | 505.55 | 25 | Laws and regulations | 155.12 |
| 6 | Safety management | 466.67 | 26 | Quality control | 152.21 |
| 7 | Emergency response | 426.37 | 27 | Communication and coordination | 147.79 |
| 8 | Risk assessment | 411.12 | 28 | Construction machinery | 144.42 |
| 9 | Early warning system | 388.53 | 29 | Resource allocation | 143.36 |
| 10 | Safety training | 365.89 | 30 | Formulation of contingency plans | 132.21 |
| 11 | Monitoring and surveillance | 355.46 | 31 | Earthquake safety | 111.56 |
| 12 | Equipment maintenance | 332.67 | 32 | Construction noise | 105.63 |
| 13 | Construction environment | 311.53 | 33 | Dust pollution | 100.87 |
| 14 | Underground pipeline | 294.45 | 34 | Lighting conditions | 95.34 |
| 15 | Casualties | 269.04 | 35 | Energy consumption | 94.28 |
| 16 | Structural stability | 251.54 | 36 | Safe distance | 91.11 |
| 17 | Construction progress | 244.48 | 37 | Tunnel ventilation | 88.86 |
| 18 | Material quality | 236.89 | 38 | Foundation treatment | 88.21 |
| 19 | Construction technology | 212.74 | 39 | Soil improvement | 85.34 |
| 20 | Management system | 202.18 | 40 | Support structure | 81.87 |
Table 4. The initial set of safety risk factors for urban drainage deep tunnel shield construction.
| No. | High-frequency words | Risk factor | TF–H | Interpretation of risk factor |
|---|---|---|---|---|
| R1 | Geological conditions | The geological and hydrological conditions are poor. | 996.21 | The natural conditions, such as soil, rocks, and hydrology, at the construction site have affected the safety and stability of shield construction. |
| R2 | Risk of water gushing | Water and sand gushing out of the tunnel | 835.69 | Groundwater suddenly rushed into the tunnel, causing waterlogging in the tunnel and interruption of construction. |
| R3 | Collapse accident | Surface subsidence and collapse | 596.75 | Sudden subsidence of the ground or tunnel structure endangers construction safety and surrounding buildings. |
| R4 | Shield machine failure | The shield machine equipment malfunctioned. | 553.33 | Mechanical failures that occur during the construction process of shield machines affect the construction progress and safety. |
| R5 | Construction plan | The construction site was poorly organized. | 505.55 | The specific plans and methods of shield construction directly affect construction safety. |
| R6 | Safety management | The investigation of potential safety hazards was inadequate. | 466.67 | The safety management system of the shield construction site, including safety regulations and rules, safety training, etc. |
| R7 | Monitoring and surveillance | Inadequate monitoring | 355.46 | Continuously observe and record the process and results of shield tunneling construction to improve the construction quality. |
| R8 | Equipment maintenance | The correction of the shield machine’s advancement was not timely. | 332.67 | Regular inspection and maintenance of shield construction equipment should be carried out. |
| R9 | Underground pipeline | Damage to underground pipelines | 294.45 | The impact and protection of underground pipelines during shield construction |
| R10 | Material quality | Damaged segments | 236.89 | The quality standards and usage conditions of materials used in shield construction |
| R11 | Safety awareness | The safety awareness of construction workers is insufficient. | 184.56 | The awareness and emphasis of shield construction personnel on construction safety |
| R12 | Protective measures | The safety protection of construction workers is insufficient. | 177.32 | Specific protective measures taken to prevent accidents from happening |
| R13 | Quality control | Starting base, rails | 152.21 | Measures and procedures to ensure that the construction quality of the base meets the standards |
| R14 | Tunnel ventilation | The installation accuracy is not high. | 88.86 | The ventilation situation inside the tunnel during the tunneling process |
| R15 | Soil improvement | The exhaust equipment is not set up reasonably. | 85.34 | Improve the properties of soil by physical, chemical, or biological methods to enhance its bearing capacity. |
| R16 | Support structure | Improper reinforcement of the soil at the cave entrance | 81.87 | Structures used to support soil in shield construction, such as steel pipes, wooden braces, etc. |
| R17 | Operating procedures | The installation accuracy of the reaction frame is not high. | 79.21 | Operating procedures for shield machines |
| R18 | Base | The tunneling parameters are set improperly. | 76.53 | The installation status of the shield machine base |
| R19 | Construction waste soil | The base is damaged. | 67.75 | The soil generated during the shield machine’s tunneling process and its transportation conditions |
| R20 | Segment | The efficiency of construction waste transportation is low. | 55.56 | The installation status of segments during the shield machine’s tunneling process |
| R21 | Assembly | The installation accuracy of the negative ring tube sheet is not high. | 50.21 | The installation status of segments during the shield machine’s tunneling process |
| R22 | Soil stability | The segments were assembled improperly. | 43.33 | The stability of the soil during shield construction |
| R23 | Collapse | The cave entrance is unstable. | 43.31 | Sudden subsidence of the ground or tunnel structure |
| R24 | Grouting | The starting well collapsed. | 32.54 | The grouting situation of the tunnel during the shield tunneling stage |
| R25 | Shield cutting tool | Improper grouting control | 25.69 | Cutting tools for shield machines |
| R26 | Ground uplift | The cutter head and cutting tools have worn out and failed. | 21.23 | The surface rises due to underground construction. |
| R27 | Axis | The reinforcement of the working face is insufficient. | 16.75 | Whether the tunnel control axis meets the requirements |
| R28 | Hoisting | Receiving axis deviation | 14.32 | Hoisting of mechanical equipment during construction |
| R29 | Seal | Improper hoisting of the shield machine equipment | 11.24 | Whether the sealing condition of the opening meets the requirements |
| R30 | Liquefaction | The sealing of the opening is not in place. | 10.54 | The soil loses stability due to the rise of the groundwater level. |
| R31 | Cave gate | Soil loss at the cave entrance | 9.34 | Construction procedures and quality of door openings |
| R32 | Split | The process of chiseling the cave door was unreasonable. | 9.25 | Shield machine equipment separation |
| R33 | Receiving well | The separation of the shield machine equipment is not in place. | 7.69 | The construction quality of the receiving well at the arrival stage |
| R34 | Receiving base | The receiving well collapsed. | 7.26 | Upon arrival, receive the construction quality of the base. |
Table 5. List of safety risk factors for urban drainage deep tunnel shield construction.
| No. | Risk type | Risk factor | TF–H |
|---|---|---|---|
| 1 | Personnel management | The construction site was poorly organized. | 505.55 |
| 2 | | The investigation of potential safety hazards was inadequate. | 466.67 |
| 3 | | Inadequate monitoring | 355.46 |
| 4 | | The safety awareness of construction workers is insufficient. | 184.56 |
| 5 | | The safety protection of construction workers is insufficient. | 177.32 |
| 6 | Mechanical equipment and materials | The shield machine equipment malfunctioned. | 553.33 |
| 7 | | Damaged segments | 236.89 |
| 8 | | The exhaust equipment is not set up reasonably. | 88.86 |
| 9 | | The tunneling parameters are set improperly. | 79.21 |
| 10 | | The base is damaged. | 76.53 |
| 11 | | The cutter head and cutting tools have worn out and failed. | 25.69 |
| 12 | Construction technology | Water and sand gushing out of the tunnel | 835.69 |
| 13 | | The correction of the shield machine’s advancement was not timely. | 332.67 |
| 14 | | The installation accuracy of the receiving base is not high. | 152.21 |
| 15 | | Improper reinforcement of the soil at the cave entrance | 85.34 |
| 16 | | The installation accuracy of the reaction frame is not high. | 81.87 |
| 17 | | The efficiency of construction waste transportation is low. | 67.75 |
| 18 | | The installation accuracy of the negative ring tube sheet is not high. | 55.56 |
| 19 | | The segments were assembled improperly. | 50.21 |
| 20 | | The starting well collapsed. | 43.31 |
| 21 | | Improper grouting control | 32.54 |
| 22 | | The reinforcement of the working face is insufficient. | 21.23 |
| 23 | | Receiving axis deviation | 16.75 |
| 24 | | Improper hoisting of the shield machine equipment | 14.32 |
| 25 | | The sealing of the opening is not in place. | 11.24 |
| 26 | | The process of chiseling the cave door was unreasonable. | 9.34 |
| 27 | | The separation of the shield machine equipment is not in place. | 9.25 |
| 28 | | The receiving well collapsed. | 7.69 |
| 29 | | The installation accuracy of the starting base is not high. | 7.26 |
| 30 | Surrounding environment | The geological and hydrological conditions are poor. | 996.21 |
| 31 | | Surface subsidence and collapse | 596.75 |
| 32 | | Damage to underground pipelines | 294.45 |
| 33 | | The cave entrance is unstable. | 43.33 |
| 34 | | Soil loss at the cave entrance | 10.54 |
Table 6. The calculation results of the evaluation indicators for risk identification effectiveness.
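| Method | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| TF–H | 0.82 | 0.83 | 0.86 | 0.83 |
| TF–IDF | 0.73 | 0.71 | 0.74 | 0.78 |
| LDA | 0.76 | 0.78 | 0.81 | 0.75 |
| BERT | 0.71 | 0.79 | 0.82 | 0.81 |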
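For readers unfamiliar with the four indicators in Table 6, the sketch below shows their standard textbook definitions in terms of a binary confusion matrix. The labelling scheme and the counts behind Table 6 are not reported here, so the counts in the example are hypothetical and the function name `classification_metrics` is illustrative only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives. Ratios with a zero denominator
    are reported as 0.0.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = ratio(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the study.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```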
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hu, K.; Wang, J.; Hu, X.; Cheng, Z. Identification of Safety Risk Factors for Shield Construction in Urban Drainage Deep Tunnel Based on Text Mining. Processes 2025, 13, 2782. https://doi.org/10.3390/pr13092782
