Article

Time-Aware Construction Site Risk Prediction Based on Sentence-BERT and 7-Day Window Aggregation with Unlabeled Data

1 School of Civil Engineering, Shenyang Jianzhu University, Shenyang 110168, China
2 School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, China
* Author to whom correspondence should be addressed.
Buildings 2026, 16(6), 1243; https://doi.org/10.3390/buildings16061243
Submission received: 3 February 2026 / Revised: 13 March 2026 / Accepted: 17 March 2026 / Published: 21 March 2026

Abstract

Construction safety texts are commonly used only for descriptive statistical analysis, and systematic approaches for uncovering latent semantic risk correlations remain limited. In particular, risk identification and prioritization under unlabeled conditions remain challenging. To address this issue, this study proposes a semantic risk association and ranking framework based on Sentence-BERT (SBERT). First, a domain-specific keyword library is constructed, and representative risk terms are extracted through tokenization, stop-word removal, and TF-IDF weighting. A fine-tuned SBERT model is then employed to generate sentence embeddings. FAISS-based similarity search is applied to match safety inspection records with historical accident reports, enabling automatic identification and ranking of the most relevant accident types. In addition, a seven-day inspection window is introduced to capture the temporal accumulation effect of hazards and support risk assessment without explicit labels. Experiments conducted on 1368 accident reports and 484 inspection records show that the proposed framework achieves an accuracy of 0.75, a recall of 1.00, and an F1-score of 0.8571. Cross-project validation yields an F1-score of 0.5607, and the performance remains stable under 10% noise interference. The results demonstrate that the proposed semantic risk association and ranking framework is effective and robust for practical construction safety management.

1. Introduction

The construction industry is characterized by a high incidence of accidents, often resulting in severe casualties and significant economic losses. In daily project management, numerous safety inspection records are generated on construction sites. These short and unstructured texts contain valuable information about potential hazards and therefore provide important insights for accident prevention and risk identification. However, due to fragmented content and the lack of standardized risk labels, traditional keyword-matching or rule-based approaches have limited capability in semantic understanding and implicit relationship extraction. In addition, most construction projects do not maintain standardized risk-level annotation datasets, which restricts the practical application of supervised learning models. Existing studies also pay limited attention to the temporal accumulation of hazards, making it difficult to characterize the dynamic evolution from minor risks to major accidents.
With the development of deep learning and natural language processing (NLP), text mining techniques have gradually been applied to construction safety research. Zhou et al. [1] systematically reviewed the potential of NLP applications in the construction domain. Tixier et al. [2] applied machine learning methods such as random forests and gradient boosting trees to accident injury prediction. Tixier et al. [3] and Ding et al. [4] developed artificial intelligence-based accident prediction frameworks and explored automatic extraction of accident precursors from text. Tian et al. [5] proposed a deep semantic analysis-based question–answering system for safety hazards. Zhang et al. [6] applied text mining and natural language processing techniques to construction safety texts, enabling automated content analysis of unstructured accident data.
In recent years, BERT-based models have been widely adopted in construction safety text analysis. Liu et al. [7] combined BERT with a tree-augmented naïve Bayes approach for accident risk factor identification. Shin et al. [8] developed an accident report retrieval model based on KLUE-BERT. Other studies have focused on named entity recognition and domain adaptation tasks [9,10,11,12]. Although these methods improved semantic representation capability, several limitations remain. Existing approaches lack efficient cross-text semantic association modeling, rely heavily on manually labeled data, and rarely consider the temporal accumulation effect of hazards. In addition, their computational efficiency and generalization ability may not satisfy the real-time analysis requirements of large-scale engineering data. Generative language models have been explored for equipment fault identification and accident visualization [13,14]; however, for large-scale semantic similarity computation tasks, their efficiency is generally lower than that of Sentence-BERT.
In summary, current research on construction safety text analysis presents three main limitations: (1) insufficient modeling of deep cross-text semantic associations; (2) strong dependence on labeled datasets; and (3) limited consideration of hazard accumulation mechanisms over time, along with efficiency constraints in large-scale deployment scenarios.
To address these issues, this study proposes a construction safety text semantic analysis and time-aware risk prediction framework based on Sentence-BERT (SBERT). SBERT employs a siamese network structure to generate sentence embeddings, improving computational efficiency while maintaining strong semantic representation capability [15,16]. Compared with traditional BERT variants [17,18], SBERT is more suitable for large-scale text similarity computation and retrieval tasks, thereby meeting the practical requirements of construction inspection data analysis.
The main contributions of this study are as follows:
(1) A cross-text semantic association method for construction safety based on SBERT is developed, combined with FAISS vector indexing to improve retrieval efficiency between safety inspection records and historical accident reports.
(2) A keyword-rule-based automatic risk level inference mechanism is designed to enable risk assessment under unlabeled conditions.
(3) A “7-day inspection window” strategy is proposed to characterize the temporal accumulation effect of construction hazards.
(4) A complete risk identification and management support workflow is constructed, including risk matching, level assessment, and priority ranking.
A case study based on 1368 accident reports and 484 safety inspection records was conducted. The results indicate that the proposed framework is feasible for semantic association analysis and engineering risk management applications, and it can provide technical support for construction site safety management.

2. Literature Review

Extensive research has been conducted by scholars worldwide on construction safety text analysis, which can generally be categorized into three main research directions.

2.1. Accident Prediction Based on Machine Learning

Machine learning represents one of the earliest technologies applied to construction safety text analysis. Tixier [2] was among the first to employ models such as random forests and stochastic gradient boosting trees to identify injury types, energy sources, and affected body parts from accident narratives. Through feature engineering, they developed more effective predictive models, achieving RPSS values ranging from 0.236 to 0.436, which significantly outperformed traditional parametric models. Building on this work, subsequent studies [2,3] further explored machine learning-based approaches for safety outcome prediction, improving model performance and applicability across different projects. In addition, natural language processing techniques have been applied to unstructured accident reports for safety knowledge discovery, particularly in the extraction of injury precursors and early warning features [3,4].
In recent years, machine learning models have further improved in terms of classification accuracy and sample adaptability. Gupta [19] proposed an accident classification method based on a Contextual Content Network (CCNet), enhancing classification performance in complex scenarios through semantic association modeling. Shuang [20] addressed data imbalance by proposing an improved deep learning classification model, significantly enhancing the recognition accuracy of minority accident categories. Pothina [9] improved accident classification performance in mining safety texts by optimizing contextual representations. Li [16] combined deep learning and text mining techniques to achieve automatic classification and key information extraction from construction accident narratives, further expanding the application scope of machine learning models. Despite these advances, most studies are primarily designed for English texts and exhibit limited semantic understanding of Chinese texts, particularly in handling polysemy and semantic ambiguity. Moreover, these methods generally lack adaptability to unlabeled data and fail to consider the cumulative effects of risks over time.

2.2. Knowledge Extraction Based on Natural Language Processing

To address the challenge of structuring safety knowledge, some studies have applied natural language processing (NLP) techniques for domain knowledge extraction. Xun [21] proposed a rule-based Chinese NLP approach that constructs linguistic patterns through lexical analysis and syntactic dependency parsing, extracting domain knowledge elements (DKEs) such as “cause–effect” and “hazard–measure” relationships from Chinese safety management texts. Their study revealed linguistic characteristics of Chinese safety texts, such as the predominance of 2–6 character noun phrases. Zhang et al. [6] applied text mining and natural language processing techniques to construction accident texts, enabling structured extraction of safety knowledge and facilitating knowledge discovery from unstructured data.
Current research on knowledge extraction demonstrates multidimensional development trends. First, knowledge graph techniques have been widely applied. Wu [22] constructed a construction accident knowledge graph using deep learning to visualize accident relationships. Chen [10] developed a knowledge graph for safety management standards in hydraulic engineering, providing technical support for regulatory compliance inspection. Zhang [23] constructed a collision knowledge graph for autonomous navigation vessels using an enhanced BERT model, extending knowledge extraction methods to broader engineering domains. Second, named entity recognition (NER) techniques have been further advanced. Zhou [24] proposed a deep learning-based NER method for construction impact accidents to accurately extract key accident information. Luo [25] developed a specialized NER model for long-form safety reports in prefabricated construction, while Xu [13] applied pre-trained language models to entity recognition in coal mine construction safety. Third, progress has been made in knowledge matching techniques. Im, M.-I. [26] proposed an NLP-based method for automatically matching construction codes with safety hazards, improving regulatory compliance in safety management. However, these approaches have notable limitations. On the one hand, the flexibility and variability of Chinese grammar result in high rule construction and maintenance costs. On the other hand, rule-based models exhibit weak generalization ability, struggle to cover diverse construction text scenarios (e.g., inspection records, accident reports, and safety briefings), and suffer from low retrieval efficiency, making them unsuitable for large-scale text analysis.

2.3. Applications of Deep Learning in Construction Text Analysis

In recent years, deep learning models have gradually replaced traditional methods and become the dominant approach in construction safety text analysis. Tian [5] developed an intelligent safety hazard question-answering system based on deep semantic mining, improving the efficiency of safety knowledge retrieval through semantic matching algorithms and addressing the limitations of traditional retrieval systems. Luo [27] developed machine learning-based models to predict the severity of construction collapse accidents, highlighting the potential of data-driven methods for risk-informed safety management. Zhang [11] proposed a BERTopic-based text mining method that combines pre-trained language models with topic modeling to reveal organizational vulnerabilities in construction project accidents. Smetana [12] applied large language models to highway construction safety analysis, enabling automatic classification of accident causes.
The application of deep learning in construction text analysis exhibits several emerging characteristics. First, BERT-series models have been extensively adopted. Mohamed Hassan [18] developed a BERT-based question-answering system for construction accident reports. Lee [28] combined BERT with graph models to construct a construction risk assessment knowledge base. Zhang [29] integrated construction scene graphs with BERT-based domain knowledge to enable automatic hazard identification. Shen [30] conducted hazard classification for collapse accidents using BERT models. Sadick [31] compared the performance of BERT and RoBERTa in accident report classification, while Pan [8] proposed a sentence resampling BERT-CRF model to improve causal analysis accuracy in autonomous driving accidents. Second, generative models and large language models have been explored. Ray [32] applied generative language models to equipment fault identification. Kim [14] developed an LLM-driven personalized construction safety training question-answering system. Ahmadi [33] explored the application of large language models in construction accident report analysis. Yoo [34] utilized generative pre-trained Transformers for accident prediction and saliency visualization. Third, hybrid deep learning models have been developed. Liu [8] integrated BERT with tree-augmented Naive Bayes. Gong [35] combined graph databases with language models for dam safety emergency decision-making. Shi [36] proposed an ontology-based TextCNN accident prediction model. Luo [37] developed a risk factor extraction model for chemical accidents. Zhou [38] integrated knowledge and deep learning to generate metro construction risk response measures. Jia [39] combined text classification with accident causation theory to analyze coal mine gas explosion accidents. Liu [40] integrated scene graphs and information extraction techniques for construction hazard identification.
Despite these advances, several limitations remain. First, deep semantic associations across texts are insufficiently exploited, with most studies focusing on single-text analysis. For example, Huang [41] analyzed hazardous goods transportation accidents in waterways, and Fang [42] performed near-miss event text classification, both lacking efficient cross-text semantic similarity computation. Second, many approaches rely heavily on structured labeled data, such as Tian [43] for large-scale project hazard classification and Chen [44] for accident causation classification, limiting their applicability to unlabeled safety inspection records. Third, model efficiency and real-time performance remain inadequate; although complex BERT variants can improve accuracy, they struggle to meet real-time analysis requirements at construction sites. Fourth, the cumulative effect of hazards over time is rarely considered, resulting in insufficient prediction lead time.
Most existing studies focus on accident text analysis, while semantic association mining of latent hazards in construction safety inspection records remains underexplored. Furthermore, integrated consideration of unlabeled scenarios, cumulative risk effects, and real-time requirements is lacking. Sentence-BERT, as an efficient sentence embedding model, has demonstrated superior performance in semantic similarity computation and text matching tasks [7], with Gao [45] further reviewing its technical advances in sentence embedding. However, its systematic application in construction safety—particularly within a full-process framework integrating unlabeled data adaptation, cumulative effect modeling, and efficient retrieval—remains limited. Compared with BERT-based models adopted in existing studies [15,16,18,26,31,32,33,34], Sentence-BERT offers significant efficiency advantages and is more suitable for large-scale semantic association analysis of safety inspection texts. To address this research gap, this study integrates Sentence-BERT with FAISS indexing, 7-day window aggregation, and automatic risk-level inference to construct a full-process intelligent analysis framework, enabling semantic association between unstructured inspection records and historical accidents and filling the existing methodological gap.

3. Materials and Methods

To enable associative risk identification between construction site safety inspection records and historical accident data, and to improve both the accuracy and timeliness of safety early warning, this study develops a Sentence-BERT-based Temporal Risk Prediction Model for Construction Sites (SBERT-TRPM). The proposed model establishes a three-layer technical framework comprising text representation, semantic matching, and risk localization. The overall workflow of the framework is illustrated in Figure 1. Specifically, the framework constructs a foundational textual feature space through keyword extraction, generates deep semantic embeddings using Sentence-BERT, and integrates FAISS indexing to achieve efficient associative matching between inspection records and accident data. Finally, risk prediction results are produced through a risk-ranking mechanism. The principles and implementation details of each core module are described as follows.

3.1. Text Preprocessing

Text preprocessing aims to convert unstructured textual data into standardized formats, laying a solid foundation for subsequent semantic analysis. This process mainly consists of two key stages: text cleaning with tokenization and stop-word removal, and keyword extraction based on the TF–IDF weighting scheme.

3.1.1. Text Cleaning, Tokenization, and Stop-Word Removal

To ensure text quality and effective tokenization, regular expressions were first applied to clean both the inspection texts to be analyzed and the historical risk event texts, removing non-Chinese and non-numeric characters while handling missing values and anomalous content. Subsequently, the Jieba tokenizer (version 0.42.1) [46] was employed for word segmentation, and a domain-specific construction safety lexicon was incorporated to improve the segmentation accuracy of professional terms such as “vertical pole spacing,” “scaffold collapse,” “pressure relief valve leakage,” and “pipeline corrosion.”
Based on a general stop-word list and supplemented with domain-specific low-discriminative terms (e.g., “inspection,” “record,” and “report”), redundant information was further removed. In addition, tokens with a length of one character or less were filtered out, and only core parts of speech—namely nouns, verbs, and adjectives—were retained. This process resulted in standardized, high-quality tokenized texts and word lists, providing a reliable data foundation for subsequent feature extraction.
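The cleaning and filtering steps described above can be sketched as follows. This is a minimal illustration with a hand-picked stop-word set (Chinese terms for “inspection,” “record,” and “report”); the actual pipeline additionally relies on the Jieba tokenizer with a construction-safety user lexicon and on part-of-speech filtering, which are omitted here.

```python
import re

# Illustrative stop words; the real pipeline uses a general Chinese
# stop-word list supplemented with domain-specific low-discriminative terms.
STOP_WORDS = {"检查", "记录", "报告"}  # "inspection", "record", "report"

def clean_text(text):
    """Keep only Chinese characters, digits, and whitespace (Section 3.1.1)."""
    if not text or not text.strip():  # guard against missing/empty values
        return ""
    return re.sub(r"[^\u4e00-\u9fa5\d\s]", " ", text)

def filter_tokens(tokens):
    """Drop stop words and single-character tokens; in the full pipeline only
    nouns, verbs, and adjectives would additionally be retained."""
    return [t for t in tokens if len(t) > 1 and t not in STOP_WORDS]
```

In the full pipeline, `tokens` would come from `jieba.lcut` after loading the domain lexicon with `jieba.load_userdict`.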

3.1.2. TF-IDF Keyword Extraction

The term frequency–inverse document frequency (TF-IDF) model was employed to quantify the semantic contribution of individual terms. TF-IDF is a classical term-weighting method in the field of text mining, which measures term importance by coupling term frequency with inverse document frequency [47]. The core formulation of the TF-IDF model is presented in Equation (1).
$\mathrm{TFIDF}_{t,d} = TF_{t,d} \times IDF_{t,D} = \dfrac{n_{t,d}}{\sum_{t' \in d} n_{t',d}} \times \log \dfrac{|D|}{1 + |\{d \in D : t \in d\}|}$
In this work, the term frequency $TF_{t,d}$ of term t in document d is defined as the ratio of the number of occurrences $n_{t,d}$ of t in d to the total number of terms in d. Correspondingly, the inverse document frequency $IDF_{t,D}$ of term t is defined with respect to the corpus D, where |D| denotes the total number of documents and $|\{d \in D : t \in d\}|$ the number of documents in D that contain t. The addition of 1 in the denominator prevents division by zero, a standard smoothing technique widely adopted in the text mining field [1,3].
To improve the discriminative capability of the extracted keywords, a two-stage filtering strategy was employed.
  • High-frequency term filtering: Generalized terms appearing in more than 85% of the texts (e.g., “equipment” and “safety”) were removed. This threshold was determined through systematic pre-experiments by testing six gradient thresholds within the range of 80–90% (80%, 82%, 84%, 85%, 88%, and 90%). The results indicated that the 85% threshold most effectively eliminated domain-generic terms while preserving text-specific key information;
  • Weight-based ranking: For each text, terms were ranked in descending order according to their TF-IDF weights, and the Top-5 keywords were retained. The optimal number of keywords was validated through pre-experimental optimization by evaluating semantic matching performance with retained keyword sizes ranging from 3 to 10. Manual annotation of the core semantics of 100 sample texts was used for comparison. When five keywords were retained, the matching degree between the extracted keywords and the core text semantics reached 0.92, significantly outperforming other settings (0.78 for three keywords and 0.86 for ten keywords). Accordingly, Top-5 was selected as the optimal setting.
Finally, the keywords extracted from all texts were used as feature dimensions to construct a sparse matrix as the basic text representation, with a dimensionality of N × M, where N denotes the total number of texts and M represents the total number of unique keywords extracted across the corpus.
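A direct transcription of Equation (1) into code, together with Top-k keyword selection, might look as follows. This is a minimal sketch: the 85% document-frequency filter and the sparse-matrix construction are omitted, and documents are assumed to be pre-tokenized lists of terms.

```python
import math
from collections import Counter

def tfidf(term, doc_tokens, corpus):
    """TF-IDF per Equation (1): (n_{t,d} / |d|) * log(|D| / (1 + df(t)))."""
    counts = Counter(doc_tokens)
    tf = counts[term] / sum(counts.values())          # term frequency in d
    df = sum(1 for d in corpus if term in d)          # document frequency of t
    idf = math.log(len(corpus) / (1 + df))            # smoothed IDF
    return tf * idf

def top_keywords(doc_tokens, corpus, k=5):
    """Rank a document's unique terms by TF-IDF weight and keep the Top-k
    (the paper's pre-experiments selected k = 5)."""
    scored = {t: tfidf(t, doc_tokens, corpus) for t in set(doc_tokens)}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

Note that a term occurring in every document receives a non-positive weight under this smoothed formulation, which is what pushes domain-generic terms out of the Top-5.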

3.2. Automatic Risk Level Inference

To address unlabeled data scenarios, a keyword rule-based automatic risk level inference mechanism was designed. Three risk levels—high, medium, and low—were defined by constructing corresponding keyword sets for accident texts and inspection texts, respectively, ensuring consistency and rationality in risk level definitions.
For accident texts, the high-risk keyword set includes terms such as “fatality,” “fall,” and “collapse,” while the medium-risk keyword set includes “injury,” “fracture,” and “electric shock.” Texts containing any high-risk keywords are labeled as high risk; those containing medium-risk keywords are labeled as medium risk; otherwise, they are labeled as low risk.
For inspection texts, the high-risk keyword set includes “fall,” “lifting operations,” and “scaffolding,” whereas the medium-risk keyword set includes “material stacking,” “temporary electricity,” and “electric leakage.” The inference logic remains consistent with that used for accident texts. Through this mechanism, risk levels are automatically assigned to unlabeled texts, effectively overcoming the limitations of traditional models, which rely on manually labeled samples.
The construction of the risk keyword set was conducted in three stages:
(1) Preliminary candidate selection: High-weight terms identified by TF–IDF across the complete corpus of accident reports and inspection records, together with frequently occurring risk-related terms, were statistically screened to generate an initial candidate list.
(2) Expert validation: Two construction safety experts, each with more than five years of professional experience in safety management, reviewed the candidate terms and classified them into corresponding risk levels. Irrelevant or ambiguous terms were removed during this process.
(3) Consistency verification: A random sample of 200 texts was manually annotated for risk levels and compared with the rule-based inference results. The Cohen’s Kappa coefficient was 0.87, indicating a high level of agreement and reliability of the rule-based classification mechanism.
Based on this procedure, a three-tier risk keyword set, comprising high-, medium-, and low-risk categories, was ultimately established.
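The inference logic of Section 3.2 can be sketched as a short priority-ordered rule. The keyword sets below are illustrative subsets taken from the examples in the text; the paper's full sets were built via TF-IDF screening plus expert validation.

```python
# Illustrative accident-text keyword subsets (Section 3.2):
HIGH_RISK = {"死亡", "坠落", "坍塌"}    # fatality, fall, collapse
MEDIUM_RISK = {"受伤", "骨折", "触电"}  # injury, fracture, electric shock

def infer_risk_level(tokens):
    """Rule order matters: any high-risk hit wins, then medium, else low."""
    if HIGH_RISK & set(tokens):
        return "high"
    if MEDIUM_RISK & set(tokens):
        return "medium"
    return "low"
```

The same function applies to inspection texts once the corresponding inspection keyword sets are substituted.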

3.3. Semantic Embedding and FAISS Index Construction

Semantic embedding and index construction constitute the core components for achieving efficient semantic matching. This process mainly consists of two stages: semantic embedding generation using Sentence-BERT and vector index construction based on FAISS.

3.3.1. Fine-Tuning of Sentence-BERT and Definition of Evaluation Metrics

A fine-tuned Sentence-BERT model was employed to generate deep semantic embeddings of the texts. This model leverages a siamese network architecture to optimize the comparability of sentence vectors, substantially improving text matching efficiency [48]. In this study, the Sentence-BERT model used bert-base-chinese as the pre-trained backbone. Key training parameters during fine-tuning were set as follows: learning rate = 2 × 10⁻⁵, batch size = 32, and number of epochs = 3.
To clarify the definition of “semantic matching accuracy”, a sentence-pair validation dataset was constructed to evaluate model performance. The dataset consists of positive and negative pairs formed by accident reports and safety inspection records. A pair was labeled as positive when a clear risk-related semantic association was confirmed through manual annotation; otherwise, it was labeled as negative. The dataset was divided into training, validation, and test sets at a ratio of 8:1:1.
Model performance on the test set was assessed based on its classification results for sentence-pair matching. Accuracy, Recall, and F1-score were adopted as evaluation metrics. The formal definitions of semantic matching accuracy and other performance indicators are provided in Section 3.6. The experimental results are presented and discussed in detail in Section 4. Compared with the non-fine-tuned baseline model, the fine-tuned model demonstrated substantial performance improvements in the semantic matching task, thereby validating the effectiveness of the constructed domain-specific semantic dataset for model optimization.
In terms of model configuration, the maximum sequence length was set to 128 tokens (truncated if exceeded, padded if shorter), and a batch size of 32 was used to improve encoding efficiency. The model outputs 768-dimensional dense vectors, and the convert_to_numpy = True parameter was applied to enforce the output as a NumPy array. To ensure the validity of subsequent similarity computations, the SBERT output vectors were L2-normalized [49], as expressed by the following formula:
$v_{\mathrm{norm}} = \dfrac{v}{\|v\|_2 + \varepsilon}$
where $\|v\|_2$ denotes the L2 norm of the vector and $\varepsilon = 10^{-6}$ is a small constant introduced to prevent division by zero during normalization. After vector normalization, the inner product becomes equivalent to cosine similarity, thereby establishing a solid basis for efficient similarity-based retrieval.
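The normalization step in Equation (2), and the resulting equivalence between inner product and cosine similarity, can be written in a few lines of NumPy; random vectors stand in for SBERT outputs here.

```python
import numpy as np

def l2_normalize(v, eps=1e-6):
    """Equation (2): v / (||v||_2 + eps). After normalization, the inner
    product of two embeddings equals their cosine similarity."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)
```

In the actual pipeline this is applied to the 768-dimensional NumPy arrays returned by the SBERT encoder.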

3.3.2. FAISS Vector Index Construction

To enable fast semantic retrieval of large-scale texts, a FAISS vector index was constructed. The index dimensionality was set to match the SBERT output vectors (768 dimensions), using the FAISS IndexIVFFlat index type with IndexFlatIP (inner product) as the quantizer. The number of clustering centroids, nlist = 100, was determined through systematic pre-experiments. Four candidate values (50, 100, 200, and 300) were evaluated, and the results indicated that nlist = 100 achieved optimal overall performance, with a single-query retrieval time of only 1.2 ms and peak mean average precision (MAP). When nlist < 100, semantic retrieval accuracy dropped below 0.8, failing to meet matching requirements; when nlist > 100, memory usage increased by 50% with no significant improvement in retrieval efficiency. Therefore, 100 clustering centroids were selected to optimize retrieval performance [50].
The normalized SBERT embeddings of accident records were used to train the index. If the training sample size was less than 500, the index was automatically downgraded to IndexFlatIP, under which the retrieval accuracy differed from the FAISS IndexIVFFlat index by no more than 0.02, reducing computational resource consumption while maintaining accuracy. After training, the accident vectors were added to the index and persisted for storage, supporting efficient association and matching between temporally aggregated inspection texts and historical accident records.

3.3.3. Data Partitioning and Cross-Project Validation

During the fine-tuning stage of Sentence-BERT, a sentence-pair matching dataset was constructed and divided into training, validation, and test sets at a ratio of 8:1:1. To evaluate the cross-project generalization capability of the proposed model, an independent residential construction project that was not involved in the training process was selected as an external test set. This project contains safety inspection records, and a sliding time-window mechanism was established for risk assessment.
Throughout the testing procedure, the training and testing datasets were strictly separated to prevent any potential data leakage, thereby ensuring the reliability of the evaluation results. The trained model was directly applied to heterogeneous project data without retraining, enabling verification of its generalization performance under different project conditions.

3.4. 7-Day Sliding Time Window for Hazard Temporal Accumulation

To characterize the temporal accumulation effect of construction safety hazards, pre-experiments were conducted to compare time windows of 5, 7, 10, and 14 days, using mean average precision (MAP) and computational efficiency as evaluation metrics. The results indicate that the 7-day window achieves the best balance between performance and efficiency, with a MAP of 0.8547. The 5-day window provides insufficient temporal coverage to fully capture hazard accumulation characteristics, resulting in a lower MAP of 0.7812. Although the 10-day and 14-day windows extend the temporal coverage, they introduce data redundancy and lead to reduced generalization performance, with both achieving an AUC below 0.62. Based on these results, a 7-day window was selected as the optimal temporal aggregation interval.
In the implementation, inspection records are first sorted by inspection date, and a 7-day sliding window is constructed starting from the date of the first record. All inspection texts within each window are aggregated to form a time-interval–level “aggregated text,” in which keywords are deduplicated while informative features are retained, and the temporal range of the window along with corresponding statistical information is recorded. The window then slides forward by setting the end date of the current window as the starting point of the next window, continuing until all inspection records are covered. This process ultimately generates a time-series aggregation dataset for cumulative risk analysis.
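The windowing procedure above can be sketched as follows, assuming records are (inspection_date, text) pairs already sorted by date; keyword deduplication and window-level statistics are omitted for brevity.

```python
from datetime import date, timedelta

def aggregate_windows(records, days=7):
    """Aggregate (inspection_date, text) pairs, pre-sorted by date, into
    consecutive windows; each window's end date becomes the next start."""
    windows, start, last = [], records[0][0], records[-1][0]
    while start <= last:
        end = start + timedelta(days=days)
        texts = [t for d, t in records if start <= d < end]
        if texts:  # skip intervals with no inspections
            windows.append({"start": start, "end": end, "text": " ".join(texts)})
        start = end  # end of the current window starts the next one
    return windows
```

Each resulting dictionary corresponds to one time-interval-level “aggregated text” in the time-series aggregation dataset.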

3.5. Semantic Matching and Risk Prediction

Building upon the procedures of text preprocessing, keyword extraction, risk level inference, semantic embedding generation, and seven-day sliding window aggregation (Section 3.1, Section 3.2, Section 3.3 and Section 3.4), this study further develops a semantic matching and risk prediction framework based on SBERT semantic embeddings and a FAISS vector index. This framework enables the identification of associative risks between inspection windows and historical accident records, and outputs risk prediction results with explicit priority levels suitable for practical decision-making.

3.5.1. Construction of Window-Level Semantic Vectors

For each 7-day inspection window generated in Section 3.4, all inspection records within the window are aggregated into a single composite text, which serves as the primary semantic representation. In addition, a window-level set of domain-specific keywords is incorporated to enhance the expression of construction safety risk information. The aggregated text and the corresponding keyword information are jointly fed into the fine-tuned Sentence-BERT model to generate a 768-dimensional window-level semantic embedding.
To ensure consistency in the vector space and the validity of subsequent similarity computations, L2 normalization is applied to the window-level semantic vectors, aligning their metric scale with that of the historical accident semantic vectors. This representation preserves the contextual semantic structure while explicitly introducing risk-related features, thereby improving the model’s ability to identify high-risk scenarios.
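As a small illustration of the normalization step, the sketch below performs row-wise L2 normalization, under which the inner product of any two rows equals their cosine similarity. In the actual pipeline the input rows would be SBERT window embeddings (e.g., produced by a fine-tuned SentenceTransformer); they are replaced by plain arrays here to keep the sketch dependency-free.

```python
import numpy as np

def l2_normalize(vectors):
    """Row-wise L2 normalization so that inner products equal cosine similarity."""
    v = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero rows
    return v / norms
```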

3.5.2. FAISS-Based Retrieval of Similar Accidents

Using the FAISS vector index constructed in Section 3.3, a nearest-neighbor search is performed for each window-level semantic vector to retrieve the Top-10 historical accident records with the highest semantic similarity. The semantic similarity between an inspection window and a historical accident is defined as:
Sim(ω, a_i) = v_ω · v_{a_i}
Herein, v_ω denotes the L2-normalized SBERT semantic embedding corresponding to the ω-th 7-day inspection window, while v_{a_i} represents the L2-normalized semantic embedding of the i-th historical accident record. Owing to the L2 normalization applied to both vectors, the inner-product formulation above is mathematically equivalent to the cosine similarity between the inspection window text and the historical accident text.
Based on the results of preliminary experiments, the similarity threshold is empirically set to θ = 0.65. When Sim(ω, a_i) ≥ θ, the inspection window and the historical accident are considered to exhibit a significant semantic association. Accordingly, the corresponding accident is identified as a candidate risk incident and incorporated into the subsequent risk assessment and ranking process.
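A minimal sketch of this thresholded Top-10 retrieval is given below, using a plain NumPy inner product in place of the FAISS index for illustration (the function name and data layout are assumptions). Both inputs are assumed L2-normalized, so the inner product equals cosine similarity.

```python
import numpy as np

THETA = 0.65  # empirical similarity threshold reported in the text

def top_k_candidates(window_vec, accident_vecs, k=10, theta=THETA):
    """Return (accident index, similarity) pairs for the Top-k accidents whose
    inner-product similarity with the window vector is at least theta."""
    sims = np.asarray(accident_vecs) @ np.asarray(window_vec)
    order = np.argsort(-sims)[:k]  # indices of the k highest similarities
    return [(int(i), float(sims[i])) for i in order if sims[i] >= theta]
```

In production, the NumPy matrix product would be replaced by a search over the FAISS index described in Section 3.3; the thresholding logic is unchanged.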

3.5.3. Keyword-Enhanced Semantic Matching Mechanism

Considering that high-risk scenarios in construction safety are often triggered by a limited number of critical terms, a keyword-enhanced strategy is introduced into the semantic matching process to compensate for the insufficient sensitivity of pure semantic embeddings to fine-grained risk cues. Specifically, candidate accident records are first coarsely filtered based on high-risk keywords, and then re-ranked according to SBERT-based semantic similarity. When multiple accident records exhibit similar semantic similarity scores, the TF-IDF weight distribution of the associated keywords is further considered, giving higher priority to accident texts with stronger risk-indicative keywords. This mechanism improves the stability and interpretability of high-risk accident identification.
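The coarse-filter-then-re-rank mechanism might be sketched as follows, with equal-similarity ties broken by the summed TF-IDF weight of the high-risk keywords; all names and data structures here are hypothetical stand-ins for the paper's implementation.

```python
def rerank(candidates, accident_keywords, high_risk_terms):
    """Keyword-enhanced re-ranking of candidate accidents.

    candidates: list of (accident_id, similarity) pairs;
    accident_keywords: accident_id -> {term: tfidf_weight};
    high_risk_terms: set of high-risk keywords used for coarse filtering.
    """
    # Coarse filter: keep accidents mentioning at least one high-risk keyword;
    # if none qualify, fall back to the full candidate list.
    filtered = [c for c in candidates
                if high_risk_terms & set(accident_keywords.get(c[0], {}))]

    def key(c):
        # Primary: semantic similarity; secondary: summed TF-IDF weight of
        # risk-indicative keywords, which decides between equal similarities.
        weight = sum(w for t, w in accident_keywords.get(c[0], {}).items()
                     if t in high_risk_terms)
        return (c[1], weight)

    return sorted(filtered or candidates, key=key, reverse=True)
```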

3.5.4. Identification of Core High-Risk Accidents and Risk Prediction Rules

“Major high-risk accidents” refer to incidents that may result in three or more fatalities, ten or more serious injuries, or direct economic losses exceeding RMB 5 million, corresponding to the “major” or higher accident categories defined in the “Regulations on the Reporting, Investigation, and Handling of Work Safety Accidents” [51].
For each inspection window ω, the automatically inferred accident risk levels described in Section 3.2 are incorporated for the Top-10 semantically similar historical accidents retrieved, and a high-risk-priority decision rule is applied to identify the core reference accident for the window. This rule ensures that accidents with higher risk levels are preferentially selected as the primary basis for risk prediction, thereby enhancing the conservativeness and reliability of the proposed risk forecasting framework. Let A_ω denote the set of the Top-10 historical accidents exhibiting the highest semantic similarity to window ω, and let A_ω^H ⊆ A_ω represent the subset of accidents labeled as having a high risk level.
If A_ω^H ≠ ∅, the core maximum-risk reference accident for window ω is defined as the high-risk accident with the highest semantic similarity:
a_ω = argmax_{a_i ∈ A_ω^H} Sim(ω, a_i)
In the absence of high-risk accidents, i.e., when A_ω^H = ∅, the core reference accident degenerates to the most semantically similar accident within the Top-10 set:
a_ω = argmax_{a_i ∈ A_ω} Sim(ω, a_i)
Through the above rules, an explicit mapping between window-level risk prediction results and historical accidents is established, providing a robust basis for accurate identification and prioritized ranking of construction site safety risks.
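The two selection rules can be expressed compactly. The sketch below assumes candidates are (accident ID, similarity) pairs from the Top-10 retrieval, and that risk levels come from the automatic inference step of Section 3.2; the level labels are illustrative.

```python
def core_accident(candidates, risk_levels):
    """Select the core reference accident for a window.

    candidates: list of (accident_id, similarity) pairs (the Top-10 set A_w);
    risk_levels: accident_id -> risk level string (e.g. "high").
    High-risk accidents take priority; only if none exist does the rule fall
    back to the most semantically similar accident overall.
    """
    if not candidates:
        return None
    high = [c for c in candidates if risk_levels.get(c[0]) == "high"]
    pool = high if high else candidates   # A_w^H if non-empty, else A_w
    return max(pool, key=lambda c: c[1])  # argmax over similarity
```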

3.5.5. Output of Risk Prediction Results

Finally, the risk prediction output for each 7-day inspection window includes the following elements: the start and end dates of the window, the number of inspection records within the window, the set of window-level core keywords, the identified core high-severity risk accident (including the accident ID, textual description, and similarity score), and a Top-10 list of associated historical accidents ranked by semantic similarity. This structured output provides safety managers with clear guidance on risk prioritization, thereby facilitating proactive intervention and refined management of potential hazards at construction sites.

3.5.6. Risk Assessment and Management Decision Support

Although the proposed model primarily focuses on semantic matching and risk ranking, its outputs were further transformed into a structured risk assessment and management decision-support tool to facilitate practical implementation in construction safety management.
(1)
Quantitative Risk Scoring Mechanism
To convert semantic similarity results into interpretable and manageable risk indicators, a composite risk scoring mechanism was established by integrating two core factors: (i) the semantic similarity score between inspection texts and representative accident scenarios (Sim, ranging from 0 to 1), and (ii) the normalized cumulative frequency of hazard records within a 7-day time window (N, ranging from 0 to 1). The overall risk score is defined as:
RiskScore = α · Sim + (1 − α) · N
where α represents the relative weight between semantic similarity and hazard accumulation. Based on validation experiments and expert consultation, α was set to 0.7 to emphasize the dominant role of semantic relevance in risk inference while accounting for the complementary influence of hazard density within the time window.
According to the calculated RiskScore, risks are categorized into four levels:
  • Extremely High Risk (RiskScore ≥ 0.80);
  • High Risk (0.60 ≤ RiskScore < 0.80);
  • Moderate Risk (0.40 ≤ RiskScore < 0.60);
  • Low Risk (RiskScore < 0.40).
This dual-dimensional scoring mechanism, combining semantic similarity and temporal hazard accumulation, enables systematic risk classification and priority ranking. The overall trend of the evaluation results was consistent with on-site safety managers’ judgments, indicating that the proposed quantitative mechanism effectively reflects the actual risk level of construction sites.
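The scoring formula and four-level classification above admit a direct transcription; α = 0.7 and the cut-offs are taken from the text, while the function names are illustrative.

```python
ALPHA = 0.7  # weight on semantic similarity, per the validation experiments

def risk_score(sim, n_norm, alpha=ALPHA):
    """RiskScore = alpha * Sim + (1 - alpha) * N, with both inputs in [0, 1]."""
    return alpha * sim + (1 - alpha) * n_norm

def risk_level(score):
    """Map a RiskScore to the four-level classification defined in the text."""
    if score >= 0.80:
        return "Extremely High Risk"
    if score >= 0.60:
        return "High Risk"
    if score >= 0.40:
        return "Moderate Risk"
    return "Low Risk"
```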
(2)
Management-Oriented Risk Response Guidelines
Based on the four-level risk classification, a corresponding management response framework was developed to support resource allocation and intervention prioritization:
  • Extremely High Risk: Immediate on-site inspection and corrective action are recommended;
  • High Risk: Targeted hazard investigation should be completed within 24 h;
  • Moderate Risk: Included in the scope of weekly key safety inspections;
  • Low Risk: Routine inspection and continuous monitoring are maintained.
This mapping mechanism translates abstract semantic similarity scores and risk rankings into actionable management recommendations, addressing the limitation of conventional algorithmic models that typically remain at the stage of risk identification. It thereby extends the framework from risk detection to practical decision support.
(3)
Observed Improvements in Management Efficiency
During pilot implementation, safety managers were able to prioritize high-risk time windows based on the model-generated risk ranking, leading to optimized allocation of inspection resources. Compared with traditional non-prioritized inspection procedures, the structured prioritization mechanism demonstrated:
  • Faster response to high-risk situations;
  • Reduced redundant inspections of low-risk records;
  • Improved overall efficiency of the safety inspection workflow;
These practical observations suggest that the proposed framework not only enhances risk identification accuracy but also contributes to improved operational efficiency in construction safety management.

3.6. Evaluation Metrics

To accurately evaluate model performance on imbalanced construction safety datasets—particularly under practical scenarios where high-risk samples account for a relatively small proportion—this study adopts Accuracy, Recall, and F1-score as the core evaluation metrics. These indicators comprehensively assess model performance from three perspectives: overall correctness, positive sample identification capability, and balanced classification effectiveness.
First, four fundamental components are defined based on the confusion matrix: True Positive ( TP ), representing correctly predicted risk samples; False Positive ( FP ), representing normal samples incorrectly predicted as risks; True Negative ( TN ), representing correctly predicted normal samples; and False Negative ( FN ), representing risk samples incorrectly predicted as normal.
The evaluation metrics are defined as follows:
(1)
Accuracy: Accuracy measures the overall proportion of correctly classified samples among all samples:
Acc = (TP + TN) / (TP + TN + FP + FN)
(2)
Recall: In construction safety management, failing to detect risk cases (FN) may lead to severe accidents; therefore, Recall is a critical indicator. It measures the proportion of actual risk samples correctly identified by the model [52]:
Recall = TP / (TP + FN)
(3)
F1-Score: As the harmonic mean of Precision and Recall, the F1-score balances the trade-off between precision and sensitivity in imbalanced datasets and effectively reflects overall classification performance:
F1 = (2 × Precision × Recall) / (Precision + Recall)
where Precision = TP / (TP + FP).
(4)
Mean Average Precision (MAP): MAP is employed to evaluate the quality of risk ranking results [53], reflecting the model’s ability to distinguish different risk levels. It is defined as:
MAP = (1/Q) · Σ_{q=1}^{Q} AP_q
where AP_q denotes the average precision of the q-th risk assessment task and Q represents the total number of risk assessment tasks. MAP ranges from 0 to 1; values closer to 1 indicate better ranking performance and stronger discrimination capability across different risk levels.
(5)
Area Under the Curve (AUC): AUC is used to measure the discriminative capability of the risk classification model [53], ranging from 0.5 to 1.0. Values closer to 1 indicate stronger classification performance. AUC is defined as the area under the Receiver Operating Characteristic (ROC) curve:
AUC = ∫₀¹ TPR(FPR) d(FPR)
where TPR denotes the True Positive Rate and FPR denotes the False Positive Rate.
Overall, the proposed evaluation metrics form a coherent and robust assessment framework that enables systematic validation of both classification reliability and risk-ranking discrimination in imbalanced construction safety datasets.
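For reference, the classification metrics and the per-query average precision underlying MAP can be computed as follows. This is a dependency-free sketch of the standard definitions, not the authors' evaluation code.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, Recall, and F1 from binary labels (1 = risk, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return acc, recall, f1

def average_precision(ranked_relevance):
    """AP for one ranked list (1 = relevant): mean of precision@k taken at
    each relevant rank. MAP is the mean of AP over all queries."""
    hits, total = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0
```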

4. Experiments and Results

4.1. Dataset Description and Text Preprocessing

This study employs three types of construction safety textual datasets to evaluate the model’s performance and generalization ability:
  • Construction Safety Accident Dataset: Sourced from publicly released accident reports on the official websites of the Ministry of Housing and Urban-Rural Development and provincial/municipal emergency management departments, covering the period from 2002 to 2020, with a total of 1368 valid accident records. The dataset includes major project types such as building construction and municipal engineering, with accident types including typical construction incidents such as high-altitude falls and collapses. Each record contains key information including the occurrence time, location, casualties, and causes of accidents. The dataset is stored in the “China Construction Engineering Site Safety Accident Data Platform” (http://39.98.221.201/accidentltem/admlogin.html, accessed on 15 March 2026).
  • Construction Safety Inspection Record Dataset: Collected from the new student dormitory construction project of Shenyang Jianzhu University, containing 484 daily safety inspection records from 26 August 2022 to 15 October 2023. Each record includes the inspection date, description of hazards, inspection location, and rectification requirements. The records are terminology-dense, providing a solid basis for cross-text semantic matching analysis.
  • Cross-Project Safety Inspection Dataset: Collected from a residential construction project, containing 331 daily safety inspection records from 3 February 2023 to 11 June 2025. Each record includes hazard descriptions, risk locations, and management requirements, with annotated risk levels of “Major Hazard,” “Moderate Hazard,” and “General Hazard.” This dataset serves as an independent test set to evaluate model generalization performance.
All textual data undergo a unified preprocessing pipeline: non-Chinese characters and non-numeric symbols are filtered using regular expressions; tokenization is performed with the Jieba tokenizer enhanced by a construction safety domain-specific dictionary; low-discriminative common words are removed using a domain-specific stopword list, and tokens of length ≤ 1 are discarded. Keywords are extracted using the TF-IDF method, retaining the Top-5 keywords per record, while high-frequency words appearing in more than 85% of texts are filtered to reduce redundancy and highlight potential risk features.
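A simplified stand-in for this pipeline is sketched below. Note the deliberate simplifications: the paper uses the Jieba tokenizer with a domain dictionary, whereas this sketch splits on whitespace so it stays dependency-free; the Top-5 TF-IDF selection and the 85% document-frequency cut-off follow the description above.

```python
import math
import re
from collections import Counter

def preprocess(text, stopwords):
    """Keep Chinese characters, digits, and Latin letters, then tokenize.
    (Whitespace splitting stands in for Jieba + domain dictionary.)"""
    text = re.sub(r"[^\u4e00-\u9fa50-9a-zA-Z\s]", " ", text)
    return [t for t in text.split() if len(t) > 1 and t not in stopwords]

def top_keywords(docs, k=5, max_df=0.85):
    """Per-document Top-k TF-IDF keywords, dropping terms whose document
    frequency exceeds max_df of all (non-empty) documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t])
                  for t, c in tf.items() if df[t] / n <= max_df}
        out.append([t for t, _ in sorted(scores.items(),
                                         key=lambda x: -x[1])[:k]])
    return out
```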

4.2. Performance of the Proposed Model

The proposed model adopts a hybrid architecture of ‘domain-fine-tuned SBERT + 7-day sliding window aggregation + FAISS semantic retrieval’. First, the fine-tuned Sentence-BERT model is used to extract sentence-level semantic features (MAX_SEQ_LENGTH = 128, batch_size = 32) with L2 normalization. Then, a 7-day sliding window aggregates consecutive inspection records to capture temporal risk accumulation effects. A FAISS IndexIVFFlat index is constructed from the accident text embeddings (nlist = 100), and inner-product similarity is used to quickly retrieve the Top-10 most similar accidents. A high-risk-priority strategy is then applied to achieve accurate risk matching.
Even without relying on manually labeled risk levels, the model achieves excellent performance (Table 1): recall = 1.00, accuracy = 0.75, F1-score = 0.8571, MAP = 0.8547, and AUC = 0.6593. The AUC value is substantially higher than the 0.5 random-ranking baseline, demonstrating meaningful discriminative capability between high- and low-risk time windows.
The experimental results indicate that, under a unified threshold setting, the proposed model achieves zero missed detections (Recall = 1.0000), thereby covering all high-risk time windows. Meanwhile, the relatively high F1-score reflects strong consistency between the predicted results and the actual high-risk windows. The MAP metric further demonstrates stable performance in risk-priority ranking.
It is important to note that, in construction safety management scenarios, ranking capability provides greater decision-making value than single classification outcomes. Compared with traditional TF-IDF and TextRank methods, the proposed model exhibits more stable performance on ranking-oriented metrics (MAP and AUC). These results highlight the positive contribution of deep semantic representations and time-window modeling to high-risk priority identification and ranking.

4.3. Comparison with Baseline Models

To further analyze the performance differences among various models for construction safety risk prediction, a systematic comparison was conducted under a unified risk threshold and evaluation protocol between the proposed core model and four baseline models.
  • TF-IDF (without domain dictionary): A classical statistical text representation method relying solely on term frequency–inverse document frequency (TF-IDF) to compute text vectors, without adaptation to the construction safety domain [51].
  • TextRank-based Keyword Matching: A keyword extraction and matching method based on word co-occurrence graphs, which measures text similarity by the degree of keyword overlap [51].
  • BERT[CLS] vector: Uses the [CLS] token embedding from the pre-trained BERT-base-Chinese model to represent sentence semantics, without domain-specific fine-tuning [48].
  • SVM + TF-IDF: Combines TF-IDF features with support vector machine (SVM) for traditional text classification, representing a classic supervised learning paradigm [51].
Under a unified risk classification threshold, all models exhibit comparable performance in classification metrics (Accuracy, Recall, and F1-score) in identifying high-risk time windows. However, substantial differences are observed in ranking-oriented metrics (MAP and AUC), as shown in Table 2.
The results indicate that traditional methods (TF-IDF and TextRank) achieve relatively low performance on ranking-oriented metrics (MAP and AUC). In particular, the AUC of TextRank is close to the random baseline (0.5), demonstrating limited capability in distinguishing risk levels across time windows. BERT[CLS] and SVM + TF-IDF show improvements in ranking metrics; however, their performance remains inferior to that of the proposed model.
Under the present experimental setting, the proposed model achieves comparatively higher MAP and AUC values (MAP = 0.8547, AUC = 0.6593), indicating its advantage in risk-level differentiation and priority ranking. This improvement can be attributed to the domain-fine-tuned SBERT sentence-level semantic representations and the 7-day sliding window modeling of temporal risk accumulation features. Figure 2 presents a visual comparison of MAP and AUC results across all models.
As shown in Figure 2, the proposed model achieves the highest performance in both MAP and AUC metrics (MAP = 0.8547, AUC = 0.6593), demonstrating a clear advantage in risk-level differentiation and priority ranking. This improvement primarily stems from the high-quality semantic representations provided by the domain-fine-tuned SBERT and the effective modeling of temporal risk accumulation using the 7-day sliding window. Although all models exhibit similar classification performance under a fixed threshold, the proposed model outperforms others in the risk ranking task, which better aligns with construction safety decision-making requirements, thereby validating the effectiveness of the adopted modeling strategy.

4.4. Ablation Study

To systematically analyze the contribution of each core module to the overall model performance, ablation experiments were designed along three dimensions: text representation (SBERT embeddings vs. traditional features), temporal window modeling (7-day aggregation vs. single record), and keyword enhancement strategy (with or without keyword fusion). Three baseline variants were constructed and compared with the full model (Table 3 and Figure 3). The specific experimental configurations are as follows:
(1)
Baseline 1 (TF-IDF + single record): Uses only TF-IDF features without SBERT embeddings or keyword enhancement, and no window aggregation is performed.
(2)
Baseline 2 (SBERT + single record): Introduces domain-fine-tuned SBERT embeddings, but only a single inspection record is used without sliding window aggregation.
(3)
Baseline 3 (TF-IDF + BERT[CLS] + 7-day window): Combines BERT[CLS] vectors with TF-IDF features and performs 7-day window aggregation, but no keyword enhancement is applied.
(4)
Full model (keywords + SBERT + 7-day window): Integrates SBERT embeddings, 7-day sliding window aggregation, and the keyword enhancement strategy, representing the configuration of the proposed core model.
Table 3. Ablation Study Results.

| Experimental Group | Accuracy | Recall | F1 Score | MAP |
|---|---|---|---|---|
| Baseline 1 (TF-IDF + single record) | 0.3190 | 0.8811 | 0.4684 | 0.2923 |
| Baseline 2 (SBERT + single record) | 0.2955 | 1.0000 | 0.4561 | 0.4634 |
| Baseline 3 (TF-IDF + BERT[CLS] + 7-day window) | 0.7500 | 1.0000 | 0.8571 | 0.7959 |
| Full model (keywords + SBERT + 7-day window) | 0.7500 | 1.0000 | 0.8571 | 0.8547 |
Note: The MAP values of each experimental group are visualized to clearly illustrate the contribution of each component to the model’s risk-ranking capability.
Figure 3. MAP Comparison Across Different Ablation Settings.
The ablation study results (Table 3 and Figure 3) demonstrate that each component contributes complementarily to the model’s overall performance. Among them, temporal window aggregation is the primary driver of performance improvement: incorporating the 7-day sliding window increased the MAP from below 0.5 to 0.7959, and the full model further achieved 0.8547, confirming its effectiveness in capturing the temporal accumulation of risks. Domain-fine-tuned SBERT significantly outperforms TF-IDF in semantic representation, boosting MAP by 58.6% under the single-record condition (0.2923 → 0.4634) and further enhancing risk discrimination in the full model. On this basis, the keyword-enhancement strategy provides an additional 7.4% MAP improvement (0.7959 → 0.8547) without affecting classification performance. In summary, the synergistic integration of temporal window modeling, SBERT semantic embeddings, and keyword enhancement underpins the model’s overall performance advantage.

4.5. Generalization and Robustness Analysis

In practical construction safety management scenarios, a model must not only perform well on data from the same source but also exhibit robustness to textual noise and generalization capability across heterogeneous project data. Therefore, the proposed model was systematically evaluated from two perspectives: robustness and cross-project generalization.

4.5.1. Robustness Under Textual Noise

To simulate the textual noise commonly present in real construction safety inspection records, a 10% noise level was introduced into the original inspection texts. The noise types included: (1) misspellings of domain-specific terms, (2) insertion of redundant descriptions, and (3) synonym replacements. Under the same experimental settings as the original evaluation, the model performance was reassessed, and the results are summarized in Table 4.
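A noise-injection step of this kind could be sketched as follows; the corruption scheme (synonym replacement where a synonym is known, simple token corruption otherwise) is illustrative, not the authors' exact procedure.

```python
import random

def inject_noise(tokens, rate=0.10, synonyms=None, seed=42):
    """Perturb a token list at the given rate to simulate inspection-text noise:
    swap in a synonym when one is known, otherwise corrupt the token (a stand-in
    for misspellings / redundant-description noise)."""
    rng = random.Random(seed)  # fixed seed for reproducible experiments
    synonyms = synonyms or {}
    out = []
    for t in tokens:
        if rng.random() < rate:
            out.append(synonyms.get(t, t + "*"))
        else:
            out.append(t)
    return out
```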
The experimental results indicate that textual noise has a limited impact on model performance. The F1-score remains unchanged before and after noise injection, while MAP and AUC decrease by less than 2%, demonstrating the model’s strong robustness in high-risk identification, risk ranking, and boundary differentiation. This stability is primarily attributed to the domain-fine-tuned SBERT sentence-level semantic embeddings, which focus on core risk semantics rather than superficial lexical features. Consequently, the model can accurately capture critical risk associations even in the presence of noise, making it well-suited to real-world construction inspection texts with inconsistent formatting and reducing reliance on complex text-cleaning procedures.

4.5.2. Cross-Project Generalization Test Results

To further evaluate the model’s transferability across different construction projects, 331 safety inspection records from an external residential construction project were used as an independent test set, forming 77 effective 7-day sliding windows, including 30 high-risk windows. The cross-project test results are summarized in Table 5.
The observed performance variation can be attributed to the following factors:
(1)
Differences in textual expression: Inspection records from the heterogeneous project exhibit more colloquial and fragmented language styles. The same hazard may be described using multiple expressions, increasing the difficulty of semantic matching.
(2)
Differences in hazard types: The cross-project dataset includes certain emerging hazard patterns that were not sufficiently represented in the training data. This leads to some low-risk windows being classified as high-risk windows.
(3)
Imbalanced sample distribution: Among the 77 time windows, low-risk samples account for a relatively large proportion. Under a fixed threshold setting, the model adopts a more permissive risk judgment strategy to maintain a high recall rate, thereby increasing the false-positive rate.
It should be emphasized that, in the cross-project evaluation, the risk classification threshold was not re-tuned. Instead, the fixed threshold determined from the source project was directly applied to assess the model’s genuine transfer capability. Under this condition, maintaining the zero-miss characteristic resulted in a relatively high false-positive rate on heterogeneous data, which in turn reduced overall Accuracy. This phenomenon reflects the inherent trade-off between recall and precision.
Although Accuracy and F1-score decrease, MAP (0.5153) and AUC (0.6418) remain above the random baseline, indicating that the model preserves a certain level of risk-ranking capability in cross-project scenarios. At the same time, the model maintains the zero-miss characteristic, which aligns with the core principle of construction safety management that prioritizes avoiding missed detections over false alarms. In practical applications, a certain proportion of false positives can be rapidly verified and corrected by on-site managers. The associated cost is substantially lower than the potential safety consequences resulting from missed high-risk hazards.
In practical construction safety management, achieving zero missed detections is typically prioritized over reducing the false-positive rate. A certain proportion of false alarms can be corrected through manual verification, whereas missed identification of high-risk hazards may lead to more severe consequences. Therefore, from an engineering application perspective, the proposed model retains acceptable practical value in cross-project environments. Future improvements may be achieved through threshold adjustment or limited recalibration using a small amount of target-project data, thereby further enhancing cross-project classification accuracy.
As shown in Figure 4, the model maintains a 100% recall rate in the cross-project scenario, consistent with the results obtained on the source dataset, thereby satisfying the fundamental requirement of “zero missed detections” in construction safety management. However, the F1-score decreases from 0.8571 to 0.5607, and Accuracy declines from 0.7500 to 0.3896, indicating a noticeable performance reduction. These results suggest that, without target-project retraining or threshold adaptation, direct model transfer is accompanied by a certain degree of performance degradation.
Nevertheless, MAP (0.5153) and AUC (0.6418) remain above the random-ranking baseline (0.5), demonstrating that the model retains a certain level of risk-ranking capability under cross-project conditions. This finding indicates that the proposed approach captures domain-general semantic characteristics in construction safety to some extent, while also being affected by project-specific expression differences and changes in sample distribution.
Overall, without any threshold recalibration or target-project retraining, the model preserves the zero-miss property and maintains basic ranking discrimination ability, demonstrating a degree of cross-project transfer potential. At the same time, the decline in classification accuracy reveals sensitivity to project-specific contextual variations. Future work may incorporate limited target-project data for recalibration or adaptive threshold optimization to further enhance cross-scenario robustness.

4.6. Computational Efficiency Analysis

To evaluate the operational efficiency of the proposed model in large-scale scenarios, experiments were conducted under different batch sizes (32, 64, 128). The optimal batch size (128) was selected for further analysis, and 4840 inspection records (approximately 10 times the size of the original dataset) were processed in batches. The test results are summarized in Table 6.
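A minimal harness for this kind of batched throughput measurement might look like the following, with `process_batch` standing in for the embed-and-match step (a hypothetical callable, not part of the paper's code).

```python
import time

def measure_throughput(process_batch, records, batch_size=128):
    """Time batched processing and return records per second.
    process_batch: any callable that accepts a list (batch) of records."""
    start = time.perf_counter()
    for i in range(0, len(records), batch_size):
        process_batch(records[i:i + batch_size])  # one batch at a time
    elapsed = time.perf_counter() - start
    return len(records) / elapsed if elapsed > 0 else float("inf")
```

Running such a harness across batch sizes (e.g., 32, 64, 128) is one way to reproduce the throughput-versus-memory comparison described above.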
The results indicate that the model can achieve millisecond-level processing per sample under a standard computing environment, supporting real-time risk assessment immediately after inspection record entry. Moreover, its minute-level batch processing capacity significantly exceeds the daily data scale requirements of typical construction projects, enabling concurrent processing across multiple projects.
Additionally, comparison experiments with different batch sizes show that a batch size of 128 achieves an optimal balance between throughput and memory usage, providing a clear parameter configuration reference for practical deployment of the model.
The efficiency evaluation results indicate that the proposed model achieves millisecond-level processing speed per sample under standard computing environments, enabling rapid semantic matching and risk assessment for batch text data. Compared with typical semantic matching pipelines based on full BERT encoding, the proposed approach demonstrates higher computational efficiency during the inference stage, highlighting its strong potential for practical engineering deployment.

5. Discussion

This study develops an intelligent analytical framework for construction safety risk prediction by integrating domain-specific semantic modeling with time-series aggregation. Through comparative experiments, ablation studies, and generalization and robustness tests, the effectiveness of the proposed model in terms of predictive performance, stability, and practical applicability is systematically validated. The following discussion focuses on four aspects: semantic modeling approach, time-window mechanism, component synergy, and engineering application value.

5.1. Impact of Semantic Modeling on Construction Safety Risk Identification

Experimental results show that text semantic representation methods are critical to construction safety risk prediction, especially in risk-ranking metrics like MAP and AUC. Compared with frequency-based or keyword-matching approaches like TF-IDF and TextRank, domain-adapted SBERT exhibits significant advantages in risk-ranking tasks. This is mainly because construction safety texts are generally short, loosely structured, and contain implicit risk semantics; many potential hazards are not directly expressed through explicit risk terms but are embedded in descriptions of construction behaviors or site conditions.
Domain-adapted SBERT effectively models the deep semantic associations between “hazardous behaviors—potential accident outcomes” through sentence-level embeddings, allowing expressions such as “scaffold connections loose” or “edge protection missing” to be consistently mapped to accident risks like high-altitude falls in the semantic space. Therefore, even when classification performance is similar under a fixed threshold, the proposed model demonstrates superior risk-priority ranking, aligning well with the practical decision-making requirement in construction safety management of “prioritizing interventions for high-risk scenarios with limited resources.”

5.2. Critical Role of Time-Window Modeling in Risk Prediction

Ablation studies further reveal the core value of the temporal dimension in construction safety risk modeling. Results indicate that models relying solely on single inspection records cannot simultaneously achieve high classification accuracy and risk-ranking performance, which aligns with the empirical reality that construction accidents often result from hazards that accumulate and evolve over time.
The 7-day sliding window aggregates inspection records across continuous periods, allowing the model to explicitly capture the repetition and persistence of hazards, shifting from “isolated risk detection” to a holistic characterization of “systemic risk states.” Introducing the temporal window significantly improves F1 and MAP metrics, indicating that time-series modeling enhances risk discrimination and ranking while maintaining high recall. This mechanism also mirrors the way safety managers integrate multiple inspections to assess risk trends, thereby improving model interpretability and practical credibility.
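The aggregation step can be sketched with the standard library alone. For simplicity, the sketch groups records into fixed, non-overlapping 7-day bins anchored at the earliest record date; the sliding variant used in the paper would advance the window day by day. The record dates and contents are hypothetical.

```python
from datetime import date
from collections import defaultdict

def aggregate_windows(records, window_days=7):
    """Group (date, text) inspection records into fixed windows of
    `window_days` days, anchored at the earliest record date."""
    if not records:
        return {}
    start = min(d for d, _ in records)
    windows = defaultdict(list)
    for d, text in records:
        idx = (d - start).days // window_days  # window index 0, 1, 2, ...
        windows[idx].append(text)
    return dict(windows)

# Hypothetical inspection records (date, hazard description).
records = [
    (date(2025, 3, 1), "scaffold connections loose"),
    (date(2025, 3, 4), "edge protection missing"),
    (date(2025, 3, 6), "scaffold connections loose"),   # repeated hazard
    (date(2025, 3, 10), "lifting equipment inspection overdue"),
]
windows = aggregate_windows(records)
print(windows)  # window 0 holds the first three records; window 1 holds the last
```

The repetition of "scaffold connections loose" inside window 0 is exactly the kind of persistence signal that single-record matching would miss.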

5.3. Synergistic Effect of Keyword Enhancement and Deep Semantic Embeddings

On top of the dominant contribution from deep semantic embeddings, experiments show that the keyword-enhancement strategy provides noticeable marginal gains in risk-ranking tasks. With classification performance unchanged, incorporating domain-specific keywords further improves MAP, reinforcing the core semantic features associated with high-risk scenarios and providing a more stable ranking advantage for high-risk windows. In highly specialized, small-sample domains such as construction safety, combining domain-driven keyword information with SBERT sentence-level embeddings can therefore compensate for the deep model’s limitations in fine-grained risk discrimination without significantly increasing model complexity. This highlights the practical value of integrating data-driven approaches with domain knowledge.
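One way to realize such a fusion is a convex combination of semantic similarity and domain-keyword overlap. The keyword set, the `alpha` weight, and the scoring form below are assumptions made for illustration; the paper does not publish its exact fusion formula.

```python
# Illustrative subset of a domain keyword library (assumed, not the paper's).
DOMAIN_KEYWORDS = {"scaffold", "edge", "protection", "lifting", "fall"}

def keyword_boost(text, keywords=DOMAIN_KEYWORDS):
    """Fraction of domain keywords present in the text (0..1)."""
    tokens = set(text.lower().split())
    return len(tokens & keywords) / len(keywords)

def enhanced_score(semantic_sim, text, alpha=0.8):
    """Convex combination of embedding similarity and keyword overlap.
    alpha is a hypothetical weight, not a value reported in the paper."""
    return alpha * semantic_sim + (1 - alpha) * keyword_boost(text)

# Two records with identical semantic similarity: the one containing
# more domain keywords is ranked higher.
s1 = enhanced_score(0.70, "edge protection missing near scaffold")
s2 = enhanced_score(0.70, "general housekeeping issue in storage area")
print(s1 > s2)  # True
```

The effect mirrors the reported behavior: classification is untouched, but ties and near-ties among windows are broken in favor of records carrying core risk vocabulary.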

5.4. Generalization, Robustness, and Engineering Application Value

The cross-project experimental results indicate that the model maintains a 100% recall rate in heterogeneous engineering projects while preserving a certain level of risk-ranking capability. These findings suggest that the model captures cross-project shared semantic characteristics to some extent. However, the observed performance degradation also indicates sensitivity to project-specific contextual variations. This demonstrates that the model possesses a degree of transferability, although it has not yet achieved fully domain-independent generalization.
Robustness tests further show that performance variations remain relatively limited under textual noise conditions, including typographical errors, redundant descriptions, and expression differences. This indicates a certain level of adaptability to common textual perturbations and reduces reliance on complex text preprocessing procedures.
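The typographical part of such a noise test can be reproduced by swapping adjacent character pairs at roughly a 10% rate. The swap scheme below is an assumption; the paper’s perturbations also include redundant descriptions and expression differences, which are not modeled here.

```python
import random

def inject_typos(text, rate=0.10, seed=42):
    """Perturb roughly `rate` of the characters by swapping adjacent
    pairs, mimicking typographical noise (each swap touches two chars)."""
    rng = random.Random(seed)
    chars = list(text)
    n_swaps = max(1, int(len(chars) * rate / 2))
    for _ in range(n_swaps):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

clean = "edge protection missing on the third floor"
noisy = inject_typos(clean)
print(noisy)  # same length and character multiset, order locally perturbed
```

Because subword tokenizers degrade gracefully under local character swaps, embedding-based matching tends to be far less sensitive to this noise than exact keyword matching.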
From an engineering application perspective, the model adheres to the “zero missed detection” principle while tolerating a certain level of false positives. This design aligns with the risk-prevention-oriented logic of construction safety early warning systems. In practical scenarios, false alarms can typically be corrected through manual verification, whereas missed identification of high-risk hazards may result in more severe consequences. Therefore, this trade-off is considered reasonable within the context of safety management.
In addition, the model demonstrates high computational efficiency and can be integrated into existing safety management information systems without complex structural modifications. It can support risk assessment and auxiliary early warning based on safety inspection records. Overall, the proposed approach achieves a relative balance among predictive performance, robustness, and engineering applicability, providing a technically feasible pathway for intelligent identification of construction safety risks.
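Efficiency figures of the kind reported later (Table 6) can be obtained with a simple timing harness like the following. The `dummy` workload is a placeholder for the real SBERT encoding plus FAISS search, so the numbers it produces are not the paper’s.

```python
import time

def measure_throughput(process_fn, samples):
    """Return total time (s), average per-sample latency (ms), and
    per-minute throughput for a record-processing pipeline."""
    t0 = time.perf_counter()
    for s in samples:
        process_fn(s)
    total = time.perf_counter() - t0
    per_sample_ms = 1000.0 * total / len(samples)
    per_minute = len(samples) / total * 60.0
    return total, per_sample_ms, per_minute

# Stand-in workload; real use would call the encoder and vector index.
dummy = lambda s: sum(ord(c) for c in s)
samples = ["inspection record %d" % i for i in range(1000)]
total, ms, per_min = measure_throughput(dummy, samples)
print(f"{ms:.3f} ms/sample, {per_min:.0f} samples/min")
```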

5.5. Engineering Practice Outcomes

To further validate the applicability of the proposed framework in real-world settings, a six-month pilot implementation was conducted in a newly constructed student dormitory project. The system was applied to daily safety inspection records throughout the entire construction process, covering multiple phases, including the main structural works and installation engineering.
(1) Hazard Identification Performance
During the pilot period, the model processed 484 inspection records and established semantic associations with 1368 historical accident cases. Based on the proposed risk-ranking and grading mechanism, a total of 12 high-risk time windows and 27 medium-risk time windows were identified. Within the high-risk windows, the model proactively detected several critical hazard scenarios, including loosened scaffold connections, missing edge protection, and overdue inspections of lifting equipment. These hazards were subsequently confirmed during on-site verification and were rectified accordingly. Although it is difficult to quantify the accident-avoidance effect directly through counterfactual analysis, the early identification and timely rectification of these hazards demonstrate the model’s value in supporting proactive safety management.
(2) Optimization of Management Processes
The risk-ranking results were integrated into the existing project safety supervision workflow. Unlike the conventional approach of conducting uniform inspections across all records, on-site safety managers prioritized verification of high-risk time windows identified by the model.
During the pilot phase, the following improvements in management practices were observed:
  • Inspection resources were more concentrated on high-risk areas;
  • Redundant checks on low-risk records were reduced;
  • Response speed to critical hazards was enhanced.
Feedback from on-site safety managers indicated that the risk-ranking list improved situational awareness of project safety conditions and supported more targeted supervisory decision-making. It should be emphasized that the system operated as a decision-support tool, with final judgments and corrective actions remaining under the responsibility of human managers.
(3) Data Accumulation and Knowledge Reuse Value
Beyond its direct on-site support function, the pilot implementation established a semantic association database linking inspection records with historical accident cases. This structured knowledge base enhanced the traceability of hazard–accident relationships and provided reusable data resources for risk identification in subsequent projects. The transferability of the semantic risk-matching framework across different projects suggests that the proposed method has certain scalability potential in large-scale construction environments.
(4) Engineering Significance
Overall, the pilot results demonstrate that the proposed risk-ranking framework is technically feasible and exhibits strong compatibility with existing management processes. The system contributes to shifting construction safety management from retrospective statistical analysis toward structured, data-driven risk-priority management.
Although long-term validation across additional projects is still required to quantitatively assess its impact on overall safety performance, the current practical outcomes indicate that embedding a semantic similarity-based risk inference mechanism into routine safety management workflows can enhance the proactivity of hazard identification and improve the timeliness of managerial responses.

6. Conclusions

This study addresses key challenges in construction safety management, including identifying latent risks in safety inspection texts, inadequate prioritization guidance for accident prediction, and the limited utility of unstructured textual data for engineering decision-making. To tackle these issues, a risk-ranking framework integrating domain-specific semantic embeddings, time-window modeling, and vector retrieval mechanisms was proposed. Without requiring manual, record-by-record annotation, the framework enables semantic association modeling and priority identification between construction hazard texts and potential accident risks.
The experimental results demonstrate that domain-fine-tuned Sentence-BERT effectively captures the deep semantic relationship between “hazardous behaviors” and “potential accident consequences.” By incorporating a 7-day sliding time window, the model is able to represent the temporal accumulation characteristics of risk, thereby shifting risk identification from single-text matching to stage-based risk perception. Compared with TF-IDF, TextRank, and non-fine-tuned BERT approaches, the proposed framework exhibits superior performance in terms of risk-ranking stability and discrimination capability for high-risk samples. More importantly, validation in an engineering pilot project indicates a high degree of consistency between the model-generated high-risk rankings, subsequent accident records, and expert review outcomes. High-risk inspection records exhibited strong consistency trends, enabling on-site safety managers to identify potential major hazards in advance, optimize inspection resource allocation, reduce the backlog of risk rectification tasks, and improve response efficiency. These findings suggest that the proposed method not only demonstrates advantages in algorithmic metrics but also provides practical value in hazard identification, risk prioritization, and safety management support.
Nevertheless, several limitations remain. First, the keyword enhancement stage relies on a manually constructed domain-specific keyword set. The selection and validation of keywords are influenced by expert experience, which may introduce a degree of subjectivity. Second, accident data were primarily derived from officially published reports, in which the proportion of major and severe accidents is relatively high, while records of minor incidents are comparatively limited. This imbalance may affect the comprehensive representation of risk patterns. Third, cross-project generalization remains influenced by differences in textual expression styles and management standards across projects, and the current model does not yet incorporate a project-adaptive feature transfer mechanism. In addition, a fixed 7-day time window was adopted for risk accumulation modeling without dynamic adjustment according to construction stages or task characteristics, which may limit fine-grained predictive performance in complex construction scenarios. From a practical deployment perspective, several engineering challenges remain, including inconsistent on-site text recording practices, fluctuations in data quality, compatibility issues with existing safety management information systems, and variations in regional risk classification standards requiring localized adaptation. Therefore, large-scale implementation still necessitates continuous validation and parameter optimization based on long-term, multi-project operational data.
Overall, the semantic-driven risk-ranking framework proposed in this study provides a feasible technical pathway for transforming construction safety management from retrospective statistical analysis toward proactive risk early warning. Future research may focus on cross-regional multi-project validation, adaptive time-window modeling, domain transfer learning, and model interpretability mechanisms to further enhance generalization capability and engineering reliability, thereby delivering more robust and sustainable technical support for construction safety management.

Author Contributions

Conceptualization, S.L. and W.Y.; Methodology, S.L., W.Y. and G.L.; Software, G.L.; Investigation, R.Z.; Data curation, G.L. and R.Z.; Writing—original draft, S.L. and R.Z.; Writing—review & editing, W.Y. and G.L.; Supervision, W.Y. and G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Architecture of the SBERT-TRPM Model.
Figure 2. Comparison of Ranking Performance (MAP and AUC) among different models.
Figure 4. Performance Comparison Between In-project and Cross-project Data.
Table 1. Performance Metrics of the Proposed Model.

Metric    Accuracy (Acc)    Recall    F1-Score    MAP       AUC
Value     0.7500            1.0000    0.8571      0.8547    0.6593
Table 2. Performance Comparison of Different Models.

Model                                             MAP       AUC
Traditional TF-IDF (without domain dictionary)    0.7383    0.5467
TextRank keyword matching                         0.7500    0.5000
BERT [CLS] vector                                 0.8224    0.6533
SVM + TF-IDF                                      0.8117    0.5630
Proposed Model (SBERT + 7-day window)             0.8547    0.6593

Note: The classification metrics (Accuracy, Recall, and F1-score) are close to 0.75–1.0 under the fixed threshold setting; detailed values are reported in Table 1. Table 2 primarily presents the performance differences among models in the high-risk time-window ranking task.
Table 4. Comparison of Model Performance Before and After Noise Injection in Robustness Testing.

Metric      Before Noise (Original Data)    After Noise    Change (%)
F1-score    0.8571                          0.8571         0.00
MAP         0.8547                          0.8455         1.08
AUC         0.6593                          0.6489         1.58
Table 5. Model Generalization Performance on the Cross-Project Safety Inspection Dataset.

Metric    Recall    Accuracy    F1-Score    MAP       AUC
Value     1.0000    0.3896      0.5607      0.5153    0.6418
Table 6. Computational Efficiency Evaluation of the Proposed Model on Large-Scale Inspection Records.

Metric    Number of Samples    Total Time (s)    Average Time per Sample (ms)    Throughput (Entries per min)
Value     4840                 94.65             19.56                           3068
