To achieve the research objective, a mixed-method approach that integrates qualitative and quantitative analyses is adopted. Specifically, the method comprises a comprehensive literature review, a questionnaire survey, and a BP neural network analysis, as shown in Figure 1. The process can be divided into four parts: (1) the preliminary identification of driving factors through the literature review based on the extended STS-TOE framework, (2) the determination of driving factors based on the questionnaire survey, (3) the construction of a weight calculation model based on BP neural networks, and (4) the determination of critical factors and their influences on lean practice in PC.
3.1. Identifying Primary Factors Under Extended STS-TOE Framework
This study identified digitalization-driven factors using the extended STS-TOE framework. Accordingly, the factors can be identified from five key dimensions: (1) organization-based, e.g., culture, capability, and coordination; (2) social-based, e.g., process, planning, workflows, and interoperability; (3) technology-based, e.g., tools, maturity, and platforms; (4) economy-based, e.g., investment, cost, and benefits; and (5) environment-based, e.g., standards, support, and incentives. Then, an in-depth literature review was conducted to identify the primary factors of digitalization-driven lean in PC projects from the five dimensions. Searches were performed across global databases, i.e., “Google Scholar”, “Web of Science”, and “Scopus”, as well as the Chinese database “CNKI”. The search strategy used the following Boolean string: Title/Abstract/Keyword = (“lean” OR “just-in-time”) AND (“prefabricated construction/buildings” OR “prefabrication” OR “precast” OR “off-site construction” OR “industrial building system”) AND (“BIM” OR “digitalization” OR “information technology”). No time restrictions were set, so as to cover the full range of relevant studies. To ensure methodological rigor, only peer-reviewed journal articles were included, as they undergo a more rigorous review process than conference papers [89,90]. The search was further refined by restricting the document type to articles and the language to “English”. The search initially yielded around 80 papers. Studies that were not directly relevant to the research scope, lacked factor-related evidence, or were not peer-reviewed journal articles were excluded. After removing duplicates and excluding non-SCI journal articles through visual inspection, 30 high-quality and highly relevant papers were selected for full-text review. Similar factor-identification studies have also used a focused corpus of high-relevance papers rather than a large but weakly related literature pool, as the quality and relevance of sources are more critical than the absolute number of papers for thematic factor extraction.
These 30 papers formed the basis for identifying preliminary factors, while the initial broader set of 80 papers was also consulted to ensure comprehensive coverage. During the literature review, particular attention was paid to avoiding conceptual redundancy among the extracted factors. Similar or overlapping expressions identified from different studies were compared, merged, and standardized according to their theoretical meanings and practical implications. This coding and refinement process helped avoid repeated counting of semantically similar items and ensured that each preliminary factor represented a relatively distinct aspect of digitalization-driven lean implementation. To further enhance objectivity, only factors that appeared in at least three reviewed papers were retained as preliminary factors, ensuring that the selected items reflected recurring themes in the literature rather than isolated or repetitive expressions. In total, 27 preliminary factors relating to the organizational, social, technological, economic, and environmental dimensions were identified.
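The recurrence rule described above (a factor is kept only if it appears in at least three reviewed papers) can be illustrated with a minimal sketch. The paper identifiers and factor labels below are hypothetical placeholders, not the study's actual coding results, and the function name is introduced here for illustration only.

```python
from collections import Counter

# Hypothetical coding result: paper id -> set of standardized factor labels.
coded_papers = {
    "P01": {"digital skills", "government support", "data consistency"},
    "P02": {"digital skills", "data consistency"},
    "P03": {"digital skills", "government support"},
    "P04": {"data consistency", "digital platforms"},
    "P05": {"government support", "digital platforms"},
}

def retain_recurring_factors(papers, min_papers=3):
    """Keep factors that are mentioned in at least `min_papers` distinct papers."""
    # Each paper contributes at most one count per factor because labels are sets.
    counts = Counter(f for factors in papers.values() for f in factors)
    return sorted(f for f, c in counts.items() if c >= min_papers)

preliminary_factors = retain_recurring_factors(coded_papers)
print(preliminary_factors)
```

Storing the labels of each paper as a set mirrors the merging step in the text: overlapping expressions are standardized first, so a factor is counted once per paper.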
3.2. Data Collection Through a Questionnaire Survey
To refine the 27 preliminary factors, a structured questionnaire survey was conducted targeting construction practitioners in China who have working experience with prefabricated, off-site, or modular construction projects. The questionnaire data were also used to calculate the factors’ weights based on BP neural networks [84]. The questionnaire was divided into two sections. Part 1 gathered demographic and professional profile information, including gender, age, position, and years of experience. Part 2 asked respondents to rate the importance of each identified factor and the overall effectiveness of lean implementation in PC projects, using a five-point Likert scale ranging from 1 (minimal impact) to 5 (very high impact). The overall effectiveness of lean implementation is the output variable of the BP model. This item was designed to capture the overall outcome of lean implementation as perceived by practitioners, rather than to measure each sub-dimension of lean performance separately. Before distribution, a pilot study was conducted with 6 experts working on PC projects to ensure that the wording and phrasing of the 27 factors and the overall evaluation of lean implementation were clear and easy to understand. Based on their feedback, minor adjustments were made. Then, the survey was administered online via the Wenjuanwang platform using a combination of random and convenience sampling to enhance representativeness.
Notably, regional differences in construction management practices, technical standards, and economic conditions may affect digitalization-driven lean implementation in PC and should be taken into account [82]. The questionnaire data should therefore cover multiple regions, which both supports the reliability of the data and accounts for the influence of regional differences. Regions with a higher level of PC development and lean–digital application can more comprehensively reflect the driving role of digital tools in lean implementation, thereby enabling the identification of effective and representative influencing factors, because practitioners in such regions possess richer experience and can provide more comprehensive evaluations. Since the study aims to identify how digital technologies support lean implementation in PC projects, regions with more mature PC and digital construction practices provide a more suitable empirical context: digitalization-driven lean practices are more observable there, and practitioners are more likely to evaluate the relevant factors based on actual project experience. Moreover, according to the “Guidelines on Vigorously Developing Prefabricated Buildings” issued by the General Office of the State Council [91], Beijing–Tianjin–Hebei, the Yangtze River Delta, and the Pearl River Delta are designated as priority regions for PC advancement. Consequently, PC in these areas is relatively mature and, combined with their advanced economic conditions, the level of digital technology application and lean implementation is relatively high. This study therefore selects these regions and China’s eastern coastal regions as the focus areas for analysis. As a result, 11 regions are selected as the survey regions, i.e., Beijing City, Tianjin City, Hebei Province, Shanghai City, Jiangsu Province, Zhejiang Province, Anhui Province, Guangdong Province, Fujian Province, Shandong Province and Liaoning Province. Data from these regions can support the identification of key factors that are meaningful for PC projects seeking to advance lean implementation through digital technologies.
Over a two-week period, 148 valid responses were collected from 11 regions. Respondent profiles are summarized in
Table 1. The respondents covered different professional roles in PC projects, including production managers, construction managers, project managers, cost managers, and chief engineers. This multi-role respondent structure was adopted to obtain a more comprehensive professional evaluation of digitalization-driven lean implementation, rather than relying on a single stakeholder perspective. In addition, many respondents had participated in multiple lean and digital PC projects, indicating that they were able to evaluate the identified factors based on practical project experience. Therefore, the questionnaire data provide a professional basis for constructing regional-level evaluations of lean implementation effectiveness.
Statistical analyses were conducted in SPSSPRO to assess the reliability and validity of the questionnaire data. Cronbach’s alpha coefficient was 0.949, exceeding the standardized threshold of 0.948 and indicating excellent internal consistency of the questionnaire. To evaluate construct validity, a factor analysis was performed. The Kaiser–Meyer–Olkin test yielded a value of 0.927, and Bartlett’s test of sphericity yielded a p-value below 0.05, confirming the suitability of the data for factor analysis.
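For readers who want to see how the reliability coefficient is obtained, the following minimal sketch computes Cronbach's alpha from a respondent-by-item score matrix using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The small `demo` matrix is hypothetical and unrelated to the 148 survey responses; the study itself used SPSSPRO, not this code.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per respondent)."""
    k = len(responses[0])                      # number of questionnaire items
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical 5 respondents x 3 items on a five-point Likert scale.
demo = [
    [4, 5, 4],
    [3, 3, 3],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(demo)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Sample variances are used for both the items and the total scores, so the choice of denominator cancels out in the ratio.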
3.3. Determining Final Factors Through a Questionnaire Survey
A questionnaire survey was conducted to evaluate the perceived importance of the 27 preliminary factors. Respondents used a five-point Likert scale (1 = minimal impact, 5 = very high impact). To retain factors with relatively clear importance, a mean-score threshold of 3.4 was adopted. The 3.4 threshold is a pragmatic selection criterion: it corresponds to approximately two-thirds of the maximum score, reflects the majority of respondents’ perception of importance, and excludes less important factors. Such a threshold is commonly used as a selection criterion for identifying relevant factors. Factors with mean scores below 3.4 were considered less salient and were removed. Accordingly, 18 factors with mean scores above 3.4 were retained for subsequent BP neural network analysis. The 9 removed factors were “Skilled workers knowing PC and BIM” [44], “Clear organizational boundaries and responsibilities” [6], “Development of lean construction technics” [92], “Change control based-digitalization” [93], “Plan management based digitalization”, “Cost–benefit analysis based digitalization” [94], “Promotion and application of emerging informational technologies” [95,96], “Cost of application and management” [97,98], and “Investment of software and hardware” [6,98]. The 18 retained factors, listed in Table 2, were deemed relatively important by respondents, thereby enhancing the robustness of the subsequent BP neural network analysis.
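The mean-score screening step can be sketched as follows. The factor labels and ratings are hypothetical, and because the text does not say how a mean of exactly 3.4 is handled, the sketch assumes such a factor is retained.

```python
from statistics import mean

THRESHOLD = 3.4  # mean-score cutoff stated in the text

# Hypothetical ratings: factor label -> list of five-point Likert scores.
ratings = {
    "F_a": [4, 5, 4, 3],
    "F_b": [3, 3, 4, 3],
    "F_c": [2, 3, 3, 4],
}

def screen_factors(ratings, threshold=THRESHOLD):
    """Split factors into retained and removed groups by mean score."""
    means = {name: mean(scores) for name, scores in ratings.items()}
    # Assumption: a mean exactly equal to the threshold counts as retained.
    retained = [name for name, m in means.items() if m >= threshold]
    removed = [name for name, m in means.items() if m < threshold]
    return means, retained, removed

means, retained, removed = screen_factors(ratings)
print(retained, removed)
```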
Among the 18 factors, the organization-based factors emphasize digital skills, stakeholder attitudes, and organization structures, while the social-based factors concern lean management systems, workflows, and data consistency. Technology-based factors address informatization, design–construction integration, quality control, simulation, and digital platforms. In addition, economy- and environment-based factors highlight economic benefits, government support, and institutional standards. Together, these factors form a comprehensive framework grounded in established literature and models.
3.4. BP Neural Network Analysis
(1) Constructing the structure of the BP neural network model.
The BP neural network model consists of an input layer, hidden layers, and an output layer [
20]. The parameters including the number of nodes in each layer were determined to set the BP network structure. One feature is that the neuron nodes are fully connected to adjacent layers but not within the same layer [
107]. While multiple hidden layers can model complex relationships, more layers do not necessarily improve performance. The hidden layer captures nonlinear relationships, but an unsuitable node count can lead to underfitting (too few nodes) or overfitting (too many nodes), reducing predictive accuracy. To optimize performance, the BP neural network model was designed with a compact structure, selecting the minimum necessary nodes while maintaining accuracy [
77]. The number of input layer nodes is 18, corresponding to the total number of influential factors, while the number of output layer nodes is 1, representing the goal of lean implementation in PC [
108]. The number of hidden layer nodes can be determined with reference to Equation (1) to enhance selection effectiveness [74,108].
$$h = \sqrt{n + l} + a \tag{1}$$
where $h$ is the number of nodes of the hidden layer, $n$ is 18, representing the number of nodes of the input layer, and $l$ is 1, representing the number of nodes of the output layer. $a$ is a constant between 1 and 10. To determine the number of hidden layer nodes, training experiments were carried out for network models with different numbers of hidden neurons, and the resulting mean square error (MSE) values were compared [109]. The MSE is calculated as in Equation (2) [109].
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 \tag{2}$$
where $y_i$ is the true value, $\hat{y}_i$ is the estimated value, and $m$ is the number of samples. After multiple trials, this study determined that a BP neural network with one hidden layer and 6 nodes in the hidden layer achieved the lowest MSE [77].
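As a rough illustration of the node-selection step, the sketch below enumerates candidate hidden-layer sizes and defines the MSE used to compare them. It assumes the widely used empirical rule in which the hidden-node count equals the square root of the sum of input and output nodes plus a constant between 1 and 10; rounding to the nearest integer is also an assumption, since the text does not state how non-integer values are handled.

```python
import math

def hidden_node_candidates(n_inputs, n_outputs, a_values=range(1, 11)):
    """Candidate hidden-layer sizes from sqrt(n_inputs + n_outputs) + a."""
    base = math.sqrt(n_inputs + n_outputs)
    # Assumption: non-integer results are rounded to the nearest integer.
    return sorted({round(base + a) for a in a_values})

def mse(y_true, y_pred):
    """Mean square error over m samples."""
    m = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / m

candidates = hidden_node_candidates(18, 1)
print(candidates)
```

Each candidate size would then be trained separately and the size with the lowest MSE kept, which in the study was 6 nodes.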
The model structure of the BP neural network model with three layers is shown in
Figure 2. The figure illustrates the structure of the BP neural network model used to evaluate the impact of digitalization-related factors on lean construction. The input layer consists of 18 driving factors, which are processed through 6 neurons in the hidden layer to capture their nonlinear interactions. The output layer represents the overall effectiveness of lean construction. This structure enables the model to simulate and quantify how each factor contributes to lean implementation by learning the underlying relationships between inputs and outputs through iterative training.
(2) The working principle of the BP neural network.
The learning process of the BP neural network model consists of two phases: (1) feedforward, i.e., input data from the 18 factors are processed via weighted connections in the hidden layer, transformed via the activation function, and passed to the output layer; (2) back propagation, i.e., the error inverse transfer algorithm iteratively adjusts the weights and biases to minimize the difference between expected and actual outputs, driving the network toward convergence. The hyperbolic tangent sigmoid transfer function was selected as the activation function for the neurons in the hidden layer, while a purely linear activation function was selected for the output layer. Each neuron consists of a summation unit computing the weighted sum of inputs and a nonlinear activation function with a defined threshold. They are mathematically formulated in Equation (3).
$$y_j = f\left(\sum_{i=1}^{n} w_{ij} x_i - \theta_j\right) \tag{3}$$
where $n$ is the number of nodes of the input layer; $x_i$ is the input from the $i$-th node of the input layer; $w_{ij}$ represents the weight; $\theta_j$ is the threshold; and $f(\cdot)$ is the activation function in the hidden layer.
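The feedforward phase can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the MATLAB toolbox code used in the study: the weights are random placeholders, and the bias terms `b_h` and `b_o` play the role of negative thresholds.

```python
import math
import random

def forward(x, w_ih, b_h, w_ho, b_o):
    """One feedforward pass: tanh hidden layer, linear output node.

    x: inputs (length n); w_ih[i][j]: weight from input i to hidden unit j;
    b_h[j]: hidden bias (negative threshold); w_ho[j]: hidden-to-output weight.
    """
    hidden = [
        math.tanh(sum(x[i] * w_ih[i][j] for i in range(len(x))) + b_h[j])
        for j in range(len(b_h))
    ]
    return sum(h * w for h, w in zip(hidden, w_ho)) + b_o

# Hypothetical 18-6-1 network with random placeholder weights.
rng = random.Random(0)
n_in, n_hid = 18, 6
w_ih = [[rng.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in)]
b_h = [rng.uniform(-1, 1) for _ in range(n_hid)]
w_ho = [rng.uniform(-1, 1) for _ in range(n_hid)]
x = [rng.uniform(-1, 1) for _ in range(n_in)]   # one normalized observation
y_hat = forward(x, w_ih, b_h, w_ho, b_o=0.0)
print(y_hat)
```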
(3) The learning and training of the BP model.
The Levenberg–Marquardt algorithm is widely used for training BP neural network models, particularly with small training datasets [
110]. As an iterative optimization method, it integrates the strengths of the steepest descent and Gauss–Newton methods, making it highly effective for nonlinear least squares problems and function parameter fitting [
111,
112]. The training steps of the BP neural network model can be found in Rumelhart, Hinton and Williams [
21,
77].
In this study, training parameters were set to balance computational efficiency and convergence accuracy. Following common practice, the learning rate of the network was set to 0.01 [77,113], the maximum number of training epochs to 1000 [80,88], and the target training accuracy to 0.001 [108,114]. Other parameters were left at their default settings, and training was performed until the network converged automatically, at which point the construction of the BP neural network model was complete. To ensure the reliability of the training results, 30 samples were taken randomly as training samples, and the remaining 5 samples were chosen as a test group [108]. The normalized index data are input into the constructed neural network model, and the output, i.e., the overall effectiveness of lean implementation, lies in the range [−1, 1] [113].
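To make the role of the three training parameters concrete, the sketch below trains a tiny tanh-hidden, linear-output network with a learning rate of 0.01, at most 1000 epochs, and an error goal of 0.001. Note that it deliberately swaps in plain full-batch gradient descent for the Levenberg–Marquardt algorithm named in the text, because gradient descent is short enough to show in full; the network size, data, and function name are hypothetical.

```python
import math
import random

def train_bp(X, y, n_hidden=3, lr=0.01, max_epochs=1000, goal=0.001, seed=0):
    """Full-batch gradient descent (not Levenberg-Marquardt) on the MSE."""
    rng = random.Random(seed)
    n, m = len(X[0]), len(X)
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n)]
    b_h = [0.0] * n_hidden
    w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = 0.0
    history = []  # MSE measured before each weight update
    for _ in range(max_epochs):
        g_ih = [[0.0] * n_hidden for _ in range(n)]
        g_bh = [0.0] * n_hidden
        g_ho = [0.0] * n_hidden
        g_bo = 0.0
        sq_err = 0.0
        for x, target in zip(X, y):
            # Feedforward pass
            hidden = [
                math.tanh(sum(x[i] * w_ih[i][j] for i in range(n)) + b_h[j])
                for j in range(n_hidden)
            ]
            out = sum(hj * wj for hj, wj in zip(hidden, w_ho)) + b_o
            err = out - target
            sq_err += err * err
            # Back propagation of the MSE gradient
            d_out = 2.0 * err / m
            g_bo += d_out
            for j in range(n_hidden):
                g_ho[j] += d_out * hidden[j]
                d_pre = d_out * w_ho[j] * (1.0 - hidden[j] ** 2)
                g_bh[j] += d_pre
                for i in range(n):
                    g_ih[i][j] += d_pre * x[i]
        history.append(sq_err / m)
        if history[-1] <= goal:   # stop once the error goal is reached
            break
        # Gradient descent update of all weights and biases
        b_o -= lr * g_bo
        for j in range(n_hidden):
            w_ho[j] -= lr * g_ho[j]
            b_h[j] -= lr * g_bh[j]
            for i in range(n):
                w_ih[i][j] -= lr * g_ih[i][j]
    return (w_ih, b_h, w_ho, b_o), history

# Hypothetical normalized data: 5 observations, 2 inputs, 1 output.
X_demo = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [0.0, 0.0]]
y_demo = [-0.8, -0.2, 0.2, 0.8, 0.0]
params, history = train_bp(X_demo, y_demo)
print(f"MSE: {history[0]:.4f} -> {history[-1]:.4f} after {len(history)} epochs")
```

Levenberg–Marquardt typically needs far fewer iterations than this first-order update, which is one reason it is preferred for small datasets.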
(4) Calculating factors’ weights using the BP model.
Note that the weight of the hidden layer is $w_{ij}$ in Equation (4), and the weight of the output layer is $w_{jk}$ in Equation (4) [77]. Then, the factors’ final weights can be calculated by combining $w_{ij}$ and $w_{jk}$. The input data matrix is $n \times m$ dimensional, where $m$ is the amount of data. There are 6 hidden layer neurons, so the hidden layer weight $w_{ij}$ is an $n \times 6$ dimensional matrix, and the output layer weight $w_{jk}$ is a $6 \times 1$ dimensional matrix.
The BP neural network modeling and analysis aim to determine the weight values of the influencing factor indices for digitally driven lean construction in PC. To evaluate the extent to which input variables affect the output variable, it is necessary to analyze and process the weights between neurons in the input layer, hidden layer, and output layer [
77,
111]. The influence of the
input variable relative to all units in the input layer on the
hidden layer unit is expressed as
Similarly, the influence of unit
relative to all units in the hidden layer on the
output is expressed as
Furthermore, the influence of the
input variable on the
output variable is expressed as
Therefore, the weights among the input layer indicators can be expressed as
where
represents the input variables of the neural network,
;
j represents the hidden layer units,
;
represents the output variables,
;
is the weight coefficient between
and
;
is the weight coefficient between
and
; and
is the final weight of the factors.
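One common way to combine the two weight matrices into factor weights is a Garson-style normalization of absolute connection weights, sketched below. This formulation is an assumption made for illustration: it follows the verbal description of relative influences in the text, but the exact expressions used by the authors may differ. The small matrices are hypothetical.

```python
def factor_weights(w_ih, w_ho):
    """Relative input weights from input-hidden and hidden-output weights.

    w_ih[i][j]: weight from input i to hidden unit j (n x h matrix)
    w_ho[j]:    weight from hidden unit j to the single output (length h)
    """
    n, h = len(w_ih), len(w_ho)
    # Share of each input within every hidden unit (columns sum to 1).
    col_sums = [sum(abs(w_ih[i][j]) for i in range(n)) for j in range(h)]
    # Share of each hidden unit in the output.
    out_sum = sum(abs(w) for w in w_ho)
    influence = [
        sum(abs(w_ih[i][j]) / col_sums[j] * abs(w_ho[j]) / out_sum for j in range(h))
        for i in range(n)
    ]
    total = sum(influence)
    return [value / total for value in influence]  # weights sum to 1

# Hypothetical 2-input, 2-hidden-unit example.
weights = factor_weights([[1.0, -2.0], [3.0, 2.0]], [1.0, -1.0])
print(weights)
```

Using absolute values means the resulting weights describe the strength of each factor's contribution, not its direction.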
3.5. Importance Evaluation Based on the BP Neural Network Model
(1) The data source of the input and output layer.
It should be noted that the BP neural network model in this study was used to calculate the relative weights of the identified factors rather than to conduct large-sample prediction. Existing studies indicate that a limited number of representative analytical units, such as 10 and 11, can be used in BP-based exploratory evaluations. Therefore, the 11 region-level observations were used as representative analytical units for exploratory factor-weight calculation. Considering the regional differences in PC development, digital technology adoption, and lean implementation practices in China, the individual questionnaire responses were aggregated at the regional level for BP neural network analysis. These 11 regions have been identified as key areas for promoting PC in China due to their relatively mature industrial foundation, policy support, and practical experience in construction industrialization. For the input layer of the BP neural network, the average scores of the 18 identified key factors were calculated for each region and served as the input variables, denoted as X1, X2, …, X18.
For the output layer, the regional average score of the perceived overall effectiveness of lean implementation in PC projects was used as the output variable, denoted as
Y. As mentioned in
Section 3.2, respondents’ qualitative assessments of the perceived overall effectiveness of lean implementation were collected. It should be emphasized that the output variable represents an overall professional evaluation of lean implementation effectiveness, rather than a multidimensional latent construct scale. This design is consistent with the exploratory purpose of the BP model in this study, which is to calculate the relative contribution of digitalization-driven factors to the perceived lean implementation outcome. These scores were aggregated at the regional level to generate comparable regional input–output observations, rather than to represent the perception of every individual respondent or project. It should be noted that regional aggregation may reduce within-region variance. The purpose of the BP neural network analysis is to calculate exploratory, outcome-oriented factor weights at the regional level, rather than to explain individual-level differences. Moreover, the respondents covered multiple professional roles in multiple lean and digital PC projects, as detailed in Table 1, providing a more comprehensive professional evaluation of the overall implementation level within each region. In summary, the correspondence between the input variables (X1, X2, …, X18), i.e., the regional averages of the 18 factors, and the output variable (Y), i.e., the overall lean effectiveness score, is presented in Table 3.
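The regional aggregation described above amounts to a group-by-region average of the factor scores and of the overall effectiveness score. A minimal sketch follows; the three responses and the use of only two factors are hypothetical simplifications of the 148 responses and 18 factors.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical responses: (region, factor scores, overall effectiveness score).
responses = [
    ("Beijing", [4, 5], 4),
    ("Beijing", [5, 3], 5),
    ("Shanghai", [3, 4], 4),
]

def aggregate_by_region(responses):
    """Return region -> (mean factor scores X, mean overall score Y)."""
    grouped = defaultdict(list)
    for region, scores, overall in responses:
        grouped[region].append((scores, overall))
    table = {}
    for region, rows in grouped.items():
        # Average each factor column across the region's respondents.
        x_means = [mean(col) for col in zip(*(scores for scores, _ in rows))]
        y_mean = mean(overall for _, overall in rows)
        table[region] = (x_means, y_mean)
    return table

regional_table = aggregate_by_region(responses)
print(regional_table)
```

Each region then contributes exactly one input–output observation, which is why the BP model in this study works with 11 rows.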
(2) Data processing.
Data normalization and partitioning are necessary for the BP neural network model. Normalization forms the foundation of model training by transforming the data into the range of [−1, 1]. This can be done with the neural network toolbox in MATLAB (R2019b). Data partitioning is critical for preventing overfitting in BP neural networks. Typically, the data are divided into three subsets: training, validation, and testing. The model was trained using the training data, while the validation data were used to check that the model did not overfit the training dataset. The test dataset was then applied to assess the final performance of the model. The training, validation, and testing subsets were set to 70%, 15%, and 15% of the data, respectively, which is a balanced split commonly used in neural network models [
115]. However, because the analysis was based on 11 region-level observations, the
MSE values were interpreted only as internal fitting indicators rather than as evidence of strong predictive accuracy or generalization capability. Similar BP-based evaluation studies have also used a limited number of evaluation objects when the purpose was model-based evaluation or factor-weight identification rather than large-sample prediction.
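The two preprocessing steps can be sketched as follows. The min–max mapping to [−1, 1] follows the standard linear formula; the handling of a constant column and the rounding used to turn the 70/15/15 ratios into subset sizes are assumptions, and the MATLAB toolbox may partition the 11 observations differently.

```python
import random

def minmax_scale(values, lo=-1.0, hi=1.0):
    """Linearly map values so that the minimum becomes lo and the maximum hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption: a constant column is mapped to the midpoint of the range.
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (hi - lo) * (v - v_min) / (v_max - v_min) for v in values]

def split_indices(n, ratios=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition n observation indices into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(ratios[0] * n)   # assumption: sizes rounded to nearest integer
    n_val = round(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

scaled = minmax_scale([3.0, 4.0, 5.0])
train, val, test = split_indices(11)
print(scaled, len(train), len(val), len(test))
```

With only 11 observations the validation and test subsets contain one or two regions each, which is consistent with treating the MSE values as internal fitting indicators only.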
The training set MSE results indicate that the trained network achieved an acceptable internal fit to the aggregated regional dataset. Nevertheless, the interpretation of the BP neural network results focuses on the relative ranking and managerial implications of the derived factor weights rather than on predictive performance. Accordingly, the subsequent analysis uses the trained network weights to calculate the relative contribution of each digitalization-driven factor to lean implementation effectiveness in PC projects.
Furthermore, the trained BP neural network model was applied to generate fitted values for five selected regions, namely Anhui, Shandong, Liaoning, Zhejiang, and Shanghai. The fitted values were compared with the observed regional values using a discrepancy rate metric, as shown in
Table 4. The discrepancy rates were all below 10%, suggesting that the fitted values were generally consistent with the observed regional scores. This comparison provides a descriptive check of internal consistency, while the main purpose of the BP model remains the calculation of factor weights for identifying the key drivers of digitalization-driven lean implementation.
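The discrepancy check can be illustrated with a short sketch. The text does not give the formula for the discrepancy rate, so the relative absolute error in percent is assumed here, and the observed and fitted values are hypothetical rather than taken from Table 4.

```python
def discrepancy_rate(observed, fitted):
    """Assumed definition: absolute difference relative to the observed value, in %."""
    return abs(fitted - observed) / abs(observed) * 100.0

# Hypothetical regional scores: region -> (observed, fitted).
demo_scores = {
    "Region A": (4.0, 4.2),
    "Region B": (3.8, 3.7),
}
rates = {region: discrepancy_rate(obs, fit) for region, (obs, fit) in demo_scores.items()}
all_below_10 = all(rate < 10.0 for rate in rates.values())
print(rates, all_below_10)
```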