BIM, Machine Learning, and Design Rules to Improve the Assembly Time in Steel Construction Projects

Integrating fabrication knowledge and experience during the design phase can help reduce the cost and duration of steel construction projects. Building Information Modeling (BIM) comprises technologies and processes that reduce the cost and duration of construction projects by using parametric digital models as a support for information. These models can contain information about the performance of previous projects and allow a classification by linear regression of the design criteria with a high impact on the duration of fabrication. This paper proposes a quantitative approach that applies linear regressions to the BIM models of previous projects to identify design rules and production improvement points. A case study applied to 55,444 BIM models of steel joists validates this approach. This case study shows that the camber, the weight of the structure, and its reinforced elements greatly influence the fabrication time of the joists. The approach developed in this article is a practical case in which machine learning and BIM models, rather than interviews with professionals, are used to identify knowledge related to a given steel structure fabrication system.


Introduction
The fabrication phase of structural steel projects represents 30 to 40% of the overall building cost [1]. Yet, the decisions taken during the design phase affect 88% of the steelworks' costs and time of execution [2]. However, the main concerns of designers remain compliance with standards [3] and the appropriate choice of structural elements for the resistance of the structures [4]. Little time, from a few minutes to a few hours, is devoted to evaluating models for cost and time reduction and to searching for alternative solutions during the design phase [5]. These evaluations are made without considering the particularities of the manufacturing plant where the work will be carried out [6]. This situation leads to a sub-optimal design [7][8][9]. In the traditional Design Bid Build (DBB) procurement, where a project is carried out in a linear and fragmented process, manufacturing specialists often intervene at the end of the design phase [10,11]. At that point, the modifications they make cause delays and additional costs in completing the projects [12]. This situation is similar to that of Product Development Engineering (PDE) in the 1990s.
In PDE, designers and manufacturers collaborate formally through different methods and design rules, such as design for manufacturing and assembly (DFMA) [13]. These rules provide designers with the essential knowledge to reduce the cost, the time, the tools, the number of operations, the quantity of material, and the number of workers during projects while improving quality during the manufacturing and assembly of parts from the design phase [14], and for a specific workshop [15].
DFMA consists of identifying and considering manufacturing and assembly constraints during design. This process leads to design rules and tools, which help obtain simplified and standardized products suitable for the manufacturing and assembly process [13,14]. The DFMA methodology also improves the manufacturing and assembly process by integrating structural changes that promote essential design criteria [16]. A case study from Douglas Commercial Airlines demonstrates the significant benefits of using DFMA in a manufacturing process, notably: a 51% reduction in the number of parts, a 37% reduction in the cost of manufacturing parts, a 50% faster time to market, a 68% improvement in the quality and reliability of the final product, a 62% reduction in assembly time, and a 57% reduction in manufacturing time [17]. DFMA identifies design factors with a high impact on manufacturing and assembly processes [18]. One approach to identifying these factors is to hold meetings with designers, manufacturers, and assemblers with extensive knowledge and experience to assess the design factors available for a designed product [11,13]. The identified factors help to establish criteria that allow the evaluation of different product designs [19]. This approach, which appears feasible in PDE, is difficult to apply in the construction industry because of the Design Bid Build (DBB) context, where there is real fragmentation between project phases [6]. However, recent work in machine learning (ML) shows that it is possible to extract knowledge from the digital data of a process. In the construction industry, Building Information Modeling (BIM) offers relevant data for ML.
This paper addresses the research question: Is it possible to identify design rules such as DFMA from BIM models of previous projects and machine learning algorithms? To answer this question, this paper aims to propose an approach to identify design rules from BIM models of previous projects and ML algorithms. To achieve this goal, this paper seeks to validate two possibilities: extracting design factors from BIM models of steel structures with ML, and establishing design rules that reduce the fabrication time from the design factors obtained.
To that end, this article presents a literature review that justifies the choice of methodology, the methodology itself, and a case study with 55,444 BIM models of steel joists. The BIM models are from a major North American steel structure manufacturer.

Choice of a Knowledge Extraction Method
The extraction of knowledge specific to construction processes is one of the main motivations of industrial and scientific organizations related to the construction industry. Among these organizations, the Construction Industry Institute (CII) [20] and the Independent Project Analysis (IPA) [21] consider that the extraction of knowledge from processes is essential for verifying constructability and seeking efficiency during projects.
Two classes of methods can be used to extract knowledge from an industrial process: qualitative and quantitative.
The qualitative method analyzes speech and texts from experts, from their experiences, from conferences or brainstorms [22]. The quantitative method provides knowledge based on the statistical and historical characteristics of the available data. This method mainly uses mathematical models in scientific logic to propose a probabilistic form, which may occur in a given process with identified input data [23].
In the context of large amounts of data, one of the significant limitations of using qualitative methods in knowledge extraction is the limit of analysis of the human brain [23]. Data has grown exponentially since the advent of BIM and machine controllers [1]. The traditional nature of contracts in the construction industry forces a separation between the design and construction phases [24]. This separation is accentuated by the increasing complexity of projects and the growing level of client requirements, which requires specialization of activities in the construction industry [25]. Another barrier to using the qualitative method is the difference in academic training between design and construction professionals. Indeed, the organization of brainstorming between professionals in these two professions can lead to costly and unproductive discussions caused by the difference in perception [26,27]. Given these difficulties, this article proposes using a quantitative method based on machine learning (ML) to bypass human limitations related to knowledge extraction.

Machine Learning to Identify and Extract Knowledge
The construction of steel structures requires complex fabrication and assembly operations [1]. Complex fabrication systems require sophisticated and accurate prediction systems, such as those proposed by machine learning (ML) [28]. ML is the science of giving computers the ability to learn and act as humans do and to improve their learning over time autonomously by providing them with data and information from a real process. ML is also defined as a process that extracts models automatically from historical data [29]. ML belongs to the domain of Artificial Intelligence (AI). AI is a field of research that aims to reproduce, through artificial systems, the different cognitive capacities of human beings [30]. One of the most targeted objectives of AI is to solve complex problems that are beyond human competence [28,30,31,32], as well as to develop programs capable of learning from data [33]. Numerous applications of ML are found in finance, insurance, and medicine [28,34]. ML is also found in manufacturing production management [35] and in construction [36][37][38]. The observed benefits of ML in construction are widely appreciated in the industry [28,39].

Choice of the Type of Learning and the Type of Algorithm
ML has two main approaches: supervised learning and unsupervised learning [31]. Supervised learning applies to processes with known output data. The objective here is to understand the relationship between input and output data. Unsupervised learning is applied to a series of data that is not understood. The goal is to find a natural link between these data.
The study proposed in this article is a simple application case of ML. Supervised learning with a regression algorithm is suitable for this study [31,40].
Regression algorithms consist of building a prediction model and training it with available data to respond accurately to new data belonging to the process to be studied [29]. Several regression-based operating time prediction cases exist in the literature [28]. In each of these cases, a comparison between algorithms identifies the algorithm that best fits the study [28].

Ensemble Learning
In the field of prediction with regression algorithms, there is increasing interest in Ensemble Learning (EL), a method that combines predictions from several algorithms and aggregates their results to obtain a higher accuracy than any individual algorithm [41][42][43][44]. The two main methods used in EL are boosting and bagging [43].
In boosting, successive prediction trees make incremental contributions to improve the predictions of previous trees. In the end, a weighted vote is taken for the final prediction. One of the techniques used for boosting is the Gradient Boosting Regressor (GBR).
In bagging, prediction trees do not depend on previous prediction trees. All trees are constructed individually. In the end, a simple majority vote is taken for the prediction [41]; in regression, this amounts to averaging the predictions of the trees. The Random Forest Regressor (RFR) is one of the techniques used in bagging. The GBR and RFR are both used and compared in this article for knowledge extraction.
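The difference between the two ensemble strategies can be illustrated with a minimal sketch. It is not the implementation used in this study: the one-split regression "stump", the toy data, and all function names are assumptions made for illustration, and the bagged trees are combined by averaging, as is usual in regression.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on 1-D inputs by least squares."""
    best = None
    for t in sorted(set(xs))[:-1]:  # thresholds that leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: predict the mean everywhere
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagging(xs, ys, n_trees=25, seed=0):
    """Independent stumps, each trained on a bootstrap sample, then averaged."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)

def boosting(xs, ys, n_trees=25, learning_rate=0.5):
    """Sequential stumps, each fitted to the residuals left by the previous ones."""
    base = mean(ys)
    residuals = [y - base for y in ys]
    trees = []
    for _ in range(n_trees):
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        residuals = [r - learning_rate * tree(x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

# Hypothetical toy data: an output that grows linearly with a single input.
xs = [float(i) for i in range(1, 21)]
ys = [5.0 + 0.5 * x for x in xs]

def mean_abs_error(model):
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))

y_mean = mean(ys)
baseline_mae = mean_abs_error(lambda x: y_mean)  # always predict the mean
bagged_mae = mean_abs_error(bagging(xs, ys))
boosted_mae = mean_abs_error(boosting(xs, ys))
```

In this sketch the bagged trees never see each other's errors, whereas each boosted tree is trained on what the previous trees failed to explain; production libraries add deeper trees, feature subsampling, and other refinements.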
To better appreciate the performance of the GBR and RFR techniques in knowledge extraction, this study proposes to compare them with a linear regression technique: Lasso (least absolute shrinkage and selection operator).
Lasso is a regression method that performs variable selection and regularization to improve the accuracy of predictions and the interpretability of the statistical model it produces [45]. The method was popularized by Robert Tibshirani in 1996 and is widely used today in ML, in the specific case of linear regressions. The objective of the method is to minimize the prediction error. To do so, the Lasso method imposes a constraint on the sum of the absolute values of the model parameters: the sum must be less than a fixed value. The constraint is imposed through a shrinkage (regularization) process that penalizes the regression coefficients and reduces some of them to zero [46].
The use of Lasso has many advantages, including more accurate predictions obtained by shrinking the coefficients. This is particularly useful when the number of observations is small and the number of characteristics is large. Lasso also allows researchers to improve the interpretability of the models by removing irrelevant variables [46].
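The shrinkage mechanism can be sketched with a small coordinate-descent solver. This is an illustrative sketch under stated assumptions, not the solver used in this study: it minimizes the penalized form (1/(2n)) Σ(y − Xw)² + alpha Σ|w|, which is the usual equivalent of the constrained form described above, it has no intercept, and the data and function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink a value toward zero and set it to zero inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n)) * sum((y - Xw)^2) + alpha * sum(|w|), without intercept.

    Assumes every column of X contains at least one non-zero value.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            z = sum(X[i][j] ** 2 for i in range(n))
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w

# Hypothetical data: three mutually orthogonal features; the output depends
# strongly on the first one, not at all on the second, and weakly on the third.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [3.0 * row[0] + 0.05 * row[2] for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
```

With this toy design, the strong coefficient is shrunk slightly (from 3 to about 2.9) while the weak and irrelevant coefficients are set exactly to zero, which is the variable-selection behaviour described above.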

Evaluation of Prediction Quality
It is essential to evaluate the quality of the prediction results. Lantz (2015) suggests the use of Mean Absolute Errors (MAEs) and Relative Absolute Errors (RAEs).
MAEs consider how far, on average, the prediction is from the real value [31]. RAE is a performance metric that compares the actual forecast error to that of a very simple forecasting model [47].
We add the Gap Between Predicted and real time (GBP) to these measurements. This measure represents the percentage difference between predicted and actual values. In these measures:
• $x_i$ is the predicted value for the individual sample $i$,
• $y_i$ is the real value for the individual sample $i$,
• $\bar{x}$ is the mean value of $x$, with $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$,
• $\bar{y}$ is the mean value of $y$, with $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$,
• $n$ is the sample size.
As MAE and GBP get closer to 0 and RAE gets closer to 1, the quality of the prediction improves.
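As an illustration, the three measures could be computed as in the sketch below. The formulas are assumptions: MAE and RAE follow their common textbook definitions, which may differ from the variants used in this study (in particular, the textbook RAE compares the model with a predictor that always returns the mean of the real values), and GBP is taken here as the percentage difference between the summed predicted and summed real values. The sample values are hypothetical.

```python
def mae(predicted, actual):
    """Mean absolute error: average distance between prediction and real value."""
    return sum(abs(x - y) for x, y in zip(predicted, actual)) / len(actual)

def rae(predicted, actual):
    """Relative absolute error (textbook form): total absolute error divided by
    the total absolute error of always predicting the mean real value."""
    y_bar = sum(actual) / len(actual)
    model_error = sum(abs(x - y) for x, y in zip(predicted, actual))
    naive_error = sum(abs(y - y_bar) for y in actual)
    return model_error / naive_error

def gbp(predicted, actual):
    """Gap between predicted and real time, in percent of the total real time
    (assumed definition)."""
    return 100.0 * (sum(predicted) - sum(actual)) / sum(actual)

# Hypothetical assembly times in minutes.
predicted_min = [10.0, 12.0, 20.0]
real_min = [11.0, 12.0, 18.0]

mae_value = mae(predicted_min, real_min)
rae_value = rae(predicted_min, real_min)
gbp_value = gbp(predicted_min, real_min)
```

Because GBP sums signed differences, over- and under-predictions can cancel out within a group, which is why it is reported alongside the absolute-error measures.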

BIM for Data Extraction
The success of a prediction depends on the quality of the data used [40]. It is essential to pay special attention to the data quality coming from a process. BIM offers dedicated technologies and processes for better information management in a construction project [24].
BIM technology offers high-quality data through the BIM models [24] and makes it possible to gather and classify information [48,49] specific to different disciplines in a single 3D model. In steel construction, the information extracted from these models can be used for constructability and quantitative estimation [48,50]. This information can also be used for cost and time estimation [51]. These data can be extracted automatically, which reduces the extraction time [52,53]. The method proposed in this article uses the BIM models as a data source.

BIM Is an Asset for the Success of DFMA in the Construction Industry
Three main characteristics of DFMA are the component-based approach, modularization, and standardization [54]. BIM can be used as an object-oriented collaborative process to integrate information representing the fabrication and assembly phases of steel structures. BIM application in the DFMA approach has allowed professionals to simulate construction virtually to identify potential constraints that could increase project costs [54]. We believe that BIM can bring to the steel construction industry the benefits that Computer-Aided Design (CAD) has brought to DFMA. These include a more systematic analysis of fabrication and assembly options to produce a structural design that is more suited to the available processes [55] and fabrication process information to allow for multiple fabrication and assembly simulations [56]. BIM is used in the construction industry as a tool and process to improve the way buildings are designed and constructed [24].

Methodology
According to the Cross-Industry Standard Process for Data Mining (CRISP-DM), the main steps in data prediction are system understanding, data understanding, data preparation, data modeling, and outcome evaluation [57]. This paper adds pattern identification and design rules to these steps (Figure 1). The proposed research approach can be described as follows:

System Understanding
The objective of system understanding is to understand the goals of the prediction and the requirements necessary to achieve these objectives. In construction projects, cost and schedule are generally the performance criteria sought to realize the project. They can be defined as prediction objectives. Costs and schedules depend on the geometrical and functional decisions made during the design phase. These decisions will be identified to serve as prediction criteria during system understanding. During this step, technologies and tools are also identified for the prediction set.

Data Understanding
Data understanding is the next step after system understanding. This step consists of collecting and analyzing the data necessary for the prediction objectives. To do this, the data related to the prediction criteria must be grouped and classified with data analysis tools. The collection can be done with spreadsheets such as MS Excel or Google Sheets. After data collection, the next step is to explore the data to ensure its quality. One method is to eliminate values that are either too large or too small to fit the criteria being analyzed.

Data Preparation
Data preparation consists of preparing the data for modeling. This is done in several steps, including:
• data cleaning, which consists of removing or correcting erroneous values,
• data construction, which consists of determining additional attributes that will be useful for data modeling, and
• data integration, which consists of combining data from various sources.

Data Modelling
Modeling consists of building and evaluating various models based on different modeling techniques. Here we select the algorithms to try; we assess the competing models based on the results obtained and the performance criteria sought.
The implementation of the approach proposed in this article requires an interpreted programming language and an ML package. In this article, the proposed programming language is Python, and the ML package is TensorFlow. The prepared data are used in modeling with the RFR, GBR, and Lasso techniques. The prediction results obtained with these techniques are presented at the end of the modeling.

Data Evaluation
This step compares the results of the three techniques used for modeling. The algorithm whose MAE and GBP are closest to 0 and whose RAE is closest to 1 has the best performance.

Pattern Identification
This step consists of identifying the variables that have the most impact on steel structure fabrication and assembly time.

Knowledge Learned and Design Rules
This part consists of understanding the results of the pattern identification, establishing design rules, and formulating recommendations to improve the fabrication and assembly line.

Case Study
The case study in this article concerns the assembly of steel joists for a major manufacturer of steel structures in North America.

System Understanding
In the steel construction industry, steel joists are lightweight steel structures that support roofs and floors and transfer the loads they receive directly to the steel structures that support them.
Joists are mainly composed of top and bottom chords, webs, and seats. See Figure 2.   Removing the assembled joist.
An automatic device is installed on the joist assembly table to measure the assembly time. Each time the beam elements arrive on the assembly table for the first time, the automatic device starts counting the assembly time. The counting stops when the assembly table becomes empty again. This feature reduces human intervention in starting the countdown. However, the device does not stop when the workers leave the joists on the assembly table to go on a break or leave for the weekend.

Data Understanding
The data for this study come from the main variables that characterize the joists.

Data Preparation
Data cleaning: More than 170,000 assembly times were recorded, corresponding to more than 170,000 joists. However, these data contain noise. The noise is mainly due to inattention errors by workers and to joists remaining on tables during breaks, weekends, and holidays. To reduce the noise in the data set, this study obtains the minimum and maximum time needed to assemble a joist on the assembly table and excludes from the study the times that are too short or too long to correspond to the assembly of a joist. A maximum time of 25 min and a minimum time of 5 min were obtained from professionals. These times are retained as the lower and upper limits of the assembly times kept for the study. This technique reduces the data from more than 170,000 to 55,444 records (see Figure 3).
Data splitting: For a better balance in prediction, the data were divided into four equivalent groups with similar statistical characteristics. Tables 1-4 present the organization of the configured groups with their respective features. The features considered are the prediction criteria Depth, Span, Camber1, ComponentCount, Memb_Lgth, Weight, and RealTime. For each criterion, Tables 1-4 report the mean, the minimum (min), the first quartile (25%), the median (50%), the third quartile (75%), the maximum (max), and the standard deviation (std). These measures ensure that the data are well distributed among the four groups. Three of these groups were used for training the learning algorithms and the fourth for testing them.
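The cleaning and splitting steps above can be sketched as follows. This is only an illustration under stated assumptions: the records are synthetic, the 5 min and 25 min limits are assumed to be inclusive, and the four groups are formed here by a seeded random shuffle, which is one simple way of obtaining groups with similar statistics and not necessarily the procedure used in this study.

```python
import random
from statistics import mean

MIN_MINUTES = 5.0   # lower limit given by professionals (assumed inclusive)
MAX_MINUTES = 25.0  # upper limit given by professionals (assumed inclusive)

def clean(records):
    """Keep only the records whose assembly time lies within the accepted limits."""
    return [r for r in records if MIN_MINUTES <= r["RealTime"] <= MAX_MINUTES]

def split_into_groups(records, n_groups=4, seed=0):
    """Shuffle the records and deal them into n_groups groups of similar size."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Synthetic records: 3.0 and 90.0 mimic noise (an inattention error and a joist
# left on the table during a break); the other times are plausible.
times = [3.0, 8.0, 12.0, 18.0, 24.0, 90.0, 5.0, 25.0]
raw_records = [
    {"Span": 20.0 + i, "RealTime": times[i % len(times)]} for i in range(40)
]

cleaned = clean(raw_records)
groups = split_into_groups(cleaned)

# Three groups for training and the fourth for testing.
train = [r for g in groups[:3] for r in g]
test = groups[3]

# Comparing a summary statistic per group is a quick check that they are balanced.
group_means = [mean(r["RealTime"] for r in g) for g in groups]
```

In practice, the same comparison would be repeated for every feature and every statistic reported in Tables 1-4.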

Modeling
Three algorithms (GBR, RFR, and Lasso) are used to predict the manufacturing time. The results of this modeling are shown in Tables 5-7, where:
• Categories represent the range of lengths in feet to which the items belong.
• The number of items represents the number of registered joists belonging to a category.
• Real-time is the sum of the recorded assembly times of the joists corresponding to a given category.
• Prediction is the sum of the predicted times of the joists corresponding to a given category.
• GBP, RAE, and MAE are the measures used to evaluate the predictions by category for each prediction technique.
Tables 5-7 show how the GBP, RAE, and MAE values differ between the Lasso, GBR, and RFR algorithms. The GBP, RAE, and MAE values are of very similar magnitude for the GBR and RFR algorithms. However, these values are larger for the Lasso algorithm.
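The way such a per-category table can be assembled is sketched below. The bin width of 10 ft, the sample rows, and the GBP formula (percentage difference between summed predicted and summed real times) are assumptions made for illustration and do not reproduce the values of Tables 5-7.

```python
from collections import defaultdict

def summarize_by_category(rows, bin_width_ft=10):
    """Group (span_ft, real_min, predicted_min) rows into span ranges and
    aggregate the number of items, the total times, and the GBP per range."""
    totals = defaultdict(lambda: {"items": 0, "real": 0.0, "prediction": 0.0})
    for span_ft, real_min, predicted_min in rows:
        low = int(span_ft // bin_width_ft) * bin_width_ft
        category = totals[(low, low + bin_width_ft)]
        category["items"] += 1
        category["real"] += real_min
        category["prediction"] += predicted_min

    summary = {}
    for key in sorted(totals):
        t = totals[key]
        summary[key] = {
            "items": t["items"],
            "real": t["real"],
            "prediction": t["prediction"],
            # Assumed GBP definition: signed gap in percent of the real total.
            "gbp_percent": 100.0 * (t["prediction"] - t["real"]) / t["real"],
        }
    return summary

# Hypothetical joists: (span in feet, real time in min, predicted time in min).
rows = [
    (12, 10.0, 11.0),
    (18, 12.0, 11.0),
    (25, 15.0, 15.5),
    (29, 20.0, 21.5),
]

summary = summarize_by_category(rows)
for (low, high), values in summary.items():
    print(f"{low}-{high} ft: {values}")
```

Aggregating by category makes it possible to see whether the prediction quality depends on the number of items available in each range.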

Evaluation
To choose the best algorithm, the following observations are made from Tables 5-7 and Figures 4-6.

• The GBP values obtained with the Lasso algorithm range from −22% to 35%, while those obtained with the RFR and GBR algorithms range from −0.7% to 1.1% for the RFR and from −0.4% to 1.1% for the GBR (see Tables 5-7 and Figure 4). Since the GBP of an ideal prediction is very close to 0%, the GBR and RFR algorithms provide more accurate prediction results than the Lasso algorithm from the GBP point of view.
• The RAE values obtained with the Lasso algorithm range from 0.99 to 3.20, while those obtained with the RFR and GBR algorithms range from 0.94 to 1.06 for the RFR and from 0.92 to 1.01 for the GBR (see Tables 5-7 and Figure 5). Since the RAE of an ideal prediction is very close to 1, the GBR and RFR algorithms provide more accurate prediction results than the Lasso algorithm from the RAE point of view.
• Finally, the MAE values obtained with the Lasso algorithm vary between 3.17 and 7.28, while those obtained with the RFR and GBR algorithms vary between 2.40 and 3.27 for the RFR and between 2.30 and 3.14 for the GBR (see Tables 5-7 and Figure 6). Since the MAE of an ideal prediction is very close to 0, the GBR and RFR algorithms provide more accurate prediction results than the Lasso algorithm from the MAE point of view.
• The GBR and the RFR present results with almost identical GBP, RAE, and MAE.
• The prediction times from the GBR and RFR are so close to the real time that their representative lines overlap (see Figure 7).
Thus, Lasso shows poor results compared to the other algorithms used in this article. The GBR and the RFR will be used in this article for pattern identification.
• For the RFR, GBP values higher than 0.65% are observed below 1502 items, while for the GBR, GBP values greater than 0.61% are observed below 1172 items. This may indicate that prediction with the GBR technique does not require large amounts of data to provide accurate results.
The highest RAEs (−1.05 for RFR and −1.14 for GBR) correspond to the category with the lowest number of items. This can be explained by ML prediction results being more accurate when more data are available [31,32].

Pattern Identification
Once the modeling is done, GBR and RFR make it possible to carry out pattern identification, which consists of identifying the variables that substantially impact the prediction results. This functionality is available in both the GBR and the RFR (see Figure 8).

According to Figure 8, the following remarks are made:
• For both prediction techniques, the variables Weight, ReinfWeight, Camber, Span, Memb_lgth, Tcx_Lgth_r, and Depth are the variables that have the most impact on the joist assembly time.
• For both prediction techniques, joist weight is the variable with the most significant effect on assembly time.
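One model-agnostic way of ranking variables by their impact on a prediction is permutation importance, sketched below. It is a swapped-in technique shown for illustration, not necessarily the importance measure behind Figure 8 (tree ensembles often report impurity-based importances instead). The stand-in model, its coefficients, and the data are hypothetical.

```python
import random

def mean_abs_error(model, X, y):
    return sum(abs(model(row) - target) for row, target in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MAE when one feature column is shuffled.

    A large increase means the model relies heavily on that feature."""
    rng = random.Random(seed)
    base_error = mean_abs_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_abs_error(model, X_perm, y) - base_error)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical stand-in for a trained regressor: the predicted time depends
    strongly on weight, weakly on camber, and not at all on depth."""
    weight, camber, _depth = row
    return 5.0 + 0.02 * weight + 0.1 * camber

# Hypothetical joists described by [weight, camber, depth].
X = [[100.0 + 30.0 * i, float(i % 4), 10.0 + (i * 7) % 30] for i in range(30)]
y = [toy_model(row) for row in X]

feature_names = ["Weight", "Camber", "Depth"]
importances = permutation_importance(toy_model, X, y)
ranking = sorted(zip(feature_names, importances), key=lambda item: -item[1])
```

Shuffling a column breaks the link between that variable and the output while keeping its distribution, so the resulting loss of accuracy reflects how much the model depends on it.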

Knowledge Learned and Design Rules
After the pattern identification, the following information is retained:
• The joist weight greatly influences the joist assembly time. This may indicate difficulty lifting and handling heavy elements on the assembly table. Improving the assembly line by installing additional lifting equipment could considerably reduce joist assembly time.
• The length of the members and the height of the joists also significantly impact the assembly time of the joists. This may indicate difficulty in maneuvering the long bars on the assembly table.
Some design rules can be derived from this pattern identification:
• The ReinfWeight (the weight of the additional materials used to reinforce the joist elements) has a significant impact on the assembly time of the joists. For example, avoiding the use of reinforced bars by replacing them with larger profiles can significantly reduce joist assembly time.
• Joist length impacts joist assembly time, but the number of joist components does not. Designing joists that can be assembled in subassemblies will undoubtedly increase the number of components but may reduce joist assembly time.
• The length of joist members has a considerable impact on assembly time. Fragmenting the length of joist parts such as top and bottom chords during design can reduce joist assembly time.
• Top chord extensions to the right have more impact on joist assembly time than extensions to the left. Matching joists so that top chord extensions are made to the left during assembly, before the joists are flipped to the right, can reduce joist assembly time.

Discussion and Interpretation of Results
The objective of this paper was to propose an approach to identify design rules such as DFMA from BIM models and ML algorithms. The quantitative analysis of 55,444 BIM models by ML algorithms identified "steel component weight", "number of cambers", "component lengths", and "component depth" as the factors with the most significant impact on the fabrication time of steel structures. Thus, the analysis of BIM models can identify the factors with a high impact on the fabrication time of steel structure components.
Quantitative analysis of BIM models by ML algorithms can provide information on the knowledge of the limits of the equipment available in fabrication and assembly plants. Indeed, the factors "weight of steel components", "number of cambers", "component lengths", and "component depth" are related to the capabilities of the equipment available in the fabrication and assembly plants.
The "weight of steel components" factor is more significant than the "number of cambers" factor, which in turn is more significant than the "depth of components" factor. Thus, quantitative analysis of BIM models by ML algorithms can make it possible to rank the influence of fabrication factors on the assembly time of steel structures. This can allow the formulation of design rules and the judicious selection of which rules to apply in case of rule conflicts.
Quantitative analysis of BIM models by ML algorithms can allow steel structure component manufacturers to identify deficiencies in the equipment available in the production facilities. Consideration of these deficiencies can allow fabricators to initiate modifications in a way that considers the limitations of their equipment.

Conclusions
This work proposes an approach to identify design rules from BIM models of previous projects and ML algorithms. To achieve this, this research suggests extracting and classifying data from BIM models of joists and using a predictive regression model to predict assembly time. Ensemble learning algorithms (RFR and GBR) proved to be better predictors than the non-ensemble learning algorithm (Lasso). Furthermore, both ensemble learning algorithms were able to identify the most influential input variables. Based on these variables, it was possible to formulate recommendations concerning the assembly line and to formulate design rules. A case study with 55,444 steel joists demonstrates the feasibility of this method. Variables are classified according to their impact on the fabrication time. The study also proposes a series of relevant variables that could inspire future work on predicting manufacturing duration in steel joist projects in a specific workshop. The methodology proposed in this study can also be adapted to other productive sectors of the construction industry, such as steel structure installation and glass fabrication and installation. For each of these applications, it will be necessary to obtain the BIM models of previous projects and the duration of the operations of these projects. The data must also come from a single production unit. A practical perspective for this study is to apply the design rules developed here to the design of new joists to be produced in the same workshops, in order to assess the impact of these rules on the manufacturing time of the structures.