1. Introduction
The fabrication phase of structural steel projects represents 30 to 40% of the overall building cost [
1]. Yet, the decisions taken during the design phase affect 88% of the steelworks’ costs and time of execution [
2]. Nevertheless, designers remain chiefly concerned with compliance with standards [
3] and the appropriate choice of structural elements for the resistance of the structures [
4]. Little time, from a few minutes to a few hours, is devoted to evaluating models for cost and time reduction and to searching for alternative solutions during the design phase [
5]. These evaluations are made without considering the particularities of the manufacturing plant where the work will be carried out [
6]. This situation leads to a sub-optimal design [
7,
8,
9]. In the traditional Design Bid Build (DBB) procurement, where a project is carried out in a linear and fragmented process, manufacturing specialists often intervene at the end of the design phase [
10,
11]. At that point, the modifications they make cause delays and additional costs in completing the projects [
12]. This situation resembles that of Product Development Engineering (PDE) in the 1990s.
In PDE, designers and manufacturers collaborate formally through different methods and design rules, such as design for manufacturing and assembly (DFMA) [
13]. These rules give designers, from the design phase onward, the essential knowledge to reduce cost, time, tooling, the number of operations, the quantity of material, and the number of workers, while improving quality during the manufacturing and assembly of parts [
14], and for a specific workshop [
15].
DFMA consists of identifying and considering manufacturing and assembly constraints during design. This process leads to design rules and tools, which help obtain simplified and standardized products suitable for the manufacturing and assembly process [
13,
14]. The DFMA methodology also improves the manufacturing and assembly process by integrating structural changes that promote essential design criteria [
16]. A case study from Douglas Commercial Airlines demonstrates the significant benefits of using DFMA in a manufacturing process, notably a 51% reduction in the number of parts, a 37% reduction in the cost of manufacturing parts, a 50% faster time to market, a 68% improvement in the quality and reliability of the final product, a 62% reduction in assembly time, and a 57% reduction in manufacturing time [
17].
DFMA identifies design factors with a high impact on manufacturing and assembly processes [
18]. One approach to identifying these factors is to hold meetings with designers, manufacturers, and assemblers who have extensive knowledge and experience, in order to assess the design factors available for a designed product [
11,
13]. The identified factors help to establish criteria that will allow the evaluation of different product designs [
19]. This approach, which appears feasible in PDE, is difficult to apply in the construction industry because Design Bid Build (DBB) procurement fragments the project phases [
6]. However, recent work in machine learning (ML) shows that it is possible to extract knowledge from the digital data of a process, and Building Information Modeling (BIM) offers relevant data for ML in the construction industry.
This paper addresses the research question: Is it possible to identify design rules such as DFMA rules from BIM models of previous projects using machine learning algorithms? To answer this question, this paper proposes an approach to identify such design rules from BIM models of previous projects and ML algorithms.
To achieve this goal, this paper sets out to validate two possibilities: extracting design factors from BIM models of steel structures with ML, and establishing design rules that reduce fabrication time from the design factors obtained.
To that end, this article presents a literature review that justifies the choice of methodology, the methodology itself, and a case study with 55,444 BIM models of steel joists. The BIM models are from a major North American steel structure manufacturer.
3. Methodology
According to the Cross-Industry Standard Process for Data Mining (CRISP-DM), the main steps in data prediction are system understanding, data understanding, data preparation, data modeling, and outcome evaluation [
57]. This paper adds pattern identification and design rules to these steps (
Figure 1).
The proposed research approach could be described as follows:
3.1. System Understanding
The objective of system understanding is to define the goals of the prediction and the requirements for achieving them. In construction projects, cost and schedule are generally the performance criteria sought, so they can be defined as prediction objectives. Costs and schedules depend on the geometrical and functional decisions made during the design phase. During system understanding, these decisions are identified to serve as prediction criteria. The technologies and tools to be used for the prediction are also identified during this step.
3.2. Data Understanding
Data understanding follows system understanding. This step consists of collecting and analyzing the data required for the prediction objectives. The data related to the prediction criteria are grouped and classified with data analysis tools. The collection can be done with spreadsheets such as MS Excel or Google Sheets. After collection, the data are explored to ensure their quality. One method is to eliminate values that are too large or too small to be plausible for the criteria being analyzed.
3.3. Data Preparation
Data preparation consists of preparing the data for modeling. This is done in several steps, including:
data cleaning, which consists of removing or correcting erroneous values,
data construction, which consists of determining additional attributes that will be useful for data modeling, and
data integration, which consists of combining data from various sources.
3.4. Data Modeling
Modeling consists of building and evaluating various models based on different modeling techniques. Here we select the algorithms to try; we assess the competing models based on the results obtained and the performance criteria sought.
The implementation of the approach proposed in this article requires an interpreted programming language and an ML package. In this article, the proposed language is Python and the proposed ML package is TensorFlow. The prepared data are used for modeling with the RFR, GBR, and Lasso techniques. The prediction results obtained with these techniques are presented at the end of the modeling.
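As an illustration of this step, the following minimal sketch fits the three techniques and compares their errors on held-out data. It rests on assumptions that are not stated in the text: it uses scikit-learn rather than TensorFlow because scikit-learn provides these three regressors directly, it reads RFR and GBR as random forest and gradient boosting regressors, the hyperparameters are arbitrary, and the data are synthetic stand-ins rather than joist records.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error


def fit_and_score(X_train, y_train, X_test, y_test, seed=0):
    """Fit the three regressors and return the test MAE of each one."""
    models = {
        "RFR": RandomForestRegressor(n_estimators=50, random_state=seed),
        "GBR": GradientBoostingRegressor(random_state=seed),
        "Lasso": Lasso(alpha=0.1),  # alpha is an arbitrary illustrative value
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data (hypothetical): two features and a nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 2))
y = 10.0 + 8.0 * X[:, 0] * X[:, 1] + rng.normal(0.0, 0.1, size=400)

# Three quarters of the rows for training, one quarter for testing.
scores = fit_and_score(X[:300], y[:300], X[300:], y[300:])
for name, mae in scores.items():
    print(f"{name}: MAE = {mae:.3f}")
```

Keeping the three models in one dictionary makes it easy to apply the same evaluation measure to each of them, which is what the comparison in the next step requires.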
3.5. Data Evaluation
This step compares the results of the three techniques used for modeling. The algorithm whose MAE and GBP are closest to 0 and whose RAE is closest to 1 has the best performance.
3.6. Pattern Identification
This step consists of identifying the variables that have the most impact on steel structure fabrication and assembly time.
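One model-agnostic way to rank variables by their impact is permutation importance: shuffle one variable at a time and measure how much the prediction error grows. The sketch below illustrates the idea only; it is not necessarily the technique used in this study, and the toy model, its coefficients, and the data are hypothetical.

```python
import random


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(predict, rows, targets, feature_names, seed=0):
    """Return the increase in MAE observed when each feature is shuffled."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(targets, [predict(r) for r in rows])
    importance = {}
    for name in feature_names:
        shuffled = [r[name] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
        error = mean_absolute_error(targets, [predict(r) for r in permuted])
        importance[name] = error - baseline
    return importance


def toy_predict(row):
    """Hypothetical model: time depends on Weight, slightly on Span, not on Depth."""
    return 5.0 + 0.02 * row["Weight"] + 0.0001 * row["Span"]


data_rng = random.Random(1)
rows = [
    {
        "Weight": data_rng.uniform(100, 600),
        "Span": data_rng.uniform(5000, 15000),
        "Depth": data_rng.uniform(300, 900),
    }
    for _ in range(200)
]
targets = [toy_predict(r) for r in rows]

importance = permutation_importance(toy_predict, rows, targets, ["Weight", "Span", "Depth"])
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

A variable that the model ignores, such as Depth in this toy example, gets an importance of zero because shuffling it leaves every prediction unchanged.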
3.7. Knowledge Learned and Design Rules
This part consists of understanding the results of the pattern identification, establishing design rules, and formulating recommendations to improve the fabrication and assembly line.
4. Case Study
The case study in this article concerns the assembly of steel joists for a major manufacturer of steel structures in North America.
4.1. System Understanding
In the steel construction industry, steel joists are lightweight steel structures that support roofs and floors and transfer the loads they receive directly to the steel structures that support them.
Joists are mainly composed of top and bottom chords, webs, and seats. See
Figure 2.
The joist assembly operation consists of:
Reading and understanding the plans,
Identifying the steel elements that will make up the joist,
Identifying the positions of the elements on the joist,
Lifting and placing the top and bottom chords on the assembly table,
Lifting and positioning the joist components,
Welding the joist elements,
Performing the final check,
Removing the assembled joist.
An automatic device installed on the joist assembly table measures the assembly time. The device starts counting when the first joist elements arrive on the empty assembly table and stops when the table becomes empty again. This feature reduces human intervention in starting the countdown. However, the device does not stop when workers leave joists on the assembly table during a break or over a weekend.
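The timing logic described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the event log format, a chronological list of (timestamp, table occupied) pairs, is hypothetical, as are the timestamps.

```python
from datetime import datetime, timedelta


def assembly_durations(events):
    """Return one duration per occupied period of the assembly table.

    `events` is a chronological list of (timestamp, table_is_occupied) pairs,
    a hypothetical log format. Timing starts when the table becomes occupied
    and stops when it becomes empty again.
    """
    durations = []
    started_at = None
    for timestamp, occupied in events:
        if occupied and started_at is None:
            started_at = timestamp  # first elements arrive on an empty table
        elif not occupied and started_at is not None:
            durations.append(timestamp - started_at)  # table is empty again
            started_at = None
    return durations


start = datetime(2020, 1, 6, 9, 0)
events = [
    (start, True),
    (start + timedelta(minutes=12), False),  # joist assembled and removed
    (start + timedelta(minutes=15), True),
    (start + timedelta(minutes=75), False),  # joist left on the table over a break
]
minutes = [d.total_seconds() / 60 for d in assembly_durations(events)]
print(minutes)
```

The second duration shows how a joist left on the table inflates the recorded time, since the device has no way to tell a break from assembly work.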
4.2. Data Understanding
The data for this study come from the main variables that characterize the joists:
Depth (mm) is the vertical distance between the axis of the top and bottom chords
Span (mm) is the distance between the two positions of the joist seats
Camber (mm) is the distance between the highest point before and after bending; the camber is imposed on the top chords to keep the joist horizontal when loaded
ComponentCount (qty) is the number of elements
Memb_Lgth (mm) is the total length of all the joist elements
Weight (kg) is the total mass of the elements that make up the joist
ChordsDis (binary) indicates whether the top and bottom chords differ in profile type and dimensions
ReinfWeight (kg) is the weight of the additional materials used to reinforce the joist elements
Tcx_Lgth_l (mm) is the extension length to the left of the top chord
Tcx_Lgth_r (mm) is the extension length to the right of the top chord
Tcx_Depth_l (mm) is the depth of the profile used for the left extension of the top chord
Tcx_Depth_r (mm) is the depth of the profile used for the extension to the right of the top chord
TcxType_x (from type 1 to type 8) is the type of top chord used to manufacture the joist
Dsg_type (from type 1 to type 4) is the type of joist design used
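Regression techniques need numeric inputs, so the binary and categorical variables in this list have to be encoded in some way. The sketch below shows one possible encoding. The text does not say how the encoding was done, so one-hot encoding is an assumption, as are the (profile type, dimension) representation of a chord and the reading of ChordsDis as 1 when the chords differ in either respect.

```python
def chords_dis(top_chord, bottom_chord):
    """Return 1 when the chords differ in profile type or dimension, else 0.

    Each chord is a hypothetical (profile_type, dimension_mm) tuple.
    """
    return int(top_chord != bottom_chord)


def one_hot(value, n_types, prefix):
    """Encode a type number in 1..n_types as a dictionary of 0/1 indicators."""
    if not 1 <= value <= n_types:
        raise ValueError(f"{prefix} must be between 1 and {n_types}")
    return {f"{prefix}_{k}": int(k == value) for k in range(1, n_types + 1)}


# Hypothetical joist: same profile type for both chords, different dimensions.
features = {"ChordsDis": chords_dis(("L", 51), ("L", 64))}
features.update(one_hot(2, 4, "Dsg_type"))  # Dsg_type ranges from type 1 to type 4
print(features)
```

One-hot indicators avoid implying an order between design types, which a single integer column would suggest to a linear technique such as Lasso.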
4.3. Data Preparation
Data cleaning: More than 170,000 assembly times were recorded, corresponding to more than 170,000 joists. However, these data contain noise, mainly due to inattention errors by workers and to joists remaining on tables during breaks, weekends, and holidays. To reduce the noise in the data set, this study sets minimum and maximum plausible times for assembling a joist on the assembly table and excludes from the study the times that are too short or too long to correspond to the assembly of a joist. A minimum time of 5 min and a maximum time of 25 min were obtained from professionals and retained as the limits of the assembly times kept for the study. This filtering reduces the data set from more than 170,000 records to 55,444 (see
Figure 3).
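The range filter described in this data cleaning step can be sketched as follows. The 5 min and 25 min limits come from the text; treating them as inclusive bounds is an assumption, and the record layout and the `joist_id` field are hypothetical.

```python
MIN_MINUTES = 5.0   # lower limit reported in the text
MAX_MINUTES = 25.0  # upper limit reported in the text


def filter_assembly_times(records, low=MIN_MINUTES, high=MAX_MINUTES):
    """Keep the records whose assembly time lies within [low, high] minutes."""
    return [r for r in records if low <= r["RealTime"] <= high]


# Hypothetical records: RealTime is the recorded assembly time in minutes.
records = [
    {"joist_id": "J1", "RealTime": 2.0},    # too short, likely a recording error
    {"joist_id": "J2", "RealTime": 5.0},
    {"joist_id": "J3", "RealTime": 14.5},
    {"joist_id": "J4", "RealTime": 25.0},
    {"joist_id": "J5", "RealTime": 480.0},  # joist left on the table
]
kept = filter_assembly_times(records)
print([r["joist_id"] for r in kept])
```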
Data splitting: For a better balance in prediction, the data were divided into four equivalent groups with similar statistical characteristics.
Table 1,
Table 2,
Table 3 and
Table 4 present the organization of the configured groups with their respective features. The features considered are the prediction criteria Depth, Span, Camber, ComponentCount, Memb_Lgth, Weight, and RealTime. For each criterion, the tables report the mean, the minimum (min), the first quartile (25%), the median (50%), the third quartile (75%), the maximum (max), and the standard deviation (std), as seen in
Table 1,
Table 2,
Table 3 and
Table 4. These measures ensure that the data is well distributed among the four groups. Three of these groups were used for training the learning algorithms and the fourth for testing them.
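A split of this kind, together with the summary statistics used to compare the groups, can be sketched as below. The text does not state how the groups were formed, so the random shuffle is an assumption, and the assembly times are hypothetical values. The quartiles use the default method of `statistics.quantiles`, which may differ slightly from the interpolation used by other tools.

```python
import random
import statistics


def split_into_groups(values, n_groups=4, seed=0):
    """Shuffle the values and deal them into n_groups groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


def describe(values):
    """Return the summary statistics listed in the text for one group."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical assembly times in minutes, within the 5 to 25 min limits.
data_rng = random.Random(42)
times = [data_rng.uniform(5.0, 25.0) for _ in range(400)]

groups = split_into_groups(times)
summaries = [describe(g) for g in groups]

# Three groups for training and the fourth for testing, as in the text.
train = [t for g in groups[:3] for t in g]
test = groups[3]
print([round(s["mean"], 2) for s in summaries])
```

Comparing the rows of `summaries` side by side plays the same role as the four tables: large differences between groups would signal an unbalanced split.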
4.4. Modeling
Three algorithms (GBR, RFR, and Lasso) are used to predict the manufacturing time. The results of this modeling are shown in
Table 5,
Table 6 and
Table 7.
Categories represent the range of lengths in feet to which the items belong.
The number of items represents the number of registered joists belonging to a category.
Real-time is the sum of the recorded assembly times of the joists corresponding to a given category.
Prediction is the sum of the predicted times of the joists corresponding to a given category.
GBP, RAE, and MAE are the measures used to evaluate the predictions by category for each prediction technique.
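The per-category columns described above can be computed with a simple aggregation, sketched below. The field names, the 10 ft category width, and the sample values are hypothetical. Only the MAE is computed, because the formulas for GBP and RAE are not given in this section.

```python
from collections import defaultdict


def summarize_by_category(joists, bin_width_ft=10):
    """Sum real and predicted times per length category and compute its MAE."""
    totals = defaultdict(
        lambda: {"items": 0, "real_time": 0.0, "prediction": 0.0, "abs_error": 0.0}
    )
    for joist in joists:
        low = int(joist["length_ft"] // bin_width_ft) * bin_width_ft
        bucket = totals[(low, low + bin_width_ft)]
        bucket["items"] += 1
        bucket["real_time"] += joist["real_time"]
        bucket["prediction"] += joist["predicted_time"]
        bucket["abs_error"] += abs(joist["real_time"] - joist["predicted_time"])
    return {
        category: {
            "items": b["items"],
            "real_time": b["real_time"],
            "prediction": b["prediction"],
            "mae": b["abs_error"] / b["items"],
        }
        for category, b in sorted(totals.items())
    }


# Hypothetical joists: lengths in feet, times in minutes.
joists = [
    {"length_ft": 12, "real_time": 10.0, "predicted_time": 11.0},
    {"length_ft": 18, "real_time": 12.0, "predicted_time": 11.0},
    {"length_ft": 25, "real_time": 20.0, "predicted_time": 18.5},
]
table = summarize_by_category(joists)
for category, row in table.items():
    print(category, row)
```

In the first category the summed prediction equals the summed real time although each individual prediction is off by one minute, which is why a per-joist error measure such as MAE is reported alongside the totals.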
The data resulting from
Table 5,
Table 6 and
Table 7 show how the GBP, RAE, and MAE values differ among the Lasso, GBR, and RFR algorithms. The GBP, RAE, and MAE values are of very similar magnitude for the GBR and RFR algorithms. However, these values are higher for the Lasso algorithm.
4.5. Evaluation
To choose the best algorithm, the results of the three techniques are compared.
Lasso shows poor results compared with the other algorithms used in this article. GBR and RFR are therefore retained in this article for pattern identification.
For the RFR, the GBP values are higher than 0.65% for categories with fewer than 1502 items, while for the GBR, GBP values greater than 0.61 are observed for categories with fewer than 1172 items. This may indicate that prediction with the GBR technique does not require large amounts of data to provide accurate results.
The highest RAEs (−1.05 for RFR and −1.14 for GBR) correspond to the category with the fewest items. This can be explained by the fact that ML predictions are more accurate when more data are available [
31,
32].
4.6. Pattern Identification
Once the modeling is done, GBR and RFR are used for pattern identification, which aims to identify the variables that substantially impact the prediction results. This functionality is available in both the GBR and the RFR (see
Figure 8).
According to
Figure 5, the following remarks are made:
For both prediction techniques, Weight, ReinfWeight, Camber, Span, Memb_Lgth, Tcx_Lgth_r, and Depth are the variables that have the most impact on the joist assembly time.
For both prediction techniques, joist weight is the variable with the most significant effect on assembly time.
4.7. Knowledge Learned and Design Rules
After the pattern identification, the following information is retained:
The joist weight greatly influences the joist assembly time. This may indicate difficulty lifting and handling heavy elements on the assembly table. Improving the assembly line by installing additional lifting equipment could considerably reduce joist assembly time.
The length of the members and the height of the joists also significantly impact the assembly time of the joists. This may indicate difficulty in maneuvering the long bars on the assembly table. A system for handling thin bars can considerably reduce joist assembly time.
Extensions on the right side of the joist top chords have a more significant impact on joist assembly time than those on the left side. This may indicate difficulty in working symmetrically on the joist assembly table. Equipping the assembly table symmetrically could reduce joist assembly time.
Some design rules can be derived from this pattern identification:
The ReinfWeight (the weight of the additional materials used to reinforce the joist elements) has a significant impact on the assembly time of the joists. For example, avoiding the use of reinforced bars by replacing them with larger profiles can significantly reduce joist assembly time.
Joist length impacts joist assembly time, whereas the number of joist components does not. Designing joists that can be assembled in subassemblies will undoubtedly increase the number of components but may reduce joist assembly time.
The length of joist members has a considerable impact on assembly time. Fragmenting the length of joist parts such as top and bottom chords during design can reduce joist assembly time.
Top chord extensions to the right have more impact on joist assembly time than extensions to the left. Orienting joists so that top chord extensions are assembled on the left side, and flipping the joists to the right afterward, can reduce joist assembly time.
5. Discussion and Interpretation of Results
The objective of this paper was to propose an approach to identify design rules such as DFMA from BIM models and ML algorithms. Thus:
Quantitative analysis of 55,444 BIM models by ML algorithms identified that the factors “steel component weight”, “number of cambers”, “component lengths”, and “component depth” are the factors with the most significant impact on the fabrication time of steel structures. Thus, the analysis of BIM models can identify the factors with high impact on the fabrication time of steel structure components.
Quantitative analysis of BIM models by ML algorithms can provide information on the knowledge of the limits of the equipment available in fabrication and assembly plants. Indeed, the factors “weight of steel components”, “number of cambers”, “component lengths”, and “component depth” are related to the capabilities of the equipment available in the fabrication and assembly plants.
The “weight of steel components” factor has a greater impact than the “number of cambers” factor, which in turn has a greater impact than the “component depth” factor. Thus, quantitative analysis of BIM models by ML algorithms makes it possible to rank fabrication factors by their influence on the assembly time of steel structures. This ranking can support the formulation of design rules and the judicious selection of which rules to apply when rules conflict.
Quantitative analysis of BIM models by ML algorithms can allow steel structure component manufacturers to identify deficiencies in the equipment available in the production facilities. Consideration of these deficiencies can allow fabricators to initiate modifications in a way that considers the limitations of their equipment.