Error Analysis Method of Geometrically Incomplete Similarity of End-Plate Connection Based on Linear Regression

Abstract: Due to processing errors, test conditions, and other limiting factors, geometric similarity errors in scale tests of steel structure joints are difficult to avoid, yet this error has received little research attention. Based on similarity theory and the basic idea of the component method, this paper derives the macro similarity conditions of beam–column end-plate connections and identifies the main factors influencing the geometric similarity of this type of structure. Taking the end-plate thickness as the factor of interest, the formation mechanism of the geometrically incomplete similarity error of this type of node was studied. Accurate finite element models were established for parametric analysis, and the influence of end-plate thickness on the incomplete similarity error was analyzed. Based on these models and linear regression analysis, prediction formulas for the geometrically incomplete similarity error of beam–column end-plate connections were established, which can significantly reduce the similarity errors caused by end-plate thickness. This article aims to propose a method for modeling the distribution of incomplete similarity errors and to provide a reference for research on similar problems.


Introduction
The beam-column end-plate connection is an important connection form for prefabricated steel structures. For many years, scholars have conducted extensive experimental research on this connection form and obtained many interesting results [1][2][3]. The scale model test is an important type of node test. When similarity errors are controlled, it yields good empirical results while saving considerable test expense, site occupation, and time compared with full-scale model tests. However, owing to a lack of effective communication and unified organization, researchers tend to focus on their own research content and rarely analyze the similarity errors of the test itself, which leads to inconsistent test results and wasted experimental resources. It is therefore urgent to analyze the incomplete similarity factors in scale model tests of beam-column end-plate connections and to resolve the bottlenecks they cause.
In view of the performance characteristics of common semi-rigid connections of steel structures, including beam-column end-plate connections, scholars from various countries have performed many experiments and theoretical studies. Lu Linfeng et al. completed the failure tests of six full-scale beam-column connections under low-cycle repeated loads and compared and analyzed the

Derivation of Completely Similar Macro Conditions for Semi-Rigid Connections
This section analyzes the characteristics specific to semi-rigid connections and derives the completely similar conditions for semi-rigid steel nodes [11]. This is the theoretical basis for the subsequent similarity error analysis.
So-called macro similarity means that the target physical quantities in the derivation are macro quantities. For example, the dimension parameter L covers all geometric properties of the node, including length, width, plate thickness, and hole diameter. Likewise, the stress covers the stresses in a series of components, such as the steel beams, steel columns, and joint core area.
First, the main physical quantities involved in the physical process are listed in Table 1; all quantities relevant to the analysis should be included. For the static test of semi-rigid beam-column joints, sixteen physical quantities were selected. This research chose F, L, and T (force, length, and time), which are commonly used in mechanical systems, as the basic dimensional system. The symbol [a], introduced by Maxwell, is used to represent the dimension of any quantity a. Table 2 shows all the relevant system parameters used in the case study.
According to the second law of similitude theory (Buckingham π theorem), if there is a physically meaningful equation involving a certain number (n) of physical variables, then the original equation can be rewritten in terms of a set of p = n − k dimensionless parameters constructed from the original variables.
In terms of these dimensionless parameters, the relation takes the form
$$f(\pi_1, \pi_2, \cdots, \pi_{n-k}) = 0 \tag{1}$$
The number of basic physical quantities is 3. Then n = 16, k = 3, and n − k = 13, and the above formula can be rewritten as:
$$f(M, E, G, K, I, A, \gamma, \rho, \sigma, \varepsilon, s, \theta, T) = 0 \tag{2}$$
According to the principle of dimensional homogeneity, the dimensions of both sides of an equation must be the same. Taking the power exponents as $x_1$, $x_2$, $x_3$, and $x_4$, the moment M is expressed in terms of the basic parameters. Since this expression must be dimensionally consistent, the power exponents on both sides are equated and the resulting equations are solved with $x_1 = 1$, which yields the first dimensionless number. The remaining 12 dimensionless numbers are obtained by the same procedure.
Appl. Sci. 2020, 10, 4812
Together, the 13 dimensionless numbers give the exact similarity conditions of the beam-column joint of the semi-rigid steel structure. Assuming that the geometric scale ratio of the model is 1:2, the similarity ratio of each physical quantity in Table 3 can be obtained from Equation (11).
In a typical semi-rigid steel structure node test, many test conditions must be controlled, and the similarity ratios of the size parameters cannot all be kept consistent. When the size of a local component cannot meet the dimensional similarity ratio, the resulting similarity error cannot be derived from traditional similarity conditions. This article focuses on the effect of inconsistent end-plate thickness on similarity errors.
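To illustrate how similarity ratios follow from a geometric scale, the sketch below computes model-to-prototype ratios for a 1:2 model. It is a minimal sketch under stated assumptions: the model and prototype are assumed to share the same material (stress ratio of 1), only the force and length dimensions are tracked because the test is static, and the quantity list and the names `DIMENSIONS` and `scale_factors` are illustrative rather than taken from Table 1 or Table 3.

```python
from fractions import Fraction

# Exponents (force, length) of each quantity in the F-L-T system. Time is
# omitted because the test is static. This selection is illustrative and is
# not a transcription of the paper's Table 1.
DIMENSIONS = {
    "length L": (0, 1),
    "force F": (1, 0),
    "moment M": (1, 1),
    "stress sigma": (1, -2),
    "elastic modulus E": (1, -2),
    "area A": (0, 2),
    "second moment of area I": (0, 4),
    "strain epsilon": (0, 0),
    "rotation theta": (0, 0),
}


def scale_factors(length_ratio, stress_ratio=Fraction(1)):
    """Return model/prototype ratios under complete similarity.

    Assumption: same material in model and prototype, so the stress ratio
    defaults to 1 and the force ratio follows from S_F = S_sigma * S_L**2.
    """
    force_ratio = stress_ratio * length_ratio**2
    return {
        name: force_ratio**f_exp * length_ratio**l_exp
        for name, (f_exp, l_exp) in DIMENSIONS.items()
    }


# Geometric scale of 1:2, as assumed in the text.
ratios = scale_factors(Fraction(1, 2))
for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

Under these assumptions, dimensionless quantities and stresses keep a ratio of 1, while forces and moments scale with the second and third powers of the length ratio.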

Numerical Simulation Method
The similarity relationships derived from classic similarity theory have many deficiencies and cannot fully meet the requirements of current model tests [12][13][14][15][16][17][18][19][20]. To address this, ABAQUS 6.14 (2014, Dassault SIMULIA, Inc., Providence, USA) was used to establish an accurate finite element model of the node for simulating similarity errors. The model uses the linear reduced-integration element C3D8R, with mesh sizes of 15 mm for the beam-column members, 4 mm for the end-plate, and 1.4 mm for the bolts. Loading is displacement-controlled. Because the analysis is static, a trilinear constitutive model is used for the steel. The beam and column steel is grade Q345B, with an ultimate strain of 0.1 and an elastic modulus of 206,000 MPa. The bolts are all grade 10.9 high-strength friction-type bolts, with an ultimate strain of 0.1 and an elastic modulus of 210,000 MPa. The size parameters of the model are shown in Table 4 and Figure 1.
To make the simulation as close as possible to the actual situation, the boundary regions of the node, including ribs and stiffeners, were modeled in detail, consistent with the conditions of the actual test. The column length is the distance between the inflection points of the upper and lower adjacent stories in the actual structure.
The model adopts column-head loading and beam-end restraint: the bottom of the column is modeled as a hinged support, and the beam ends as telescopic sliding supports. Contact was defined between the column and the end-plate. The contact properties comprise two parts: the normal behavior was defined as hard contact, and the tangential behavior as Coulomb friction with a friction coefficient of 0.4. The model is shown in Figure 2, and the boundary conditions in Figure 3.
Two materials were used in the finite element model: high-strength bolt steel and beam-column steel. Their material properties are shown in Table 5.
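As a rough illustration of what a trilinear constitutive relationship looks like, the sketch below implements one common form: an elastic branch, a yield plateau, and a linear hardening branch up to the ultimate strain. Only the elastic modulus (206,000 MPa) and the ultimate strain (0.1) come from the text. The yield strength of 345 MPa is the nominal value implied by the grade name, while the ultimate strength, the plateau end strain, and the function name `trilinear_stress` are hypothetical placeholders, since Table 5 is not reproduced here. The authors' actual trilinear curve may differ.

```python
def trilinear_stress(strain, e_mod=206_000.0, f_y=345.0, f_u=470.0,
                     eps_plateau_end=0.02, eps_u=0.1):
    """Stress (MPa) for a given strain on an assumed trilinear steel curve.

    e_mod and eps_u follow the text; f_y is the nominal grade value;
    f_u and eps_plateau_end are hypothetical placeholders.
    """
    eps_y = f_y / e_mod              # strain at first yield
    eps = abs(strain)
    if eps <= eps_y:
        stress = e_mod * eps         # branch 1: linear elastic
    elif eps <= eps_plateau_end:
        stress = f_y                 # branch 2: yield plateau
    elif eps <= eps_u:
        slope = (f_u - f_y) / (eps_u - eps_plateau_end)
        stress = f_y + slope * (eps - eps_plateau_end)  # branch 3: hardening
    else:
        stress = f_u                 # held constant beyond the ultimate strain
    # Same curve in tension and compression (an assumption of this sketch).
    return stress if strain >= 0 else -stress


for eps in (0.001, 0.01, 0.05, 0.1):
    print(f"strain {eps}: {trilinear_stress(eps):.1f} MPa")
```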

To avoid simulation errors caused by the element size effect, the ratio of element sizes between the model and the prototype is set equal to the similarity ratio; see Table 6 for details. The accuracy of the numerical simulation is illustrated by comparing the stress distributions of the completely similar model and the prototype.
The dimensions in the table are the mesh sizes in the finite element models.
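The mesh-scaling rule described above can be sketched in a few lines. The three mesh sizes are those quoted in the text; treating them as prototype values is an assumption of this sketch, and the names `PROTOTYPE_MESH_MM` and `scaled_mesh` are illustrative.

```python
# Mesh sizes quoted in the text (mm). Treating them as prototype values is
# an assumption; the values of Table 6 are not reproduced here.
PROTOTYPE_MESH_MM = {
    "beam-column member": 15.0,
    "end-plate": 4.0,
    "bolt": 1.4,
}


def scaled_mesh(prototype_mesh_mm, length_ratio):
    """Scale every element size by the geometric similarity ratio."""
    return {part: size * length_ratio
            for part, size in prototype_mesh_mm.items()}


# 1:2 geometric scale, as used for the scaled models in the text.
model_mesh = scaled_mesh(PROTOTYPE_MESH_MM, 0.5)
for part, size in model_mesh.items():
    print(f"{part}: {size} mm")
```

Scaling the element size together with the geometry keeps the number of elements per member roughly the same in the model and the prototype, so differences in results are less likely to stem from mesh density.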

Comparison of Semi-Rigid Node Indicators Under Completely Similar Conditions
This section characterizes the distribution of similarity errors by analyzing the differences between geometrically incompletely similar models and the completely similar model. To study the error, the data of the completely similar model must be obtained first. To this end, a model that strictly follows the 1:2 geometric scale of the prototype node is established, and the parameters of the scaled model are determined from the complete similarity conditions derived above. Whether similarity holds can be judged from the stress state of the key parts of the node.
As can be seen from Figures 4-8, the stress distributions of the nodes are almost exactly the same in the completely similar case, including the stress distributions of the beam-column members, the end-plates, and the core area of the nodes. The stress data at six typical points, namely the upper flange of the beam, the outer flange of the column, the partition on the column, the upper part of the column web, the core area of the node, and the upper and middle part of the end-plate, are shown in Figures 6-8. As loading progresses, the stresses at the upper flange of the beam, the core area of the joint, and the upper and middle parts of the end-plate increase monotonically, the stress at the outer flange of the column first decreases and then increases, and the stress at the upper part of the column web stays flat and then increases monotonically. When the stress curves of the prototype and the completely similar model at these points are superimposed (Figure 8), the two sets of curves are almost identical, which confirms both the complete similarity of the two models and the previously derived completely similar conditions.
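The visual comparison of superimposed curves can be complemented by a simple numerical check. The sketch below computes the largest relative deviation between two stress histories sampled at the same load steps. The function name `max_relative_deviation` and the stress values are hypothetical placeholders, not data from the paper, and the default stress ratio of 1 assumes the same material in model and prototype.

```python
def max_relative_deviation(prototype, model, stress_ratio=1.0):
    """Largest relative gap between two stress histories.

    Both sequences must be sampled at corresponding load steps.
    stress_ratio is the model/prototype stress similarity ratio.
    """
    if len(prototype) != len(model):
        raise ValueError("histories must have the same number of load steps")
    worst = 0.0
    for s_proto, s_model in zip(prototype, model):
        if s_proto == 0:
            continue  # relative deviation is undefined at zero stress
        predicted = s_model / stress_ratio  # map model stress to prototype scale
        worst = max(worst, abs(predicted - s_proto) / abs(s_proto))
    return worst


# Hypothetical stress histories (MPa) at one monitoring point.
prototype_stress = [50.0, 120.0, 210.0, 300.0]
similar_model_stress = [50.5, 119.0, 212.0, 299.0]

deviation = max_relative_deviation(prototype_stress, similar_model_stress)
print(f"largest relative deviation: {deviation:.2%}")
```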

Effect of End-plate Thickness on Similar Errors
To study the effect of end-plate thickness on similarity errors, six incompletely similar models with a 1:2 scale ratio were established, with end-plate thicknesses of 4 mm, 6 mm, 10 mm, 12 mm, 14 mm, and 16 mm and all other similarity conditions unchanged.

As can be seen in Figures 9-21, compared with the stress distribution of the completely similar model, the stress distribution changes markedly when the end-plate thickness deviates from the completely similar condition. The stress trends and magnitudes in the beam and column members, the end-plate, and the core of the joint are no longer highly consistent with the prototype, and a large similarity error arises. The same six typical points are examined: the upper flange of the beam, the outer flange of the column, the partition on the column, the upper part of the column web, the core area of the node, and the upper and middle parts of the end-plate.
The changes in the stress data are shown in Figures 9-21. The value and trend of the stress at each point differ considerably depending on how far the model deviates from the completely similar model.
For the upper flange of the beam, the stress increases monotonically, except when the end-plate thickness is 16 mm, where the stress decreases in the later stage of loading, which is inconsistent with the overall trend. For the outer flange of the column, the stress first decreases and then increases; when the end-plate thickness is 4 mm, the stress growth in the later stage is relatively slow, whereas in all other cases it increases linearly. For the partition on the column, the stress first increases, then decreases, and then levels off, but when the end-plate thickness is 4 mm, the stress does not decrease. For the upper part of the column web, the stress trend is monotonic except for end-plate thicknesses of 4 mm and 6 mm, and the deviation from the overall trend is more obvious at 4 mm. For the core area of the node, the stress increases monotonically, and apart from the 4 mm case the curves lie close to one another. For the middle and upper parts of the end-plate, the stress first increases and then decreases when the thickness is 4 mm or 6 mm, and is monotonic in the other cases.
Therefore, when the end-plate thickness deviates far from the completely similar value, both the stress values and their trends change greatly. The effect is most pronounced when the end-plate is too thin or too thick, in which case the stress trend over the loading process changes significantly.
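The qualitative trend descriptions above (monotonic increase, decrease then increase, and so on) can be made reproducible with a small helper that reduces a stress history to its sequence of rising, falling, and flat phases. This is an illustrative sketch only: the function name `classify_trend`, the tolerance parameter, and the sample values are hypothetical and do not come from the paper.

```python
def classify_trend(values, tol=0.0):
    """Summarize a stress history as a sequence of phases.

    Consecutive load steps with the same direction of change are merged,
    so [40, 30, 25, 45, 60] becomes "decrease -> increase". Changes whose
    magnitude does not exceed tol count as flat.
    """
    phases = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta > tol:
            phase = "increase"
        elif delta < -tol:
            phase = "decrease"
        else:
            phase = "flat"
        if not phases or phases[-1] != phase:
            phases.append(phase)
    return " -> ".join(phases) if phases else "constant"


# Hypothetical stress histories (MPa) over successive load steps.
print(classify_trend([10, 20, 35, 50]))
print(classify_trend([40, 30, 25, 45, 60]))
print(classify_trend([10, 30, 20, 20]))
```

Comparing the phase sequence of an incompletely similar model with that of the completely similar model gives a quick flag for cases where the trend itself, and not only the magnitude, has changed.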
The stress diagrams of the end-plate connection for different end-plate thicknesses (Figure 22) show that, as the end-plate thickness increases, the stress distribution of the end-plate changes significantly and nonlinearly. In the 4 mm case, the maximum stress is concentrated mainly around the bolt holes. In the 16 mm case, the region of maximum stress is clearly larger, and the maximum stress gradually decreases as the end-plate thickness increases. Therefore, although the end-plate stress does not vary linearly with thickness, it follows a regular pattern. In this paper, linear regression is used not to model the node stress distribution accurately, but to reduce the similarity error according to the general trend of the data. After linear regression, the similarity error of the end-plate is greatly reduced. Linear regression is not a perfect method for the error analysis in this article, but it is an effective one.

Error Analysis Method
Through parametric numerical model analysis, we obtained a large amount of stress data for incompletely similar nodes. Based on these data, regression analysis was used to obtain the similarity errors. The flow chart of the modeling and regression analysis of incompletely similar nodes is shown in Figure 23.

The method of regression analysis is as follows. Regression analysis is a predictive modeling technique that studies the relationship between a dependent variable (target) and an independent variable (feature). It is often used for predictive analysis and for exploring relationships between variables. Usually a curve is fitted to the data points, with the goal of minimizing the distance between the curve and the data points. Linear regression assumes that the target value and the features are linearly related, that is, that they satisfy a linear equation. The parameters w and b are found by constructing a loss function and minimizing it. The linear model can be written as:
$$y = wx + b$$
where y is the predicted value, and the values of the independent variable x and the dependent variable y are known for the existing data points.
The aim is to predict the value of y for a new x. To construct this functional relationship, the two parameters w and b of the linear model must be determined from the known data points. Finding the best parameters requires a criterion for measuring the quality of the fit, that is, an objective function that the computer can optimize during the solution process.
For the error-solving problem of incompletely similar nodes, the loss function can be defined as follows:
$$L = \frac{1}{n}\sum_{i=1}^{n} \left(y_{pi} - y_i\right)^2$$
where $y_{pi}$ is the predicted value of the node, $y_i$ is the known value of the node, and n is the number of node models.
L is the mean squared distance between the predicted values and the true values. Substituting the linear model into the loss function, and treating the parameters w and b as the independent variables of L, we obtain:
$$L(w, b) = \frac{1}{n}\sum_{i=1}^{n} \left(w x_i + b - y_i\right)^2$$
To obtain the optimal w and b, the loss function L must be minimized. Using least squares parameter estimation, L(w, b) is differentiated with respect to w and b, respectively, and the derivatives are set to zero, which gives the closed-form solution:
$$w = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b = \bar{y} - w\bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and $y_i$.
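The least squares procedure described above can be sketched in plain Python. The closed-form expressions used here are the standard ones for univariate linear regression with a mean squared error loss. The end-plate thicknesses are those listed in the text, but the dependent values are synthetic (generated from y = 2x + 1) purely so the result can be checked by eye, and the function names `fit_line` and `mse_loss` are illustrative.

```python
def fit_line(xs, ys):
    """Closed-form least squares estimates of slope w and intercept b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired data points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = s_xy / s_xx
    b = y_mean - w * x_mean
    return w, b


def mse_loss(xs, ys, w, b):
    """Mean squared distance between predictions w*x + b and known values."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# End-plate thicknesses from the text (mm); y values are synthetic.
thickness_mm = [4, 6, 10, 12, 14, 16]
synthetic_y = [2 * t + 1 for t in thickness_mm]

w, b = fit_line(thickness_mm, synthetic_y)
print(f"w = {w:.6f}, b = {b:.6f}")
print(f"loss at optimum   : {mse_loss(thickness_mm, synthetic_y, w, b):.3e}")
print(f"loss at w + 0.1   : {mse_loss(thickness_mm, synthetic_y, w + 0.1, b):.3e}")
```

Because the loss is a convex quadratic in w and b, any perturbation away from the closed-form solution increases it, which the last two printed lines illustrate.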

Discussion
The above analysis shows that when the end-plate thickness changes, the overall stress error of incompletely similar nodes cannot be ignored and follows a certain regularity. A regression analysis was therefore performed on the error data to obtain a calculation method for the similarity error. The error data were arranged based on the calculations in Figure 21.
It can be seen in Figure 24 that all the error data, except those for the middle and upper parts of the end-plate, grow approximately linearly and monotonically with the end-plate thickness. The data are therefore suitable for linear regression analysis. Substituting the data into Equations (12)-(16) for univariate linear regression gives the corresponding regression formulas and correlation coefficients; the closer the correlation coefficient is to 1, the more significant the regression formula. The regression curves were also compared with the original data points in the corresponding figures.
Linear regression can therefore predict and correct most of the similarity errors; the residual similarity errors after correction are shown in Table 7. Except for the stress data in the middle and upper parts of the end-plate, most of the similarity errors are greatly reduced. In this way, the similarity error problem of the incompletely similar models is largely resolved.
The main purpose of this paper is to reduce the similarity error by identifying the trend in the data. The authors did not pursue an accurate linear law of stress change, but attempted to reduce similarity errors through linear regression. The results indicate that linear regression is at least an effective method in this field.
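The correction idea can be sketched as follows: fit the similarity error against end-plate thickness, report the correlation coefficient, and subtract the fitted prediction to obtain the residual error. The thicknesses are those given in the text, but the error values below are synthetic placeholders chosen to be roughly linear; they are not the paper's data, and the names `regress` and `residual_errors` are illustrative.

```python
import math

# Thicknesses from the text (mm); the error values (%) are synthetic
# placeholders, not results from the paper.
THICKNESS_MM = [4, 6, 10, 12, 14, 16]
ERROR_PCT = [-20.0, -11.0, 9.5, 19.0, 31.0, 40.5]


def regress(xs, ys):
    """Return slope w, intercept b, and Pearson correlation coefficient r."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = s_xy / s_xx
    b = y_mean - w * x_mean
    r = s_xy / math.sqrt(s_xx * s_yy)
    return w, b, r


def residual_errors(xs, ys, w, b):
    """Similarity error left after subtracting the regression prediction."""
    return [y - (w * x + b) for x, y in zip(xs, ys)]


w, b, r = regress(THICKNESS_MM, ERROR_PCT)
residuals = residual_errors(THICKNESS_MM, ERROR_PCT, w, b)

print(f"w = {w:.3f}, b = {b:.3f}, r = {r:.4f}")
print("largest raw error (%):", max(abs(e) for e in ERROR_PCT))
print("largest residual  (%):", round(max(abs(e) for e in residuals), 3))
```

When the raw errors are close to linear in thickness, the residuals after correction are much smaller than the raw errors. For data that are not well described by a line, as reported for the middle and upper parts of the end-plate, the residuals would remain large.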

Conclusions
(1) The completely similar conditions of the semi-rigid beam-column connection were derived, and the constraints among the similarity factors were clarified. The limitations of classic similarity theory were analyzed, providing a theoretical basis for further research on similarity errors.
(2) The numerical model shows that, when the completely similar conditions are met, the stress distribution and its development in the prototype and the model are consistent.
(3) Taking the end-plate thickness as the similarity factor, six incompletely similar scale models were established, and the stress histories at six points were selected for study. The influence of the end-plate thickness on the model's stress development and distribution when it does not meet the exact similarity conditions was analyzed in detail, and the stress development trends at typical points of the end-plate connection were summarized.
(4) The similarity errors caused by end-plate thicknesses that do not satisfy complete similarity were calculated. The analysis shows that the relationship between the similarity error and the similarity factor is roughly monotonic and linear. A regression analysis was performed, yielding regression formulas relating the similarity error to the end-plate thickness. The correlation coefficients of these formulas show that, except for the stress at the end-plate points, the univariate linear regression equations at the other points are significant. The resulting error analysis method can be applied to other types of models.
(5) The results show that even if the model data only approximately follow a linear trend, linear regression can still reduce the incomplete similarity error to a large extent, which provides a reference for future study of the distribution of similarity errors.