An Entropy-Based Neighborhood Rough Set and PSO-SVRM Model for Fatigue Life Prediction of Titanium Alloy Welded Joints

In order to obtain comprehensive assessment of the factors influencing fatigue life and to further improve the accuracy of fatigue life prediction of welded joints, soft computing methods, including entropy-based neighborhood rough set reduction algorithm, the particle swarm optimization (PSO) algorithm and support vector regression machine (SVRM) are combined to construct a fatigue life prediction model of titanium alloy welded joints. By using an entropy-based neighborhood rough set reduction algorithm, the influencing factors of the fatigue life of titanium alloy welded joints such as joint type, plate thickness, etc. are analyzed and the reduction results are obtained. Fatigue characteristic domains are proposed and determined subsequently according to the reduction results. The PSO-SVRM model for fatigue life prediction of titanium alloy welded joints is established in the suggested fatigue characteristic domains. Experimental results show that by taking into account the impact of joint type, the PSO-SVRM model could better predict the fatigue life of titanium alloy welded joints. The PSO-SVRM model indicates the relationship between fatigue life and fatigue life influencing factors in multidimensional space compared with the conventional least-square S-N curve fitting method, it could predict the fatigue life of the titanium alloy welded joints more accurately thus helps to the reliability design of the structure.


Introduction
Welded components have been widely used in the engineering equipment field under static or dynamic load circumstances because of their advantages of good connectivity, light weight and ease of use. Due to stress concentration however, the welded joint becomes one of the weak links of the structural reliability of the product. This causes great hidden dangers to the safety and normal work of the structure.
Currently, three popular fatigue life analysis methods of welded joints have been proposed, which include S-N curve method, local stress strain method and fracture mechanics method [1]. Compared with the local stress strain method and fracture mechanics method, the S-N curve method is more widely used in the design stage which mainly consists of nominal stress method [2], hot spot stress method [3,4] and nodal force-based structural stress method [5,6]. Due to the mesh insensitivity of structural stress calculations, the higher accuracy of fatigue life prediction and the wide applicability, the nodal force-based structural stress method is one of the most notable fatigue life evaluation methods for welded structures. Dong reprocessed thousands of fatigue test data of steel welded joints from the last 50 years [7]. The research results indicated that when taking into account the influences of

Nodal Force Based Structural Stress Principle
In the finite element stress calculation the results are affected by mesh size, which causes inconsistency in calculations for different structures. In order to address this problem, structural stress based on the line force is first defined in the nodal force based structural stress method.
The normal structural stress at each node is defined as where F y is the vertical stress in the weld toe, M x is the moment around the weld toe. f y = Fy/l is the line force and m x = M x /l is the line moment as shown in Figure 1. Structural stress can be expressed as [5]: An equivalent structural stress range parameter can be defined as: where S σ Δ represents the structural stress range calculated, ( ) I r is a dimensionless parameter derived by fracture mechanics considerations, and m is the crack propagation exponent in the conventional Paris law, taking on a value of about 3.6 [17]. It could be seen that the equivalent structural stress parameter described in Equation (5) (6) where N is the number of cycles which indicate the fatigue life of the structure, C is material constant and h represents the negative slope of the master S-N curve.

Entropy based Neighborhood Rough Set Model
Basic concepts are introduced in this section before the entropy-based neighborhood rough set model is established. Fatigue life influencing factors of the titanium alloy welded joints are analyzed by using the established model. Key influencing factors together with the weights are obtained and the fatigue characteristic domains are subsequently determined.

Basic Concept
Proposed by Pawlak, rough set theory [18] has been used successfully as a new feature reduction tool to discover data dependencies and reduce the number of features, but the classical rough set can only operate effectively with data sets containing categorical values. When using classical rough set theory, discretion must be done beforehand which inevitably result in loss of information. To overcome this drawback, Lin [19] proposed a neighborhood rough set model by extending the equivalence relation of classical rough sets. The neighborhood rough set model can process both numerical and categorical data set efficiently [20].
Uncertainty measure is an important way to describe classification ability and classification accuracy in rough set theory. Many scholars have studied this. For example, Chen proposed the An equivalent structural stress range parameter can be defined as: where ∆σ S represents the structural stress range calculated, I(r) is a dimensionless parameter derived by fracture mechanics considerations, and m is the crack propagation exponent in the conventional Paris law, taking on a value of about 3.6 [17]. It could be seen that the equivalent structural stress parameter described in Equation (5) can capture the effects of stress concentration, plate thickness and loading mode effects on fatigue behavior of welded components. The formula for fatigue life calculation of welded joints using the equivalent structural stress ∆S s can be expressed as: where N is the number of cycles which indicate the fatigue life of the structure, C is material constant and h represents the negative slope of the master S-N curve.

Entropy Based Neighborhood Rough Set Model
Basic concepts are introduced in this section before the entropy-based neighborhood rough set model is established. Fatigue life influencing factors of the titanium alloy welded joints are analyzed by using the established model. Key influencing factors together with the weights are obtained and the fatigue characteristic domains are subsequently determined.

Basic Concept
Proposed by Pawlak, rough set theory [18] has been used successfully as a new feature reduction tool to discover data dependencies and reduce the number of features, but the classical rough set can only operate effectively with data sets containing categorical values. When using classical rough set theory, discretion must be done beforehand which inevitably result in loss of information. To overcome this drawback, Lin [19] proposed a neighborhood rough set model by extending the equivalence relation of classical rough sets. The neighborhood rough set model can process both numerical and categorical data set efficiently [20].
Uncertainty measure is an important way to describe classification ability and classification accuracy in rough set theory. Many scholars have studied this. For example, Chen proposed the concept of neighborhood entropy [21] and suggested several uncertainty measures including neighborhood accuracy, information quantity, neighborhood entropy and information granularity in the neighborhood systems [22]. Uncertain measures like roughness measure, accuracy measure, entropy and granularity for covering rough set models were studied and a new kind of partial order was suggested [23]. Huang [24] discussed the uncertainty measures of information entropy and rough entropy.
Feature reduction is a quite useful data preprocessing technique, aiming to obtain a minimal feature subset from a decision system while maintaining the same classification accuracy by deleting the noisy, irrelevant or misleading features. An uncertainty measure method of neighborhood entropy to evaluate the quality of the reduction results is proposed for neighborhood rough set reduction algorithm in this work. The fatigue characteristic domain is then determined on basis of the reduction results for classification of the fatigue samples to further reduce the dispersion level.
Several definitions about the basic concept of neighborhood rough set theory would be introduced first before further discussion of the neighborhood entropy based reduction algorithm. Definition 1. A neighborhood decision system can be denoted as ND = (U, C ∪ D, δ) and a distance function ∆(x, y) → [0, 1] . Where U = {x 1 , x 2 , . . . , x n } is a nonempty finite set of objects called the universe, C = {a 1 , a 2 , . . . , a m } is the condition feature set, D is the set of decision features, and δ is the neighborhood parameter (0 ≤ δ ≤ 1). ∆(x, y) is a distance function, which satisfies: For a feature subset B ⊆ C and a neighborhood parameter δ ∈ [0, 1], a similarity relation denoted by: The neighborhood class n δ B (x) of x ∈ U in the subspace B is defined as: Definition 2. Let ND = (U, C ∪ D, δ) be a neighborhood decision system, B ⊆ C be a feature subset, for any object set X ⊆ U, the neighborhood lower and upper approximations of X on B are defined as: Definition 3. Let ND = (U, C ∪ D, δ) be a neighborhood decision system, B, M, N ⊆ C be the feature subsets, the neighborhood entropy of B is denoted by: and the neighborhood joint entropy of M and N is denoted by: Let U/D = {D1, D2, . . . , Dn} be equivalence classes constituted by decision feature set D on the universe U, the neighborhood conditional entropy of D on B is denoted by: Entropy 2019, 21, 117

of 13
Definition 4. Let ND = (U, C ∪ D, δ) be a neighborhood decision system, B ⊆ C be a feature subset, ∀a ∈ C − B the significance of condition attribute a is defined as: Definition 5. Let ND = (U, C ∪ D, δ) be a neighborhood decision system, B ⊆ C be a feature subset, we say B is a reduction of C if:

Entropy-Based Neighborhood Reduction Algorithm
An uncertain measurement of neighborhood entropy-based reduction algorithm is suggested in this work. Generally speaking, there are five main steps in the algorithm. The flow chart of the entropy-based neighborhood reduction algorithm is shown as Figure 2 and the detailed steps of the algorithm are shown in Appendix A.1.
∀ ∈ − the significance of condition attribute a is defined as: and

Entropy-based Neighborhood Reduction Algorithm
An uncertain measurement of neighborhood entropy-based reduction algorithm is suggested in this work. Generally speaking, there are five main steps in the algorithm. The flow chart of the entropy-based neighborhood reduction algorithm is shown as Figure 2 and the detailed steps of the algorithm are shown in Appendix A1. In the figure ND denotes that the neighborhood decision system is the input of the algorithm and red indicating the reduction result set is the output of the algorithm. The initial red is set to be φ at the beginning. In each iteration of the algorithm, significance of each condition attribute is calculated according to Equation (14), then the condition attribute ak which has the max significance is selected. Determine whether the significance of the selected attribute ak is bigger than 0. If it is bigger than 0, ak is added to the reduction result sets red, the next iteration of the algorithm will begin, otherwise, the algorithm stops and output the reduction results red.

Fatigue Characteristic Domain Determination
Commercial pure titanium has already been used for seawater cooling pipes in thermal power plants because of its high corrosion resistance in seawater. Additionally, the advantage of its high specific strength has already been proved on racing yachts, although there is still a lack of research concerning the fatigue strength of titanium fillet welded joints that are indispensable for ship structures.
In this work, a neighborhood decision system (Table 1) of titanium alloy welded joints was In the figure ND denotes that the neighborhood decision system is the input of the algorithm and red indicating the reduction result set is the output of the algorithm. The initial red is set to be φ at the beginning. In each iteration of the algorithm, significance of each condition attribute is calculated according to Equation (14), then the condition attribute a k which has the max significance is selected. Determine whether the significance of the selected attribute a k is bigger than 0. If it is bigger than 0, a k is added to the reduction result sets red, the next iteration of the algorithm will begin, otherwise, the algorithm stops and output the reduction results red.

Fatigue Characteristic Domain Determination
Commercial pure titanium has already been used for seawater cooling pipes in thermal power plants because of its high corrosion resistance in seawater. Additionally, the advantage of its high specific strength has already been proved on racing yachts, although there is still a lack of research concerning the fatigue strength of titanium fillet welded joints that are indispensable for ship structures.
In this work, a neighborhood decision system (Table 1) of titanium alloy welded joints was established based on the well-documented fatigue data. Forty three (43) sets of titanium alloy fatigue data [25] were cited. The test specimens were manufactured using 2 and 10 mm thick commercial pure titanium mill products (JISH4600 TP340C/H). They were made with their lengths both along and transverse to the plate rolling direction, referred to as L or C, respectively. Furthermore, the butt-welded joints, transverse fillet welded joints or longitudinal fillet welded joints are referred to as B, T or L, respectively. The five-types of welded specimen (LB, CB, LT, CT and LL) were instrumented with strain gauges to determine the nominal stress distribution approaching the weld in the tensile test. Equivalent structural stress range was then calculated according to nodal force-based structural stress principle detailed in Section 2. As could be seen from Table 1, there were five condition features including joint type (J), plate thickness (t), load ratio (r), material name (M) and equivalent structural stress range (∆S s ) and one decision feature {life cycles (N)} in the decision system. Entropy-based neighborhood rough set reduction algorithm was used here for attributes reduction as described in Section 3.2. The attribute reduction result of the decision system of titanium alloy welded joints was {J,∆S s }. As could be seen from the reduction result, joint type (J) is one of the most important influencing factors of the titanium alloy welded joints besides the equivalent structural stress range (∆S s ). This indicates that the joint type influencing factor could not be omitted, or in other words, the joint type should be considered for fatigue life prediction of the titanium alloy welded joints in the nodal force based structural stress method. Fatigue characteristic domain of joint type is thus proposed. All the fatigue test samples are then divided into 5 different domains (S 1 -S 5 in Figure 3) according to different joint type LB, CB, LT, CT and LL.
established based on the well-documented fatigue data. Forty three (43) sets of titanium alloy fatigue data [25] were cited. The test specimens were manufactured using 2 and 10 mm thick commercial pure titanium mill products (JISH4600 TP340C/H). They were made with their lengths both along and transverse to the plate rolling direction, referred to as L or C, respectively. Furthermore, the butt-welded joints, transverse fillet welded joints or longitudinal fillet welded joints are referred to as B, T or L, respectively. The five-types of welded specimen (LB, CB, LT, CT and LL) were instrumented with strain gauges to determine the nominal stress distribution approaching the weld in the tensile test. Equivalent structural stress range was then calculated according to nodal force-based structural stress principle detailed in Section 2. As could be seen from Table 1, there were five condition features including joint type (J), plate thickness (t), load ratio (r), material name (M) and equivalent structural stress range ( s S Δ ) and one decision feature {life cycles (N)} in the decision system. Entropy-based neighborhood rough set reduction algorithm was used here for attributes reduction as described in Section 3.2. The attribute reduction result of the decision system of titanium alloy welded joints was {J, s S Δ }. As could be seen from the reduction result, joint type (J) is one of the most important influencing factors of the titanium alloy welded joints besides the equivalent structural stress range ( s S Δ ). This indicates that the joint type influencing factor could not be omitted, or in other words, the joint type should be considered for fatigue life prediction of the titanium alloy welded joints in the nodal force based structural stress method. Fatigue characteristic domain of joint type is thus proposed. All the fatigue test samples are then divided into 5 different domains (S1-S5 in Figure 3) according to different joint type LB, CB, LT, CT and LL. 10    As can be seen from Figure 3, an "automatic" classification of fatigue specimen samples could be achieved when the fatigue characteristic domains are determined. This classification is based on the theory of knowledge granularity. The most important fatigue life influencing factors of the titanium alloy welded joints are obtained through attribute reduction by using entropy based neighborhood reduction algorithm.

PSO Algorithm
PSO algorithm is an evolutionary computing technology proposed by Eberhart and Kennedy in 1995 [26]. It originates from the study of bird prey behaviour. Similar to other evolutionary algorithms, PSO is a global search algorithm based on iterative updating. In PSO, multiple candidate solutions coexist and cooperate with each other simultaneously. It searches for the optimal solution through cooperation and information sharing among individuals in a group, and the convergence speed of the algorithm is fast. It has been widely used in function optimization, pattern recognition and other fields nowadays.
Suppose in a D-dimension target search space, the particle swarm consist of m particles, the position of the i-th (i = 1, 2, . . . , m) particle is marked as X t i = (X t i1 , X t i2 , . . . , X t iD ) and the velocity is marked as . The optimal personal position that the i-th particle has experienced is marked as P t i = (P t i1 , P t i2 , . . . , P t iD ) while the optimal position that the whole particle swarm has experienced is marked as P t g = (P t g1 , P t g2 , . . . , P t gD ).The position and velocity of each particle in the swarm are updated by Formula (16) [27,28]: where i = 1, 2, . . . , m, ω t = ω max − ω max −ω min t max × t is an inertia weight that controls the diversification feature of the algorithm, c 1 and c 2 are learning factors that control the intensification feature of the algorithm, rand is a random number between (0,1). The velocity of each particle V id in each dimension is usually limited between [−V max , V max ], in order to prevent the particle from being too fast to miss the optimal solution. The algorithm ends when the maximum number of iterations is achieved or the specified accuracy requirements are met.

SVRM Principle
SVRM [29] is a supervised machine learning algorithm based on statistical learning theory with finite samples. In the nonlinear case, the input vector is mapped into a high-dimensional feature vector space by using the kernel function, and the linear inseparable classification problem of the original input space is transformed into the linear separable classification problem in the high-dimensional space, and the optimal classification hyperplaneis constructed in the high-dimensional space. Currently, SVRM has been applied in more and more fields due to its unique superiority [30,31].
The principles of SVRM can be described briefly. For nonlinear regression, a nonlinear function is used to map training data into a high-dimensional feature space and linear regression is carried out in the high-dimensional feature space. The regression estimation function is set as Formula (18), where ω represents the dimension of the feature space, φ(x) is the nonlinear function, b is the threshold. The optimization problem to be solved by SVRM could be described by Equation (19), where c is the penalty coefficient used to control the complexity of the model, training error rate and generalization ability, ξ i , ξ * i are the relaxation factors, ε is the default error limit and m is the number of training samples:  1, 2, . . . , m) The dual form of the optimization problem could be obtained as Equation (20): where a i and a i * are range multiplier, K(x i ·x j ) = φ(x i )·φ(x j ) is the kernel function. The regression estimation function could be expressed as Equation (21): Generally speaking, the most commonly used kernel types in SVRM includes linear, polynomial, radial basis function (RBF), sigmoid and pre-computed kernels. Since only one parameter needs to be determined in RBF and it has global performance similar to other kernel functions, RBF is chosen as the kernel function of SVRM here.

PSO-SVRM Model
There are two parameters to be optimized when using RBF as the kernel function, one is the penalty factor C, the other is the kernel width parameter σ. Both of these parameters have a great impact on the performance of SVRM. The penalty factor C controls the complexity of the SVRM model and the penalty level for the wrong samples. If C is too large, the prediction accuracy of SVRM in training phase is very high, but in testing phase is very low; if C is too small, it is difficult to obtain satisfactory prediction accuracy. The parameter σ reflects the complexity of the distribution of data samples in the high-dimensional feature space, which determines the complexity of the linear classification surface. It has a greater impact on the prediction accuracy of SVRM than the penalty factor C, too large value will lead to an "over-learning" problem, and too small value will lead to an "under-learning" problem. Therefore, in the SVRM training process, it is particularly important to choose the appropriate parameters for the penalty factor C and kernel function width σ. At present, PSO algorithm has been used to optimize the parameters of least squares support vector machines in order to construct an optimal LS-SVM classifies [32] and an improved adaptive particle swarm optimization algorithm has been proposed [33]. In this paper, PSO algorithm is used to automatically select the appropriate parameters of SVRM model and PSO-SVRM model is proposed to obtain the best performance.
The process of optimization of SVRM parameters based on PSO algorithm is shown as Figure 4. As could be seen from Figure 4, at the very beginning of the algorithm, the search range for SVRM model parameters is set. After that, the values of the population size, learning factors (c 1 , c 2 ), weight factor (ω), the maximum number of iterations and the maximum speed value in the PSO algorithm are initialized. Then particle velocity and its position are initialized randomly. The fitness of each particle is calculated and the best individual and global particle position is updated. The velocity and the position of the particle is updated subsequently. Determine whether the given maximum number of iterations have been reached or the accuracy requirements have been achieved, if so, stop optimization, Entropy 2019, 21, 117 9 of 13 return the current optimal SVRM model parameters, otherwise go on with the iteration. A step by step description of the algorithm is shown as Appendix A.2.
Entropy 2018, 20, x FOR PEER REVIEW 9 of 13 particle is calculated and the best individual and global particle position is updated. The velocity and the position of the particle is updated subsequently. Determine whether the given maximum number of iterations have been reached or the accuracy requirements have been achieved, if so, stop optimization, return the current optimal SVRM model parameters, otherwise go on with the iteration. A step by step description of the algorithm is shown as Appendix A2.

Results and Discussions
The PSO-SVRM model for fatigue life prediction of titanium alloy welded joints was established by using the LibSVM toolbox [34]. We set the local search capabilities related parameter c1 = 1.

Results and Discussions
The PSO-SVRM model for fatigue life prediction of titanium alloy welded joints was established by using the LibSVM toolbox [34]. We set the local search capabilities related parameter c 1 = 1.5, the global search capability related parameters c 2 = 1.7, the maximum number of iterations of the algorithm was 2000 times, the maximum population of the particles was 20, the value of ω decreased linearly from 0.9 to 0.4 as the number of iterations increased. The PSO algorithm was used for optimization of the SVRM parameters C and σ. The SVRM model was established based on the best parameter values of C and σ to predict the fatigue life of the titanium alloy welded joints.
Since there is only one fatigue sample in the fatigue characteristic domain1, it is not discussed here. The PSO-SVRM model for fatigue life prediction of titanium alloy welded joints was established in the determined fatigue characteristic domains, from domain2 to domain5, respectively. The fatigue life prediction values of the titanium alloy welded joints by using the PSO-SVRM model were compared with that obtained by using the tradition least square fitting method and the actual fatigue life obtained through fatigue test. The comparison results are shown in Figure 5. The error of fatigue life prediction by using PSO-SVRM model and by using the traditional least square fitting method is shown in Figure 6.
As could be seen from Figure 5 that the blue ' ' indicates the actual fatigue life cycles in the fatigue test. The red ' ' denotes the fatigue life prediction value by using the PSO-SVRM model. The black '*' indicates the fatigue life prediction value by using the least square method. Compared with the least squares method, the prediction results obtained by using the PSO-SVRM model are much closer to the actual fatigue life cycles.
As is shown in Figure 5 the black '*' indicates the fatigue life prediction error of the least squares method and the red ' ' denotes the fatigue life prediction error of the PSO-SVRM model. It could be seen from Figure 5 that the fatigue life prediction error by using PSO-SVRM is less than using the least squares method. The fatigue life prediction results shown in Figures 5 and 6 indicate that when PSO-SVRM model is used in the determined joint type fatigue characteristic domain, smaller prediction errors could be achieved than by using the tradition least squares method. established by using the LibSVM toolbox [34]. We set the local search capabilities related parameter c1 = 1.5, the global search capability related parameters c2 = 1.7, the maximum number of iterations of the algorithm was 2000 times, the maximum population of the particles was 20, the value of ω decreased linearly from 0.9 to 0.4 as the number of iterations increased. The PSO algorithm was used for optimization of the SVRM parameters C and σ . The SVRM model was established based on the best parameter values of C andσ to predict the fatigue life of the titanium alloy welded joints. Since there is only one fatigue sample in the fatigue characteristic domain1, it is not discussed here. The PSO-SVRM model for fatigue life prediction of titanium alloy welded joints was established in the determined fatigue characteristic domains, from domain2 to domain5, respectively. The fatigue life prediction values of the titanium alloy welded joints by using the PSO-SVRM model were compared with that obtained by using the tradition least square fitting method and the actual fatigue life obtained through fatigue test. The comparison results are shown in Figure 5. The error of fatigue life prediction by using PSO-SVRM model and by using the traditional least square fitting method is shown in Figure 6.` As could be seen from Figure 5 that the blue '○' indicates the actual fatigue life cycles in the fatigue test. The red '□' denotes the fatigue life prediction value by using the PSO-SVRM model. The black '*' indicates the fatigue life prediction value by using the least square method. Compared with the least squares method, the prediction results obtained by using the PSO-SVRM model are much closer to the actual fatigue life cycles. (c) (d) Since there is only one fatigue sample in the fatigue characteristic domain1, it is not discussed here. The PSO-SVRM model for fatigue life prediction of titanium alloy welded joints was established in the determined fatigue characteristic domains, from domain2 to domain5, respectively. The fatigue life prediction values of the titanium alloy welded joints by using the PSO-SVRM model were compared with that obtained by using the tradition least square fitting method and the actual fatigue life obtained through fatigue test. The comparison results are shown in Figure 5. The error of fatigue life prediction by using PSO-SVRM model and by using the traditional least square fitting method is shown in Figure 6.` As could be seen from Figure 5 that the blue '○' indicates the actual fatigue life cycles in the fatigue test. The red '□' denotes the fatigue life prediction value by using the PSO-SVRM model. The black '*' indicates the fatigue life prediction value by using the least square method. Compared with the least squares method, the prediction results obtained by using the PSO-SVRM model are much closer to the actual fatigue life cycles.

Conclusions
Entropy-based neighborhood rough set theory is used for analysis of the fatigue life influencing factors of titanium alloy welded joints and the PSO-SVRM model is constructed for fatigue life prediction of the titanium alloy welded joints. The following conclusions can be proposed: (1) The reduction results show that besides the equivalent structural stress range, the joint type influencing factors also play a very important role in determining the fatigue life of titanium alloy welded joints. Future work will be concentrated on other determination methods of fatigue characteristic domain and further validation of the PSO-SVRM model as more fatigue test samples become available.