Industrial Big Data and Computational Sustainability: Multi-Method Comparison Driven by High-Dimensional Data for Improving Reliability and Sustainability of Complex Systems

Abstract: Sustainable development is of great significance, and the emerging research on data-driven computational sustainability has become an effective way to pursue it. This paper presents a fault diagnosis and prediction framework for complex systems based on multi-dimensional data and multi-method comparison, aimed at improving the reliability and sustainability of the system by selecting the methods with relatively superior performance. The avionics system is taken as an example from the industrial field. Based on a literature review of the typical fault modes and fault diagnosis requirements of avionics systems, three popular high-dimensional data-driven fault diagnosis methods (support vector machine, convolutional neural network, and long short-term memory neural network) are comprehensively analyzed and compared. Finally, actual bearing failure data are used to implement, verify, and compare the methods, and the process of selecting the superior method driven by high-dimensional data is fully demonstrated. We attempt to provide a sustainable development idea that continuously explores multi-method integration and comparison, aimed at improving the calculation efficiency and accuracy of reliability assessments, optimizing system performance, and ultimately achieving long-term improvement of system reliability and sustainability.


Introduction
Sustainable development has become the focus of the international community. However, moving from the existing policy-driven approach to technology-driven sustainability remains the bottleneck in realizing it. In recent years, the emerging research on computational sustainability has become an effective way to address this problem and thus a new research hotspot. The focus of computational sustainability research is to develop computational models, mathematical models, and related methods to help solve some of the most challenging problems related to sustainable development [1–3]. The advent of the era of big data brings opportunities for computational sustainability research, as well as new challenges such as the complexity of problems, computational efficiency, and the scalability of methods. Furthermore, the use of computer and information science technology can improve the effectiveness of resource management and allocation. Big data contains abundant information and potential knowledge, which enables new research methods driven by data, especially multi-source data, and can improve both the accuracy of the methods and the efficiency of problem solving.
The development of massive multi-dimensional data and computational sustainability is critical for meeting the challenges of sustainability. It can help people make tradeoffs, understand complex systems, and explain uncertainties. At present, more and more scholars have started to use data mining technology to solve problems of computational sustainability. Representative research involves many fields such as banking [4,5], healthcare [6,7], meteorology [8–11], ecological protection [12–14], agriculture [15,16], and disaster management [17,18]. However, applications in industry, especially from a system perspective, are relatively rare. In fact, the reliability and sustainability of complex systems in industry, especially complex systems with high-end equipment, are a matter of concern, because they relate to efficiency, cost, resources, energy consumption, human-computer interaction, and many other aspects. The reliability and sustainability of complex systems are also closely related to each other. Continuous improvement of the reliability level is of great benefit to long-term sustainability, and a complex system with good sustainability will greatly improve operational efficiency and resource utilization.
The reliability and sustainability of high-end equipment systems determine the efficiency and costs of equipment operation and affect product and service quality. Reliability and sustainability assessment of high-end equipment is a complex task, and its accuracy needs to be improved continuously. This study took the problem of fault diagnosis and reliability evaluation of avionics systems as an example to illustrate the application of big data and computational sustainability in industry. As a key component of modern aircraft, the avionics system is becoming ever more intelligent and integrated with the rapid development of electronic information technology, computer science, large aircraft, and the unmanned aerial vehicle industry. The avionics system is a complex system: its failures are correlated, and its fault data are high-dimensional. Studying the degradation mode of the avionics system, especially fault classification and diagnosis prediction driven by high-dimensional data, is of great significance for improving the reliability and sustainability of the system.
Massive multi-dimensional data, computational sustainability, and multi-method comparisons are undervalued as means of improving the reliability and sustainability of industrial complex systems. In this paper, a fault diagnosis and prediction framework for complex systems based on multi-dimensional data and multi-method comparison is proposed. We comprehensively analyzed and compared the popular fault diagnosis methods driven by high-dimensional data and analyzed the performance of each method. According to the characteristics of the systems and data sets, methods with relatively superior performance were selected. To improve the reliability and sustainability of complex systems, it is necessary to continuously evaluate and select appropriate data-driven methods. On the one hand, this can improve the accuracy and efficiency of evaluation; on the other hand, it can save time, resources, and costs for the entire system. For improving the reliability and sustainability of industrial complex systems, we aim to propose a new approach that matches the characteristics of such systems and performs better than traditional, isolated methods by using industrial big data, machine learning, and multi-method comparisons, rather than chasing hot spots and trends that are out of touch with reality.

Fault Diagnosis and Reliability Evaluation of Complex System
With the development of science and information technology, the precision and complexity of high-end equipment are constantly increasing, and a higher reliability level is required of complex systems. Compared to traditional mechanical components, the avionics system has not only a complex hierarchical structure but also numerous software and hardware devices. Moreover, the reliability of the avionics system is affected by various external environmental factors such as external temperature, continuous propulsion, electromagnetic interference, pollution, and vibration. Therefore, the faults are not isolated but interrelated, and they affect each other because of the complex hierarchical structure. Evaluating the reliability of the avionics system involves many aspects, such as analysis of the mechanism characteristics of massive, high-dimensional fault data in the big data environment, modeling of the correlation between faults, multi-fault mode modeling, evaluation of comprehensive reliability, and a demonstration platform for fault diagnosis.
In recent years, aircraft fault informatics has made great progress because of rapid advances in computing, modern monitoring technology, and signal processing technology, which have improved the diagnosis and localization of aircraft faults. For example, large civil aircraft (such as the A320, A340, and Boeing 777) use a central maintenance computer system to monitor the flight status of the aircraft continuously, record faults and problems during the flight, and send alarm signals to the computer [19]. For multi-fault diagnosis, great achievements have been made in recent years. Wang et al. studied the relationship between the factors causing problems in aircraft electronic equipment and the external environment [20]. Chunfei et al. established a neural-network-based engine fault diagnosis method for a turbofan engine, which can remove the influence of noise in the signal and thus helps obtain correct analysis results of engine performance [21]. Furthermore, from the perspective of probability statistics, fault diagnosis of aircraft communication equipment was carried out by Petrov and Marinov [22]. Combining rule-based reasoning with fuzzy modeling, Yang proposed an overall intelligent fault diagnosis method for avionics systems [23]. Using fuzzy modeling, Zhou and Cao realized fault classification diagnosis [24]. Jia et al. considered aircraft as engineering systems and concluded that they had different energies under different states (failure or normal) [25]; a multi-fault diagnosis model was then established according to the law of conservation of energy. Multi-fault classification diagnosis has been implemented with several different neural network models, such as back propagation (BP) and self-organizing maps (SOM), by Fan [26]. Using the theory of rough neural networks, Du proposed a new fault diagnosis method for avionics systems with higher reliability [27]. Similarly, Xu put forward a new fault classification diagnosis theory for avionics systems on the basis of the Lyapunov exponent spectrum and Lyapunov exponent spectral entropy [28]. Moreover, Shi proposed a rule-based fault diagnosis method [29]. From the perspective of the physical model, Baig proposed a multi-fault diagnosis expert system [30]. Palmer suggested that data/information collection and processing as well as database technology played an important role in aircraft fault diagnosis [31]. Kordestani proposed an integrated system for classifying and localizing faults [32]. Likewise, a multi-fault diagnosis expert system was proposed by Anami, using a dynamic fault tree method with continuous optimization [33]. The multi-fault diagnosis methods mentioned above can usually meet traditional requirements. However, their definition of the fault mechanism and transmission route of the system is not comprehensive, so the highly complex, multi-source faults of avionics systems cannot be comprehensively and effectively diagnosed. At the same time, the above methods can only diagnose faults, and they lack a reliability evaluation of the system.
There are many research directions in classical system reliability theory, including the modeling and analysis of failure modes, system reliability, physical models of reliability, reliability design, maintainability, maintainability design, distribution pattern recognition of failure and maintenance, reliability evaluation and application, and reliability and maintenance implementation [34–37]. Existing evaluation of system reliability is usually completed by establishing a reliability model, for which the failure modes of the system must first be modeled and analyzed, for example with a failure distribution model, a constant failure rate model, or a time-dependent failure model. In both external form and internal logical mechanism, reliability evaluation of the avionics system is completely different from that of a traditional single part (equipment) or a mechanical manufacturing system composed of a set of parts (equipment). It is therefore impossible to use the existing mathematical statistical models of reliability and the highly simplified, abstract system analysis methods. The existing analysis and evaluation methods for system reliability include the reliability block diagram method, the fault tree method, the Markov model, the Petri net method, and the goal-oriented (GO) method. However, in evaluating the avionics system of an unmanned aerial vehicle (UAV), these methods have the following disadvantages:
1. Most of the methods can only be used for static modeling, and they do not reflect the relatively complex dynamic temporal relationships of the system [22–24,26].
2. As the number of system states increases, the state space of the reliability analysis model explodes exponentially, making the model very difficult to construct and solve [21,23,24,26,27,29,30,32,33].
Sustainability 2019, 11, x FOR PEER REVIEW 4 of 17
3. With a larger system scale, it is difficult to establish a model for the hierarchical requirements of the system [21,23–27,29–31,33].
4. It is difficult for the existing reliability analysis models to describe system uncertainty and some unconventional influence factors (such as environment and human factors) [20,21,24–28,30,32,33].
In a word, there are many methods for fault diagnosis and prediction of complex systems, and none of them is perfect. Considering the need to continuously improve the reliability and sustainability of systems, it is necessary to deal with such problems with the idea of sustainable development, that is, to continuously explore multi-method integration and comparison, aiming at improving the calculation efficiency and accuracy of reliability assessments, optimizing system performance, and ultimately achieving long-term improvement of system reliability and sustainability. Three popular high-dimensional data-driven fault diagnosis methods, namely support vector machine (SVM), convolutional neural network (CNN), and long short-term memory (LSTM) neural network, were chosen in this study to demonstrate the process of selecting the superior method. Actual bearing failure data, provided by the Case Western Reserve University Bearing Data Center (Cleveland, OH, USA), were used to implement and verify the methods.

Support Vector Machine
On the basis of statistical learning theory, SVMs are becoming increasingly popular in machine learning tasks, including classification, regression, and outlier detection. In its basic, original form, the SVM solves binary classification problems.

SVM Theory
The core principle of the SVM is to transform data from the original low-dimensional space to a feature space with a higher dimension, in which a so-called optimal hyperplane can be found that maximizes the margin between the two classes. The methodology, algorithms, and software of the SVM have been widely applied [38,39]. The SVM developed from the problem of finding an optimal classification plane under linear separability. Its central idea can be illustrated by the two-dimensional situation shown in Figure 1 [40], which displays two different kinds of data: Class A (circular) and Class B (pentagonal) points. The SVM tries to place a linear boundary H between the two classes so as to maximize the classification interval (margin), namely, the distance between the class boundaries H1 and H2 (H1, H2, and H being parallel to each other). The data points lying on H1 and H2, called support vectors, define these boundaries.
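As a compact restatement of this geometry, the three parallel boundaries and the margin can be written as follows. This is a sketch under the usual normalization in which the support vectors satisfy $|\omega \cdot x + b| = 1$; that normalization is an assumption that the text does not state, and $\omega$ and $b$ denote the weight vector and intercept.

```latex
H:\ \omega \cdot x + b = 0, \qquad
H_1:\ \omega \cdot x + b = +1, \qquad
H_2:\ \omega \cdot x + b = -1

\text{margin} = \operatorname{dist}(H_1, H_2) = \frac{2}{\lVert \omega \rVert}
\quad\Longrightarrow\quad
\max \frac{2}{\lVert \omega \rVert} \ \text{is equivalent to}\ \min \tfrac{1}{2}\lVert \omega \rVert^{2}
```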

Multiclassification SVM
The SVM was originally designed for binary classification. In fault diagnosis, however, there is generally more than one fault condition in addition to the healthy condition. Therefore, a binary classifier is not sufficient for fault diagnosis, which makes it necessary to use a method that handles multiclassification problems. A brief introduction to the multiclassification SVM follows.
Multiclassification can be obtained by combining binary classifications. If a k-class problem (k > 2) is separable, any two of its classes can be separated by a binary classifier; conversely, if any two classes of a k-class problem are separable, the k-class problem is separable. Therefore, a multiclassifier can be constructed by combining multiple binary classifiers. Several methods have been proposed, such as the one-to-one, one-to-all, and directed acyclic graph multiclass SVM [41,42]. By comparing these methods, Hsu and Lin indicated that the one-to-one method was more suitable for practical application than the other methods [35].
For a data set containing k categories, the one-to-one method constructs M = k(k − 1)/2 classifiers, each of which is trained on the data of two classes.
For the training data from class i and class j, the following binary classification problem needs to be solved:
$$\min_{\omega^{ij},\, b^{ij},\, \xi^{ij}} \ \frac{1}{2}(\omega^{ij})^{T}\omega^{ij} + C\sum_{n}\xi_{n}^{ij}$$
$$\text{s.t.}\quad (\omega^{ij})^{T}K(x_n) + b^{ij} \ge 1 - \xi_{n}^{ij}\ \ \text{if } y_n = i, \qquad (\omega^{ij})^{T}K(x_n) + b^{ij} \le -1 + \xi_{n}^{ij}\ \ \text{if } y_n = j, \qquad \xi_{n}^{ij} \ge 0$$
Similar to the basic SVM, $K(x_n)$ represents the kernel mapping; $(x_n, y_n)$ is a training sample of class i or class j; $\omega^{ij} \in R^n$ and $b^{ij} \in R$ are the weight coefficient and intercept parameter; $\xi_{n}^{ij}$ is the relaxation (slack) variable; and C is the penalty parameter. The specific determination methods for the weight coefficients, relaxation variables, and penalty parameter can be found in [35]. After the M classifiers are constructed, a variety of methods can be applied in the subsequent test stage; the following voting decision is usually used. When deciding whether a sample x belongs to class i or class j, if the classifier determines that the sample belongs to class i, the value of the corresponding voting function $V(x_i)$ is increased by 1; otherwise, the value of the voting function $V(x_j)$ is increased by 1. Finally, sample x is assigned to the class for which V is the largest; this voting decision is also called a "max wins" decision. Please refer to [43] for a more detailed introduction to the SVM.
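The one-to-one construction and the "max wins" vote can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the class names, the `centers` values, and `make_threshold_classifier` (a nearest-center rule standing in for a trained binary SVM) are all hypothetical.

```python
from itertools import combinations


def one_vs_one_predict(x, classes, binary_classifiers):
    """Classify x by "max wins" voting over all pairwise classifiers.

    binary_classifiers maps each pair (i, j) to a callable that returns
    either i or j for the sample x.
    """
    votes = {c: 0 for c in classes}
    for i, j in combinations(classes, 2):
        winner = binary_classifiers[(i, j)](x)
        votes[winner] += 1  # V for the winning class is increased by 1
    # On a tie, max() keeps the class that appears first in `classes`.
    return max(classes, key=lambda c: votes[c]), votes


def make_threshold_classifier(i, j, centers):
    """Hypothetical stand-in for a trained binary SVM: nearest class center."""
    def classify(x):
        return i if abs(x - centers[i]) <= abs(x - centers[j]) else j
    return classify


# Hypothetical class labels and one-dimensional class centers.
classes = ["normal", "inner_race", "outer_race", "ball"]
centers = {"normal": 0.0, "inner_race": 1.0, "outer_race": 2.0, "ball": 3.0}

classifiers = {
    (i, j): make_threshold_classifier(i, j, centers)
    for i, j in combinations(classes, 2)
}
num_classifiers = len(classifiers)  # M = k(k - 1) / 2 = 6 for k = 4
label, votes = one_vs_one_predict(1.9, classes, classifiers)
print(num_classifiers, label, votes)
```

With k = 4 classes, six pairwise classifiers are built, and the sample is assigned to the class that collects the most votes.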

Convolutional Neural Network
Computer vision (CV) is an important application area of deep learning. At present, CV problems are mainly divided into three categories: (1) image classification, (2) object detection, and (3) neural style transfer. If a traditional fully connected neural network is used for these problems, the high dimension of the input layer becomes very difficult to handle. For a 1000 × 1000-pixel picture with three color channels, the input layer of the neural network has a dimension of three million, which causes a sharp increase in the number of network weights, W, and has two undesirable consequences. On the one hand, the neural network structure becomes complex, which leads to overfitting because the amount of data is small relative to the number of parameters. On the other hand, the requirements on the hardware environment are higher, because more memory and computation are needed. Both problems are better handled by the CNN.
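The arithmetic behind this argument can be checked directly. In the sketch below, only the 1000 × 1000 × 3 image size comes from the text; the hidden-layer width, kernel size, and number of filters are hypothetical values chosen for illustration.

```python
# Input dimension of a 1000 x 1000 RGB image fed to a fully connected layer.
height, width, channels = 1000, 1000, 3
input_dim = height * width * channels

# Hypothetical first hidden layer with 1000 fully connected units.
hidden_units = 1000
dense_weights = input_dim * hidden_units

# A convolution layer shares one small kernel across all positions instead.
kernel_size, num_filters = 5, 16  # hypothetical filter settings
conv_weights = kernel_size * kernel_size * channels * num_filters

print(input_dim, dense_weights, conv_weights)
```

Under these assumed sizes, the fully connected layer needs billions of weights, whereas the convolution layer needs about a thousand.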

Structure of CNN
The CNN is a multistage neural network composed of multiple screening stages and a classification stage. The purpose of the screening stages is to extract features from the input by means of convolution layers and pooling layers. The classification stage, which closely follows the screening stages, is a multilayer perceptron composed of several fully connected layers. To sum up, taking input and output into account, the CNN contains five types of layers: (1) input layer, (2) convolution layer, (3) pooling layer, (4) fully connected layer, and (5) output layer. The function of each type of layer is described as follows.
The convolution layer uses a filter (whose size depends on the dimension of the input data) to convolve local areas of the input and then forms convolution activation units containing the features extracted from the original data. Each filter uses the same convolution kernel across all local areas of the input (known as weight sharing) to extract local features. A filter corresponds to one frame in the next layer, and the number of frames is called the depth of the layer. Let $K_i^l$ and $b_i^l$ represent the weight and bias of the ith convolution kernel in the lth layer, respectively, and let $X^l(j)$ refer to the jth local area of the lth layer. The convolution process is described as follows:
$$y_i^{l+1}(j) = K_i^l * X^l(j) + b_i^l$$
where $*$ represents convolution, which is used to calculate the dot product of the convolution kernel and the local area of the input layer, and $y_i^{l+1}(j)$ represents the input of the jth neuron in the ith frame of the (l + 1)th layer. After the convolution operation, the rectified linear unit (ReLU) activation function is introduced to accelerate the convergence of the neural network. The introduction, types, advantages, and disadvantages of this activation function are covered in Section 3.2.2.
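The convolution step followed by ReLU can be sketched for a one-dimensional signal, which is the shape of a vibration record. This is a minimal illustration assuming a stride of 1 and no padding, neither of which is specified in the text; the signal, kernel, and bias values are hypothetical.

```python
def conv1d(signal, kernel, bias):
    """y(j) = K * X(j) + b: dot product of the kernel with each local area."""
    width = len(kernel)
    return [
        sum(k * x for k, x in zip(kernel, signal[j:j + width])) + bias
        for j in range(len(signal) - width + 1)
    ]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


signal = [0.0, 1.0, 3.0, 1.0, 0.0, -2.0]  # hypothetical input frame
kernel = [1.0, 0.0, -1.0]                 # hypothetical shared kernel K
bias = 0.5                                # hypothetical bias b

pre_activation = conv1d(signal, kernel, bias)
activation = relu(pre_activation)
print(pre_activation, activation)
```

The same three kernel weights are reused at every position of the signal, which is the weight sharing described above.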
In the architecture of the CNN, a pooling layer usually follows the convolution layer. The pooling layer performs a down-sampling operation, reducing the spatial size of the features and the number of network parameters. There are two types: (1) max pooling and (2) average pooling. Max pooling is used more frequently than average pooling. It performs a local maximum operation on the input features, which reduces the parameters and yields features that are insensitive to position. The transformation of max pooling is described as follows:
$$p_i^{l+1}(j) = \max_{(j-1)W+1 \le t \le jW} q_i^l(t)$$
where $q_i^l(t)$ refers to the value of the tth neuron in the ith frame of the lth layer, $t \in [(j-1)W+1,\ jW]$, W is the width of the pooling region, and $p_i^{l+1}(j)$ is the corresponding output of the pooling operation in the (l + 1)th layer.
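The two pooling variants can be sketched as follows, assuming non-overlapping windows of width W (which is what the index range t ∈ [(j − 1)W + 1, jW] implies). The input frame is a hypothetical example.

```python
def max_pool1d(frame, width):
    """p(j) = max of q(t) over the jth non-overlapping window of size `width`."""
    return [
        max(frame[start:start + width])
        for start in range(0, len(frame) - width + 1, width)
    ]


def avg_pool1d(frame, width):
    """Average pooling over the same non-overlapping windows."""
    return [
        sum(frame[start:start + width]) / width
        for start in range(0, len(frame) - width + 1, width)
    ]


frame = [0.0, 0.5, 3.5, 3.5, 1.0, 2.0]  # hypothetical activations of one frame
pooled_max = max_pool1d(frame, 2)
pooled_avg = avg_pool1d(frame, 2)
print(pooled_max, pooled_avg)
```

Both variants halve the length of the frame here; max pooling keeps the strongest response in each window, while average pooling smooths it.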
In the Softmax output layer, $z_j$ represents the probability of the jth output neuron. Similar to a multi-layer perceptron (MLP), the CNN can be trained. The loss function of the CNN model is the cross-entropy between the estimated Softmax output probability distribution and the target class probability distribution. Let p(x) represent the target distribution and q(x) represent the estimated distribution; then, the cross-entropy between p(x) and q(x) is
$$H(p, q) = -\sum_{x} p(x)\log q(x)$$
In order to minimize the loss function, the Adam stochastic optimization algorithm is used for training the CNN model. It is suitable for big data or models with many parameters, with the advantages of straightforward implementation, high computational efficiency, and low memory requirements. For more details about the Adam stochastic optimization algorithm, the reader can refer to [40]. Dropout is applied during training as one of the effective methods to control overfitting.
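The loss computation can be sketched as follows. The sketch assumes the standard Softmax definition applied to raw output-layer scores and a one-hot target distribution; the score values are hypothetical, and subtracting the maximum score is a common numerical-stability step rather than something stated in the text.

```python
import math


def softmax(scores):
    """Convert raw output scores into a probability distribution."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x); terms with p(x) = 0 contribute nothing."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)


scores = [2.0, 1.0, 0.1]  # hypothetical output-layer scores for three classes
q = softmax(scores)       # estimated distribution q(x)
p = [1.0, 0.0, 0.0]       # one-hot target distribution p(x)
loss = cross_entropy(p, q)
print(q, loss)
```

With a one-hot target, the cross-entropy reduces to the negative log-probability assigned to the correct class, so the loss shrinks as that probability approaches 1.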

Activation Function
At present, there are three frequently used activation functions: (1) the sigmoid function, (2) the tanh function, and (3) the ReLU function. The sigmoid function maps any real-valued input into the range (0, 1). In particular, if the input is a very large negative number, the output approaches 0; if the input is a very large positive number, the output approaches 1. The sigmoid function was once widely used but has fallen out of favor recently, mainly for the following reasons:
1. When the input is slightly away from the origin, the gradient of the function becomes very small, almost zero. During back propagation, the derivative with respect to each weight ω is calculated by the chain rule. Each time back propagation passes through a sigmoid function, the factor it contributes to the chain is very small. Because back propagation may pass through many sigmoid functions, the gradient with respect to ω ends up very small, so ω is hardly updated, which hinders weight optimization. This problem is called gradient saturation or gradient diffusion.
2. The function output is not centered on 0, which decreases the efficiency of weight updating.
3. The sigmoid function involves an exponential operation, which is relatively slow to compute.
The tanh function is the hyperbolic tangent function, and its curve is similar to that of the sigmoid function. What they have in common is that when the input is very large or very small, the output is almost flat and the gradient is very small, which is harmful to weight updating. The difference lies in the output interval: the output interval of tanh is (−1, 1) and the whole function is centered on 0, which is better than the sigmoid function.
Compared to the first two activation functions, the ReLU function is currently one of the most popular and has the following advantages. First, there is no gradient saturation when the input is positive. Second, the calculation is much faster: since the ReLU function involves only a linear relationship, it is much faster than the sigmoid and tanh functions in both forward and back propagation. The ReLU function also has some disadvantages. When the input is negative, the ReLU function is not activated at all, so a neuron whose input stays negative effectively dies. During forward propagation this is not a problem, because some regions of the input are simply sensitive and others are not. During back propagation, however, a negative input makes the gradient exactly zero, which is the same problem that the sigmoid and tanh functions encounter. In addition, the output of the ReLU function is not centered on 0. Please refer to [44] for a more detailed introduction to the CNN.
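The saturation behaviour described in this section can be checked numerically. The sketch below uses the standard definitions of the three functions and their derivatives; the sample inputs and the depth of ten layers are arbitrary choices, and the chained product ignores the weights between layers for simplicity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


# Gradients far from the origin and at the origin.
for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid_grad(x), tanh_grad(x), relu_grad(x))

# Chain-rule factor through ten sigmoid layers, each at its steepest point.
chained = sigmoid_grad(0.0) ** 10
print(chained)
```

Even in the best case, the sigmoid derivative is 0.25, so ten stacked sigmoid layers scale the gradient by less than one millionth, whereas the ReLU derivative stays at exactly 1 for positive inputs and drops to exactly 0 for negative ones.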

Long Short-Term Memory Neural Network Model
In a recurrent neural network (RNN), the neurons have a looping structure, so the characteristics of the previous state can be preserved. The structure of the RNN is shown in Figure 3. If the network input is a time series, the loop can be unrolled into standard neurons connected to one another. As shown in Figure 3, each node represents a neuron at one time point in the time series. U represents the weight between the input layer and the hidden layer; W, the recurrent weight from the hidden layer to itself; and V, the weight between the hidden layer and the output layer. The weight coefficients are the same at every time step. Because of its structure, the traditional RNN loses some information at each feedback step, so the original information degrades and is eventually lost; this is the so-called gradient disappearance (vanishing gradient) problem. One way to address it is the gated recurrent unit (GRU) neural network, which improves the neural network unit (the hidden layer of the RNN) by adding a memory unit to form the GRU structure. Compared to the GRU, however, the LSTM neural network is a more powerful solution to the gradient disappearance problem. As a type of RNN, the LSTM neural network can solve the gradient disappearance problem of the traditional RNN, so it can learn relationships that span long-term information transmission. In terms of overall structure, the LSTM neural network is similar to the traditional RNN, including an input layer, a hidden layer, and an output layer. Unlike the GRU, it adds multiple control gates (memory modules) to the neuron structure of the traditional hidden layer in order to solve the gradient disappearance problem, thus realizing long-term memory and information transmission. Please refer to [45] for a more detailed introduction to the LSTM neural network.
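The unrolled RNN and the way early information fades can be sketched with scalar weights. This is a minimal illustration: the tanh hidden activation, the scalar values of U, W, and V, and the input sequence are all assumptions made for the example, not details taken from the text.

```python
import math


def rnn_forward(inputs, U, W, V, h0=0.0):
    """Unroll a scalar RNN: h_t = tanh(U * x_t + W * h_{t-1}), o_t = V * h_t.

    The same U, W, and V are reused at every time step.
    """
    h = h0
    hidden, outputs = [], []
    for x in inputs:
        h = math.tanh(U * x + W * h)
        hidden.append(h)
        outputs.append(V * h)
    return hidden, outputs


def influence_of_first_state(hidden, W):
    """d h_T / d h_1: product over later steps of W * (1 - h_t ** 2)."""
    factor = 1.0
    for h in hidden[1:]:
        factor *= W * (1.0 - h * h)
    return factor


# Hypothetical sequence: a single pulse followed by 19 zero inputs.
inputs = [1.0] + [0.0] * 19
hidden, outputs = rnn_forward(inputs, U=1.0, W=0.5, V=1.0)
influence = influence_of_first_state(hidden, W=0.5)
print(hidden[0], hidden[-1], influence)
```

After twenty steps, the sensitivity of the last hidden state to the first one has shrunk to a tiny fraction, which is the gradient disappearance effect that the gates of the GRU and LSTM are designed to counter.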

Introduction of Data Source
In this study, the rolling bearing vibration signal data were collected from the Case Western Reserve University Bearing Data Center for fault classification (see Supplementary Materials). The test platform consists of four parts: the motor on the left, the torque sensor in the middle, the power meter on the right, and an electronic control device that is not visible. In terms of horsepower (1 hp = 746 W), the motor has four working conditions, namely 1 hp, 2 hp, 3 hp, and 4 hp. The bearing under test supports the motor shaft in the test platform. Before the mechanical system is operated, a single-point fault is introduced on the bearing by spark erosion, with fault diameters of 0.007, 0.014, and 0.021 in. There are four kinds of bearing states: (1) normal, (2) inner ring fault, (3) ball fault, and (4) outer ring fault. The vibration data generated under these four states differ, and each of the three fault types occurs with each of the three fault diameters (0.007, 0.014, and 0.021 in.). Therefore, there are 3 × 3 = 9 fault states plus the normal state, for a total of 10 states corresponding to 1 hp.
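
The count of 10 states follows from combining the three fault types with the three fault diameters and adding the normal state. The enumeration below is a small sketch; the names `states` and `label_of` and the integer label order are illustrative assumptions, not the labeling used in this study.

```python
from itertools import product

fault_types = ["inner ring", "ball", "outer ring"]
diameters_in = [0.007, 0.014, 0.021]  # fault diameters in inches

# One normal state plus every (fault type, diameter) combination.
states = [("normal", None)] + list(product(fault_types, diameters_in))

# Hypothetical integer class labels, assigned in enumeration order.
label_of = {state: index for index, state in enumerate(states)}

print(len(states))  # 10
```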

Data Processing
In this study, the data for each state are a 480,000 × 1 fault time series recorded under 3 hp. The data are processed as follows: Because the amount of data that the learner can read at once is limited, the original vibration data are segmented with a step length of 2000, which yields 240 × 2000 vibration data for each state (480,000 / 2000 = 240 segments). Since the bearing has 10 states, a data set of 2400 × 2000 (with labels) is finally obtained.
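
The segmentation described above can be sketched as follows. The sketch assumes non-overlapping windows (consistent with 480,000 / 2000 = 240) and uses random numbers as a stand-in for the real vibration signals; the function name `segment` and the variable names are hypothetical.

```python
import numpy as np

SEGMENT_LEN = 2000     # step length from the text
SIGNAL_LEN = 480_000   # samples per state from the text
N_STATES = 10          # bearing states, including normal


def segment(signal, segment_len=SEGMENT_LEN):
    """Cut a 1-D signal into non-overlapping rows of length segment_len."""
    n_segments = len(signal) // segment_len
    return signal[: n_segments * segment_len].reshape(n_segments, segment_len)


# Synthetic placeholder signals (assumption); real data would be loaded instead.
rng = np.random.default_rng(0)
signals = [
    rng.standard_normal(SIGNAL_LEN).astype(np.float32) for _ in range(N_STATES)
]

X = np.vstack([segment(s) for s in signals])                     # (2400, 2000)
y = np.repeat(np.arange(N_STATES), SIGNAL_LEN // SEGMENT_LEN)    # (2400,)
```

Each row of `X` is one 2000-point sample, and `y` holds the state index of the signal that the row was cut from.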

Parameter Setting
To compare the results of the SVM, CNN, and LSTM, it is necessary to set the same parameters for all of them. The specific settings are as follows: The sample size is 8000; the ratio of training, validation, and test samples is 7:2:1; and the batch size is 200. The Adam stochastic optimization algorithm is used as the optimizer, with the learning rate set to 0.06. The ReLU function is selected as the activation function, and the Softmax classifier is adopted. The number of hidden layers is 2, with stride = 1, padding = same, and dropout = 1.
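
The 7:2:1 division of the 8000 samples can be sketched with a shuffled index split. This is an illustrative sketch only; the function name `split_indices`, the fixed seed, and the use of a simple random shuffle are assumptions rather than the procedure used in this study.

```python
import random


def split_indices(n_samples, ratios=(7, 2, 1), seed=0):
    """Shuffle sample indices and split them into train/validation/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(8000)
print(len(train_idx), len(val_idx), len(test_idx))  # 5600 1600 800
```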

SVM
The parameters of the SVM are as follows: (1) C, the penalty coefficient; (2) gamma, the coefficient of the kernel function; and (3) kernel, the kernel function. We used rbf as the kernel function. Grid search with cross-validation (GridSearchCV) was used to select the penalty coefficient, and the value of C was obtained by three-fold cross-validation. Macro avg was used as the evaluation metric. Figure 4 shows the performance of the learner for different values of C.

It can be seen that when C exceeds 32, the accuracy of the learner on the training set and the test set remains basically unchanged (Figure 4). Because a higher value of C means a higher training cost, we set C = 32, the point at which the performance of the learner was relatively stable. We then randomly split the data into a training set and a test set 10 times to improve the reliability of the results. The results are shown in Table 1.

For the SVM, CNN, and LSTM, 10-fold cross-validation was used to validate the learner. The training set and validation set were fed into the learner according to the 10-fold strategy so that the amount of data actually used for training reached 90% of the original data set, and the test set (10%) was then used to test the learner. We also repeated the training many times to improve the reliability of the results. The results of the CNN and LSTM are shown in Tables 2 and 3.
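
The selection of C by grid search with three-fold cross-validation can be sketched with scikit-learn as follows. This is a hedged illustration, not the configuration used in this study: the synthetic three-class data, the candidate grid of powers of two, `gamma="scale"`, and the choice of macro-averaged F1 (`f1_macro`) as the "macro avg" metric are all assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 3 classes, 60 samples each, 20 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(60, 20)) for c in range(3)])
y = np.repeat(np.arange(3), 60)

# Candidate penalty coefficients 0.5, 1, 2, ..., 128 (illustrative grid).
param_grid = {"C": [2.0 ** k for k in range(-1, 8)]}

search = GridSearchCV(
    SVC(kernel="rbf", gamma="scale"),
    param_grid,
    scoring="f1_macro",  # macro-averaged F1, assumed reading of "macro avg"
    cv=3,                # three-fold cross-validation, as in the text
)
search.fit(X, y)

best_C = search.best_params_["C"]
mean_scores = search.cv_results_["mean_test_score"]  # one score per candidate C
```

Plotting `mean_scores` against the candidate values of C gives the kind of curve from which a plateau, and hence the smallest adequate C, can be read off.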

After the classification with the SVM, CNN, and LSTM was finished, typical indexes were selected to compare the results, as shown in Figure 5. The neural network methods were found to be better than the SVM in classification accuracy. The reason is that, when the sample size is sufficient, neural network methods can show greater superiority over the SVM. In addition, the CNN achieved higher accuracy than LSTM on both the training set and the test set. It is therefore concluded that, when the sample size is sufficient, neural network methods classify time series data more accurately than the SVM in fault diagnosis, and that the CNN is superior to LSTM. It is worth mentioning that LSTM was difficult to train. Table 4 shows the timing performance of the three chosen methods. Our hardware environment and platform settings are as follows:

To check whether this superior performance occurred by chance, we also applied the random forest method. Table 5 shows the results of random forest. The random forest method performed well on the training set, and its training time was very short. However, it was not very effective on the test set. A possible cause is that the large number of sample features leads to a poor distribution of feature weights in the decision trees of the random forest. Figure 6 shows the structure diagram of a single decision tree in the random forest. These test results suggest that the superior performance of the three chosen methods was not accidental.

Conclusions
The reliability and sustainability of complex systems in industry, especially complex systems with high-end equipment, is a matter of concern, because it relates to efficiency, cost, resources, energy consumption, human-computer interaction, and many other aspects. Using big data, computational sustainability, and multi-method comparison can improve the cost-benefit of complex systems and minimize the risk of failure while avoiding waste and redundancy. A more reasonable selection of fault diagnosis methods will greatly reduce the overall demand for human resources, energy, and time, and help to improve efficiency and accuracy. This paper presents a fault diagnosis and prediction framework for complex systems based on multi-dimensional data and multi-method comparison, aimed at improving the reliability and sustainability of the system by selecting methods with relatively superior performance. Three methods (SVM, CNN, and LSTM) were implemented and their results were compared, leading to the following conclusions:

1. The new method is better than the traditional, single statistical analysis method.
2. For the classification of the time series fault data, the accuracy of the neural network is higher than that of the SVM.
3. The CNN and LSTM both performed well. The CNN is slightly superior to LSTM in accuracy. Moreover, LSTM is much more difficult to train: it takes much more time and requires higher equipment conditions. Generally, the CNN has greater advantages than LSTM in the classification of the time series fault data.

With this method and line of reasoning, the most effective fault diagnosis and prediction method can be chosen according to the characteristics of different systems and data sets, so as to improve the reliability and sustainability of complex systems.
In addition to these findings, some problems remain to be studied. Model-based reasoning and data-supported reasoning are clearly different. With the development of big data technology and the shift of projects from simulation to the physical world, data will play an increasingly important role, especially in reliability and sustainability applications. This paper proposes a framework for production operation in the industrial field that is guided by industrial big data and sustainable development, with the aim of illustrating that industrial big data and computational sustainability are of great benefit in improving the reliability and sustainability of complex systems. Because data mining and machine learning methods are varied and developing rapidly, the methods selected in this paper are representative but also limited. The machine learning methods considered for fault diagnosis are still not comprehensive enough, and other methods, such as other neural networks, ensemble learning, and random forest, should be developed further to compare their overall effect on fault classification. Follow-up research can broaden and supplement the comparison with better-performing algorithms according to the trends and hotspots of sustainable computing.

Figure 1. The principle of a support vector machine.

The classification stage, which closely follows the screening stage, is a multilayer perceptron composed of several fully connected layers. To sum up, taking input and output into account, the CNN contains five layers: (1) input layer, (2) convolution layer, (3) pooling layer, (4) full connection layer, and (5) output layer. The function of each type of layer is described as follows: The convolution layer first uses a filter (whose size depends on the dimension of the input data) to convolve the local area of the input layer and then forms a convolution activation unit containing features extracted from the original data. Each filter uses the same convolution kernel (also known as weight sharing) to extract local features of the local area of the input layer. A filter corresponds to a frame in the next layer, and the number of frames is called the depth of the layer. In the model equations, separate symbols denote the weight and the deviation of the ith convolution kernel in the lth layer, and a further symbol, indexed by (l + 1), i, and j, denotes the value of the neuron in the (l + 1)th layer of the pooling operation. The CNN model structure used in this study is shown in Figure 2.

Figure 3. The neuron structure of the recurrent neural network.

Figure 5. Comparison of the classification effect of the support vector machine (SVM), convolutional neural network (CNN), and long- and short-term memory (LSTM) neural network.

Figure 6. Structure diagram of a single decision tree in random forest.

Table 1. Classification effect of the support vector machine (SVM).

Table 3. Classification effect of long- and short-term memory (LSTM) neural network.

Table 4. Training time of SVM, CNN, and LSTM.

Table 5. Classification effect of random forest.