Elevator Fault Detection Using Profile Extraction and Deep Autoencoder Feature Extraction for Acceleration and Magnetic Signals

Abstract: In this paper, we propose a new algorithm for data extraction from time-series data, together with automatic calculation of highly informative deep features to be used in fault detection. In data extraction, elevator start and stop events are extracted from sensor data including both acceleration and magnetic signals. In addition, a generic deep autoencoder model is developed for automated feature extraction from the extracted profiles. The extracted deep features are then classified with the random forest algorithm for fault detection. Sensor data are labelled as healthy or faulty based on the recorded maintenance actions. The remaining healthy data are used to validate the model and to test its efficacy in avoiding false positives. Existing features are also classified with random forest for comparison. We achieved above 90% accuracy in fault detection while avoiding false positives with the new deep features extracted from the dataset, which outperforms the results obtained with existing features. This research will help various predictive maintenance systems to detect false alarms, which will in turn reduce unnecessary visits of service technicians to installation sites.


Introduction
In recent years, elevator systems have been used increasingly in apartments, commercial facilities, and office buildings. Presently, 54% of the world's population lives in urban areas [1]. Therefore, elevator systems need proper maintenance and safety measures. The next step for improving the safety of elevator systems is the development of predictive and pre-emptive maintenance strategies, which will also reduce repair costs and increase the lifetime while maximizing the uptime of the system [2,3]. Elevator production and service companies are now opting for a predictive maintenance policy to provide better service to customers. They remotely monitor faults in elevators and estimate the remaining lifetime of the components responsible for faults. Elevator systems require fault detection and diagnosis for healthy operation [4].
Fault diagnosis methods based on deep neural network [5-7] and convolutional neural network [8,9] feature extraction are presented as the state of the art for rotating machines similar to elevator systems. Support vector machines [10] and extreme learning machines [11] are also used as fault detection methods for rotating machines. In contrast, we have developed an intelligent deep autoencoder random forest-based feature extraction methodology for fault detection in elevator systems to improve the performance of traditional fault diagnosis methods.
Profile extraction for health monitoring is a major issue in automated industrial applications such as elevator systems, computer numerical control, machinery, and robotics [12]. Although rotating machines have been running for decades, profile extraction and processing methods are not widely available [13]. Profile extraction methods have been applied in electric vehicles [14], computer numerical control systems [15], and horizontal planes [16]. The Kalman filter [17] is one of the methods used for profile extraction. In contrast, we have developed an off-line profile extraction algorithm based on low-pass filtering and peak detection to extract elevator start and stop events from sensor data including both acceleration and magnetic signals.
In the last decade, neural networks [18] have extracted highly meaningful statistical patterns from large-scale and high-dimensional datasets. Neural networks [19] have also been used to improve elevator ride comfort via speed profile design, and neural networks [20] have been applied successfully to nonlinear time-series modeling. A deep learning network can self-learn the relevant features from multiple signals [21]. Deep learning algorithms are frequently used in areas such as bearing fault diagnosis [22], machine defect detection [23], vibration signal analysis [24], computer vision [25], and image classification [26]. Autoencoding is a process for nonlinear dimension reduction with a natural transformation architecture using a feedforward neural network [27]. Autoencoders have proven powerful as nonlinear feature extractors [28]. They can increase the generalization ability of machine learning models by extracting features of high interest, which also makes the models applicable to sensor data [29]. Autoencoders were first introduced by LeCun [30] and have been studied for decades. Traditionally, feature learning and dimensionality reduction are the two main applications of autoencoders. Recently, autoencoders have been considered one of the most compelling subspace analysis techniques because of the existing theoretical relations between autoencoders and latent variable models [31]. Autoencoders have been used for feature extraction from data for fault detection in systems such as induction motors [32] and wind turbines [33], which differ from the elevator systems studied in our research.
In our previous research, raw sensor data, mainly acceleration signals, were used to calculate elevator key performance and ride quality features, which we call here existing features. Random forest was used for fault detection based on these existing features. Existing domain-specific features are calculated from raw sensor data, but this requires expert knowledge of the domain and results in some loss of information. To avoid these drawbacks, we have developed an algorithm for profile extraction from the raw sensor data rides including both acceleration and magnetic signals. In addition, we have developed a generic algorithm with a deep autoencoder random forest approach for automated feature extraction from raw sensor data profiles for fault detection in elevator systems.
Our off-line profile extraction algorithm is signal-based, and the deep autoencoder random forest method is model-based. The approach first extracts profiles from the time-series signal and then calculates highly informative deep features from the extracted profiles. It compares favorably with other algorithms because it yields better results, provides dimensionality reduction, and is robust against overfitting.
We propose a reliable fault detection model with above 90% accuracy in fault detection, which will increase the safety of passengers. In addition, we have validated the efficacy of the pre-trained model in terms of false positives with the remaining healthy rides, which helps in detecting false alarms in elevator predictive maintenance strategies and in reducing unnecessary visits by maintenance personnel to installation sites. Figure 1 shows the fault detection approach used in this paper, which includes raw sensor data rides extracted from all floor patterns based on time periods provided by the maintenance data. Acceleration and magnetic signal rides collected from an elevator system are fed separately to the algorithm for profile extraction. The extracted profiles from all five traction elevators, including both acceleration and magnetic signals, are then fed to the deep autoencoder model for feature extraction, and random forest then performs the fault detection task based on the extracted deep features. We extract only start and stop profiles from the acceleration and magnetic signal rides because rides have different lengths for each floor combination: the constant speed phase is longer when the travel is longer.
This paper provides the following novelties. (1) We propose a new off-line profile extraction algorithm for extracting elevator start and stop events from time-series data. (2) In addition, we propose a new deep autoencoder model to automatically generate highly informative deep features from sensor data for fault detection. The rest of this paper is organized as follows. Section 2 presents the methodology of the paper including profile extraction, deep autoencoder, and random forest algorithms. Then, Section 3 includes the details of experiments performed, results, and discussion. Finally, Section 4 concludes the paper and presents the future work.

Methodology
In this study, we have used 12 different existing features derived from raw sensor data describing the motion and vibration of an elevator for fault detection and diagnostics of multiple faults. As an extension of our previous research, we have developed an automated feature extraction technique for raw sensor data and compared the results obtained with the newly extracted deep features against those obtained with the existing features. In our previous research [34], we used only the acceleration signal, which represents vibration-related features. In this research, we have extended our approach to include magnetic signals, which represent position-related features. This supports the goal of this research, which is to develop generic models for profile extraction and automated feature extraction for fault detection in the health state monitoring of elevator systems. In addition, we have analyzed almost two months of data from five traction elevators, compared with one elevator in our previous research. Each elevator usually produces around 200 rides per day, and each ride used in the analysis contains around 5000 rows of data, which demonstrates the robustness of the algorithms on a large dataset. When selecting healthy rides, we excluded around 20 rides before and after the time period of faulty rides, which helps to remove suspicious data from the analysis. We used 70% of the data for training and the remaining 30% for testing.
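The ride selection and data split described above can be sketched as follows. This is only an illustration under stated assumptions: rides are identified by consecutive integer indices, the faulty period is given as a hypothetical index range, and the 70/30 split is assumed to be a random shuffle (the text does not say how the split was drawn). The function names are hypothetical.

```python
import random

def select_healthy_rides(n_rides, faulty_start, faulty_end, margin=20):
    """Indices of rides lying more than `margin` rides outside the faulty period."""
    return [i for i in range(n_rides)
            if i < faulty_start - margin or i > faulty_end + margin]

def split_train_test(items, train_percent=70, seed=0):
    """Shuffle reproducibly (an assumption) and cut at train_percent of the items."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]

# Hypothetical example: 100 rides, rides 40..49 recorded as faulty
healthy = select_healthy_rides(n_rides=100, faulty_start=40, faulty_end=49)
train, test = split_train_test(healthy)
print(len(healthy), len(train), len(test))
```

The margin keeps rides close to a recorded fault out of the healthy class, since their true state is uncertain.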

Profile Extraction Algorithm
Raw sensor data collected from elevator systems typically encompass a large collection of data points sampled at high frequency. In order to feed large sensor data to cloud-based applications, it is often desirable to pre-process the data and perform compression before transmission, for example in the form of edge computing performed in the device end. Here we assume that raw data is in the form of a one-dimensional time-series vector with equidistant sampling times. The goal of the proposed method is to compress the raw time series obtained from machinery while maintaining the information about key events, and secondly, to make the data more applicable for machine learning.
The algorithm works in two stages. In the first stage, the signal is pre-processed and normalized, followed by low-pass filtering. The low-pass filtered signal is used for peak detection, which for each elevator travel detects a local minimum and maximum corresponding to the acceleration and deceleration (start and stop) events. The algorithm uses crude low-pass filtering with a low cut-off frequency, which ensures that only a sustained period of acceleration or deceleration is registered as a peak. This prevents short bursts of noise from being detected as a movement window.
From the low-pass filtered signal, the algorithm cannot detect the precise magnitude or timing of the peaks, but it does detect the approximate region in which to align the event profiles. Unfiltered data is then used for profile alignment.
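The first stage can be illustrated with a minimal sketch. A centered moving average stands in for the crude low-pass filter; this choice, the window width, and the synthetic travel are assumptions made for the example, not the filter design used in the study.

```python
def moving_average(signal, width):
    """Crude low-pass filter: centered moving average, truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def detect_travel_peaks(raw, width=7):
    """Return (index of minimum, index of maximum) of the zero-mean, filtered signal."""
    mean = sum(raw) / len(raw)
    centered = [v - mean for v in raw]          # zero-mean transformation
    smooth = moving_average(centered, width)    # denoised signal
    i_max = max(range(len(smooth)), key=smooth.__getitem__)
    i_min = min(range(len(smooth)), key=smooth.__getitem__)
    return i_min, i_max

# Synthetic travel: sustained acceleration (samples 10..17), one large noise
# spike (sample 23), sustained deceleration (samples 30..37)
raw = [0.0] * 10 + [1.0] * 8 + [0.0] * 5 + [3.0] + [0.0] * 6 + [-1.0] * 8 + [0.0] * 10
i_min, i_max = detect_travel_peaks(raw, width=7)
print(i_min, i_max)
```

Although the spike has the largest raw amplitude, averaging spreads it over the window, so the maximum of the filtered signal falls inside the sustained acceleration phase.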
In the second stage, alignment and collection of equal-length profiles is performed based on windowing of the acceleration signal near the peak events. In this stage, the raw acceleration signal is used instead of the filtered signal. A number of time domain alignment methods have been proposed in the literature. Dynamic time warping (DTW) has been commonly applied, e.g., in speech recognition [35], whereas various alignment techniques for sensor data have been presented in [36]. Here alignment is performed against a reference profile. The reference profile is aligned against the raw data in the window of the detected peaks. The length of the initial profile window m is selected empirically based on the sample frequency and the maximum estimated length of the elevator acceleration events. The criterion for optimal alignment is the alignment that minimizes the Euclidean (L2) norm between the reference profile and the raw data. The output from this operation is an n × m matrix of aligned profiles describing n acceleration and deceleration events of length m.
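As a rough sketch of the alignment step, the reference profile can be slid over the raw data around a detected peak, keeping the shift with the smallest L2 distance. The exhaustive search, the parameter name `search`, and the way the nominal start is derived from the peak index are assumptions for illustration.

```python
import math

def align_profile(raw, reference, peak_index, search):
    """Slide the reference around the detected peak and return the start index
    and raw window with the smallest Euclidean (L2) distance to the reference."""
    m = len(reference)
    nominal = peak_index - m // 2          # assumed: peak lies mid-profile
    best_start, best_cost = None, math.inf
    for start in range(nominal - search, nominal + search + 1):
        if start < 0 or start + m > len(raw):
            continue                       # window would leave the signal
        window = raw[start:start + m]
        cost = math.sqrt(sum((w - r) ** 2 for w, r in zip(window, reference)))
        if cost < best_cost:
            best_start, best_cost = start, cost
    if best_start is None:
        raise ValueError("no valid window around the peak")
    return best_start, raw[best_start:best_start + m]

reference = [0.0, 0.5, 1.0, 1.0, 0.5, 0.0]
raw = [0.0] * 12 + reference + [0.0] * 12      # event truly starts at index 12
start, aligned = align_profile(raw, reference, peak_index=13, search=4)
print(start)
```

Each aligned window has length m, so stacking the windows from n events gives the n × m profile matrix.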
To improve the alignment accuracy, the reference profile is updated iteratively following each batch run. Each sequence in the profile matrix is closely synchronized in time and can hence be considered a repetition of the same signal. Using signal averaging, the new reference profile is calculated as the mean of the n extracted profiles. This both maintains the main characteristics of the signal and reduces the noise. Assuming white noise and perfect synchronization, signal averaging improves the signal-to-noise ratio (SNR) by a factor of $\sqrt{n}$. Information in the obtained reference profile can be used to update the window size m. If the size of the event window is overestimated, the averaged reference profile will contain superfluous close-to-zero values corresponding to no acceleration. The number of elements s in the reference profile whose magnitude falls below a small threshold can be used to estimate the optimal window length by reducing the window length m by s for the following iteration.
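The reference update and window-length estimate can be sketched as below. The threshold value `eps` and the tiny two-profile matrix are illustrative assumptions.

```python
def update_reference(profiles):
    """Signal averaging: column-wise mean of the n x m profile matrix."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def superfluous_samples(reference, eps):
    """Count elements whose magnitude is below eps (treated as no acceleration)."""
    return sum(1 for value in reference if abs(value) < eps)

# Two aligned profiles of length m = 5 with opposite-sign noise
profiles = [
    [0.0, 0.1, 0.9, 1.1, 0.0],
    [0.0, -0.1, 1.1, 0.9, 0.0],
]
reference = update_reference(profiles)
s = superfluous_samples(reference, eps=0.05)
new_m = len(reference) - s     # window length for the next batch iteration
print(reference, s, new_m)
```

In this toy case the noise cancels in the mean, leaving three near-zero samples, so the window would shrink from 5 to 2 samples for the next iteration.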
The off-line profile extraction algorithm is described as follows.

Off-line profile extraction algorithm
Pre-processing
1. Read a vector of raw acceleration data containing k elevator travels. Define the zero mean transformed dataset as X.
2. Perform low-pass filtering on X and obtain the denoised dataset Y.
Initialization
3. Define parameters for the reference profile P. Set the approximated maximum window length to m samples and the height h to the 99th percentile of the low-pass filtered dataset.
4. Define the alignment window size a and set k = 1.
Iteration
5. From Y(k), detect the peak acceleration points $y_{\min}$ and $y_{\max}$.
6. Align the reference profile P against the raw dataset X in the vicinity of the detected peaks by minimizing the L2 norm between P and the raw data within the alignment window a.
7. Add the aligned data points from X(k) as rows into an n × m profile matrix, or alternatively into separate matrices according to direction of travel (min/max).
8. Set travel window k = k + 1 and repeat steps 5-7 until the end of the dataset.
9. Update the reference profile P with the signal-averaged profile obtained from the column-wise mean of the new profile matrix.
10. Reduce the window length m by s samples, where s is the number of elements $P_i$ in P that satisfy $|P_i| < \varepsilon$, where $\varepsilon$ is a close-to-zero number indicating no acceleration.
11. Set k = 1 and continue with new batch iterations by repeating steps 5-8.

Deep Autoencoder
The deep autoencoder model is based on deep learning autoencoder feature extraction methodology. A basic autoencoder is a fully connected three-layer feedforward neural network with one hidden layer. Typically, the autoencoder has the same number of neurons in the input and output layers and reproduces its inputs as its output. We use a five-layer deep autoencoder (see Figure 2) including input, output, encoder, decoder, and representation layers, which is a different approach than in [33,37]. In our approach, we first analyze the data to find all floor patterns and then feed the segmented raw sensor data windows in up and down directions separately to the algorithm for profile extraction. Extracted profiles from both acceleration and magnetic signals are fed to the deep autoencoder model for extracting new deep features. Lastly, we apply random forest as a classifier for fault detection based on the new deep features extracted from the profiles. We have combined healthy and faulty profiles as a vector from all five traction elevators, including both acceleration and magnetic signals, before feature extraction.
The encoder transforms the input $x$ into corrupted input data $x'$ and maps it to the hidden representation $H$ through the nonlinear mapping
$$H = f(W_1 x' + b),$$
where $f(\cdot)$ is a nonlinear activation function such as the sigmoid function, $W_1 \in \mathbb{R}^{k \times m}$ is the weight matrix, and $b \in \mathbb{R}^{k}$ is the bias vector to be optimized in encoding with $k$ nodes in the hidden layer [37]. Then, with parameters $W_2 \in \mathbb{R}^{m \times k}$ and $c \in \mathbb{R}^{m}$, the decoder uses a nonlinear transformation to map the hidden representation $H$ to a reconstructed vector $x''$ at the output layer,
$$x'' = g(W_2 H + c),$$
where $g(\cdot)$ is again a nonlinear function (the sigmoid function). In this study, the weight matrix is $W_2 = W_1^{T}$, i.e., tied weights, for better learning performance [38]. The use of nonlinear activation functions provides a better opportunity to capture nonlinear relationships among multiple input variables. Effective fault detection is challenging because elevator systems are nonlinear and their time-series data have temporal dependencies. Our proposed approach can capture nonlinear relationships among multiple sensor variables, which has improved the performance in terms of fault detection.
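The encoder and decoder mappings with tied weights can be illustrated with a single-hidden-layer forward pass. This is only a sketch of the two equations, not the five-layer model or its training: the layer sizes, the random initial weights, and the zero biases are assumptions, and input corruption is omitted for brevity.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, W1, b):
    """H = f(W1 x + b), with W1 of shape k x m and f the sigmoid."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W1, b)]

def decode(H, W1, c):
    """x'' = g(W1^T H + c): tied weights, so W2 is the transpose of W1."""
    m = len(W1[0])
    return [sigmoid(sum(W1[i][j] * H[i] for i in range(len(H))) + c[j])
            for j in range(m)]

random.seed(0)
m, k = 6, 2                                    # profile length, hidden nodes
W1 = [[random.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(k)]
b = [0.0] * k
c = [0.0] * m
x = [0.0, 0.2, 0.9, 1.0, 0.3, 0.0]             # one hypothetical profile
H = encode(x, W1, b)                           # k deep (latent) features
x_rec = decode(H, W1, c)                       # reconstruction of length m
print(len(H), len(x_rec))
```

With tied weights only W1, b, and c need to be learned, and the k values in H are what would be passed on to the classifier as deep features.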

Random Forest
Random forest adds an additional layer of randomness to bagging. It uses a different bootstrap sample of the data for constructing each tree [39]. Each node is split using the best among a subset of predictors randomly chosen at that node. This counterintuitive strategy is the best feature of random forest, which makes it different from other classifiers as well as robust against overfitting. It is one of the most user-friendly classifiers because it has only two parameters: the number of variables in the random subset at each node and the number of trees in the forest. Moreover, it is usually not very sensitive to their values [40]. The final classification of random forest is obtained by averaging, i.e., taking the arithmetic mean of the class probabilities assigned by all the produced trees (e). Testing data (d) that is unknown to all the decision trees is used for evaluation by the voting method (see Figure 3),
$$H(a) = \arg\max_{y_j} \sum_{i=1}^{Z} I\left(h_i(a) = y_j\right),$$
where $j = 1, 2, \ldots, P$, the combination model is $H(a)$, $Z$ is the number of training subsets, each of which yields a decision tree model $h_i(a)$, $i \in [1, 2, \ldots, Z]$, and $y_j$, $j = 1, 2, \ldots, P$, are the output labels of the $P$ classes. The combination strategy $I(\cdot)$ is defined as
$$I\left(h_i(a) = y_j\right) = \begin{cases} 1, & h_i(a) = y_j \\ 0, & \text{otherwise,} \end{cases}$$
where $h_i(a)$ is the output of the $i$th decision tree and $y_j$ is the $j$th class label of the $P$ classes.
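The voting rule can be written out directly as a short sketch. The tree outputs below are made-up values standing in for the predictions of Z = 5 trained trees on one test sample; the function names are hypothetical.

```python
def indicator(tree_output, label):
    """I(.): 1 if the tree predicts this label, 0 otherwise."""
    return 1 if tree_output == label else 0

def combine(tree_outputs, labels):
    """H(a): the label receiving the most votes over all trees."""
    return max(labels, key=lambda y: sum(indicator(h, y) for h in tree_outputs))

tree_outputs = [1, 0, 1, 1, 0]     # hypothetical h_i(a) for i = 1..5
labels = [0, 1]                    # class 0 = healthy, class 1 = faulty
prediction = combine(tree_outputs, labels)
print(prediction)
```

Here three of the five trees vote for class 1, so the combined prediction is faulty.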

Evaluation Parameters
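Evaluation parameters used in this research are defined with the confusion matrix in Table 1, where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives. The rate of positive test results is the sensitivity,
$$\text{Sensitivity} = \frac{TP}{TP + FN}.$$
The rate of negative test results is the specificity,
$$\text{Specificity} = \frac{TN}{TN + FP}.$$
The overall measure is the accuracy,
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}.$$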
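As a small worked example, the three measures can be computed from confusion-matrix counts using the standard definitions. The counts below are invented for illustration and are not results from this study.

```python
def evaluate(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),               # faulty rides detected
        "specificity": tn / (tn + fp),               # healthy rides kept healthy
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for 100 test profiles
metrics = evaluate(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When only healthy rides are classified, as in the false-positive analysis, TP and FN are zero and the accuracy reduces to the specificity TN / (TN + FP).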

Results and Discussion
In this research, we first selected all floor patterns like floor 2-5, 3-8, and so on from the data, some of which are shown in Table 2. The next step includes the selection of faulty rides from all floor patterns based on time periods provided by the maintenance data. An equal number of healthy rides are also selected. Only the vertical component of both acceleration and magnetic signal data is selected in this research because it is the most informative aspect, consisting of significant changes in vibration levels as compared to other components. Healthy and faulty rides are fed to the algorithm for profile extraction separately. Start and stop profiles are of equal length, irrespective of floor combination.
First, we selected all floor patterns from the data and then divided the data into up and down directions. Next, we selected the rides from the healthy and faulty parts of the data and extracted profiles from them. These profiles are fed to the deep autoencoder model for feature extraction, and faults are detected based on these features.

Up Movement
We have analyzed up and down movements separately because a traction-based elevator usually produces slightly different levels of vibration in each direction. First, we selected faulty rides from all floor patterns based on time periods provided by the maintenance data and fed them to the algorithm for profile extraction, as shown in Figure 4. Then, we selected an equal number of healthy rides, and the extracted profiles are shown in Figure 5. Visualization of the profiles shows that the elevator start and stop events extracted by our proposed algorithm have equal length irrespective of the floor combination, as shown in Figures 4 and 5. The next step is to label the healthy and faulty profiles with class labels 0 and 1, respectively. The labelled profiles are fed to the deep autoencoder model, and the generated deep features are shown in Figure 6. These are called deep features or latent features in deep autoencoder terminology, and they show hidden representations of the data. In Figure 6, the visualization shows that the features of the two classes are perfectly separated, which results in better fault detection.
The extracted deep features are fed to the random forest algorithm for classification, and the results provide 90% accuracy in fault detection, as shown in Table 3. We have compared the accuracy in terms of avoiding false positives for both feature sets and found that the new deep features generated in this research outperform the existing features. We used the remaining healthy rides for extracting profiles to analyze the number of false positives. These healthy profiles are labelled as class 0 and fed to the deep autoencoder to extract new deep features, as shown in Figure 7. These new deep features are then classified with the pre-trained deep autoencoder random forest model to test the efficacy of the model in terms of false positives. Table 3 presents the results for upward movement of the elevator in terms of accuracy, sensitivity, and specificity. We have also included the accuracy of avoiding false positives as an evaluation parameter. The results show that the new deep features provide better accuracy in fault detection and in avoiding false positives, which is helpful in detecting false alarms for elevator predictive maintenance strategies and in reducing unnecessary visits by maintenance personnel to installation sites.

Down Movement
For downward motion, we repeated the same analysis procedure as for upward motion. First, we selected the faulty rides and an equal number of healthy rides for profile extraction, as shown in Figures 8 and 9. Again, we fed both healthy and faulty profiles with class labels to the deep autoencoder for the extraction of new deep features, as shown in Figure 10.
Finally, the new extracted deep features are classified with random forest model and the results are shown in Table 4. After this, the remaining healthy rides are used to analyze the number of false positives. The extracted deep features are shown in Figure 11. Table 4 presents the results for fault detection with deep autoencoder random forest model in the downward direction. The results are similar to the upward direction, but we can see significant change in terms of accuracy of fault detection and when analyzing the number of false positives with new deep features.
Figure 9. Profiles from healthy rides (similar to Figure 5 but in downward movement of elevator system). Panel title: All floor healthy profiles, down.

Conclusions and Future Work
This research focuses on the health monitoring of elevator systems using a novel fault detection technique. The goal was to develop generic models for profile extraction and automated feature extraction for fault detection in the health state monitoring of elevator systems. With the newly extracted deep features from sensor data including both acceleration and magnetic signals, our approach provided above 90% accuracy both in fault detection and in the analysis of false positives for all floor combinations. The results support the goal of developing generic models that can be used in other machine systems for fault detection. The results are useful for detecting false alarms in elevator predictive maintenance. The approach will also reduce unnecessary visits of maintenance personnel to installation sites if the analysis results are used to allocate maintenance resources. Our models can also be used in different predictive maintenance solutions to automatically generate highly informative deep features for solving diagnostics problems. The new deep features extracted from the dataset outperform the existing features calculated from the same raw sensor dataset. The automated feature extraction approach does not require any prior domain knowledge. It also provides dimensionality reduction and is robust against overfitting. The experimental results show the feasibility of our generic models, which will increase the safety of passengers as well as serve the public interest. Visualization of the extracted profiles and features supports our goal of developing generic models for profile and feature extraction for fault detection.
In future work, we will extend our approach to other real-world big data cases to validate its potential for other applications and improve its efficacy.