PHIR: A Platform Solution of Data-Driven Health Monitoring for Industrial Robots

Abstract: The large-scale application of industrial robots has created a demand for more intelligent and efficient health monitoring, a demand that data-driven methods are well suited to meet thanks to the surge in available data and advances in computing technology. However, applying deep learning methods to industrial robots presents critical challenges in data collection, application packaging, and the need for customized algorithms. To overcome these difficulties, this paper introduces a Platform of data-driven Health monitoring for IRs (PHIR) that provides a universal framework for manufacturers to use deep-learning-based approaches with minimal coding. Real-time data from multiple IRs and sensors are collected through a cloud-edge system and undergo unified pre-processing to facilitate model training on a large volume of data. To enable code-free development, containerization technology is used to convert algorithms into operators, and users are provided with a process orchestration interface. Furthermore, algorithms for both sudden fault and long-term aging failure detection are developed and applied to the platform in industrial robot health monitoring experiments, whose positive results demonstrate the practical superiority of the proposed platform.


Introduction
Industrial robots (IRs) have been widely used in industry and manufacturing for their flexibility, efficiency, and high precision [1]. According to the International Federation of Robotics, annual installations exceeded 500,000 in 2023 [2]. Abnormalities or malfunctions in IRs can lead to unplanned shutdowns of production lines, which not only affect yield but also cause huge economic losses for the companies involved. Therefore, IR health condition monitoring has become key to improving the safety and reliability of production systems. However, traditional methods, including those based on historical experience, mechanism models, and expert systems, suffer from complex modeling, a lack of sharing mechanisms, or limited expert knowledge, and thus cannot meet the requirements of large-scale robot health monitoring.
With the development of technologies such as the Internet of Things (IoT), big data, cloud computing, and artificial intelligence (AI), data-driven IR health monitoring methods are promising candidates for solving the above problems. To this end, efforts in two directions have been explored: platform construction and algorithm research. On the platform side, Wang et al. [3] provide platforms for IR health monitoring and have experimentally applied some traditional machine learning algorithms. However, they focus mainly on platform construction for collecting and storing robot data, while model development for IR health monitoring stops at the introduction of simple machine learning methods. As for algorithms, some studies introduce deep learning approaches [4,5] or fuse physical mechanisms with neural networks [6,7] to learn the patterns better. Yet the application of these approaches is virtually unexplored, since they are verified only on open datasets or a single experimental IR, with little attention to data sources and application deployment. On the other hand, the robot health monitoring task is essentially a multi-variate time series anomaly detection (MTSAD) problem. Research on this problem has grown significantly in recent years, and the above methods have not followed these advances well [8]. In summary, the platform and algorithm studies described above are isolated and do not comprehensively consider all aspects of the process from deep model training to application deployment.
Specifically, applying deep learning or data-driven methods to IR health monitoring and turning them into services faces the following challenges: (1) Deep learning is highly dependent on large amounts of data, while industrial robot execution data are often stored in manufacturing execution systems (MES) and separated from sensor data. Data from different IRs and sensors cannot be managed and processed uniformly and are difficult to feed directly into deep models. (2) Many machine learning/deep learning algorithms and preprocessing methods are complex for most practitioners in the industrial field, who are therefore not skilled in developing and using them. This limitation has become a barrier to migrating such algorithms into industry. (3) Recent MTSAD algorithms cannot be directly migrated to the IR health monitoring task, because IRs perform different actions in factories and exhibit complex and diverse faults, such as sudden motor failures and long-term wear of transmission components.
To address the above issues, we propose PHIR, which, to our knowledge, is the first platform in the IR field to combine an IoT system with health monitoring algorithms. First, we propose a cloud-edge-robot architecture to collect high-frequency time series into the cloud. Then, a process orchestration function is developed to enable pluggable integration of various IR-related algorithms, which can be easily packed into containers and strung together as services through a no-code editor. This orchestration function allows developers to focus on the model itself, ignoring upstream and downstream processes such as data cleaning and model deployment. In addition, we propose two deep learning models for IRs and deploy them on PHIR. AE-DTW uses an autoencoder with a modified loss function to more easily detect sudden faults such as servo motor shutdown, achieving a best F1 score of 0.84. Transformer-SD improves the traditional Transformer with signal decomposition to better learn long-term patterns, obtaining an F1 score of 0.77 on our dataset. PHIR has been applied in real-life scenarios for more than 2 years, during which it has been connected to 13 industrial robots and has collected more than 100 million time series points. Furthermore, PHIR contains more than 20 components of different algorithms, and its efficiency has been verified by extensive experiments.
The structure of this article is as follows: in Section 2, we introduce the related work; in Section 3, we focus on the system design and the employed IR health condition monitoring algorithms; and in Section 4, we present the results of the experiment.

Related Work
The maintenance of industrial robots has always been a crucial part of their management and application. In this section, we outline the development of existing IoT systems and algorithms related to IR health monitoring.
Platforms. With the improvement of edge and cloud infrastructure [9][10][11][12][13][14], a large amount of work [15][16][17] has been conducted to manage industrial robots using IoT technology. In [18], the authors summarize the latest software technologies required to collect and manage data through IoT devices deployed on production lines. Cloud robotics [15] suggests that robots can employ cloud infrastructure such as databases and data centers to store and process data. Knowledge sharing over the cloud makes it possible to build intelligent robot applications, including health condition management [17,19]. However, cloud data centers are far from industrial sites and face poor real-time performance and the risk of privacy leakage. Therefore, cloud-edge IoT platforms have been explored. For example, frameworks for IR health monitoring have been introduced in [3], where some traditional machine learning algorithms were applied in experiments. However, these platforms only provide framework designs and do not develop services for industrial users.
Algorithms. In recent years, with the emergence of new robot control algorithms and systems [20][21][22], data-driven health monitoring technology has received increasing attention in industry and academia [23,24] for its advantages of simple use, low cost, and suitability for large-scale applications. In this context, industrial equipment data of high frequency and large volume are difficult for humans to annotate, so related research mainly focuses on unsupervised learning. Ref. [23] assesses the health of IRs by predicting end-effector deviations based on vision. Ref. [24] predicts faults of IRs based on the k-nearest neighbor (KNN) algorithm and long short-term memory (LSTM) networks and formulates predictive maintenance strategies for IRs based on knowledge graphs (KGs). Although these methods apply AI technology to the field of IRs, they do not keep up with the latest advances in deep learning models, such as the Transformer. In addition, the IR health monitoring task is essentially an MTSAD problem, and classification-based AD methods [25,26] face the challenge of data imbalance. Therefore, some research applies augmentation and uses labels [27][28][29][30], though unrealistic anomalies may be introduced in the process. Normality-based anomaly detection methods [31][32][33][34][35][36] assume that normal data exhibit one or more characteristics, and data that do not are judged abnormal. These methods are applied widely across fields rather than being tailored specifically for robots. Therefore, some studies try to use robot features [4,5] or fuse physical mechanisms with neural networks [6,7]. However, these approaches are verified only on open datasets or experimental IRs and have not yet been applied at industrial sites.

Methodology
For real-time computing and privacy considerations, the system consists of three layers (device, edge, and cloud), as shown in Figure 1. The device layer comprises the monitored objects, such as 6-axis robots, actuators, and various sensors that collect environmental or video data from the factory. The edge layer includes various computing devices that connect to the device layer through wired or wireless networks to acquire data and provide local monitoring for on-site staff. Finally, the cloud layer is a centralized computing center that receives data from the edge layer, processes them, trains models, and delivers the models back to the edge layer. Additionally, the cloud provides interfaces for users to define their workflows. Next, we introduce the three primary functions in detail: data collection, workflow orchestration, and our health condition monitoring algorithms.

Data Collection
Data collection for IRs presents several challenges, including the following:
• Heterogeneous Data: Industrial robots generate data from various sources, such as sensors, cameras, and actuators. These data can be in different formats and structures, making them challenging to collect and integrate into a unified system. Thus, robust data processing techniques are required to ensure compatibility and consistency in handling this heterogeneity.
• Different Data Transmission Protocols: Industrial robots often use different data transmission protocols, depending on the devices and systems involved. Coordinating and collecting data across these diverse protocols can be complex and requires specialized integration mechanisms to ensure seamless data transfer.
• Aggregation of High-Frequency Massive Data: Industrial robots generate a significant amount of data at high frequencies, which poses challenges in collecting, storing, and processing massive data in real time. Efficient aggregation techniques, along with scalable storage and processing infrastructure, are necessary to handle this high-frequency data flow effectively.
To address the challenges of data collection in industrial robots, we have implemented several strategies.
Firstly, we have customized schemas for different data types and stored them in relational databases, temporal databases, and unstructured databases. The system has two groups of entities, where the first group focuses on robot-related devices: edge devices, robots, and robotic end effectors. End effectors are installed on the wrist of a robotic arm to interact with the environment, and their type depends on the application scenario, such as polishing, welding, spraying, handling, and assembly. To handle the static and dynamic data of robots and effectors, their data are modeled separately as static (attribute) and dynamic (such as robot current and welding effector) models. There is a bi-directional association between edges and robots, and dynamic models are associated with static models. The other group of entities comprises external sensors deployed at the edge to collect environmental data and video cameras to monitor the site. These entity groups enable effective modeling and management of the various components in the industrial environment, ensuring that data are collected accurately and efficiently for further analysis and decision making.
Secondly, in the Data Collector component, we have developed a flexible interface for data collection protocols such as OPC UA, MQTT, and TCP sockets. These protocol engines are loosely coupled with subsequent processing, enabling us to add new protocols easily. Users can customize their own engines, ensuring that the output format conforms to the data schema. The flexibility of our data collection interface ensures that the system can handle diverse data transmission protocols and devices. It also enables quick integration with new protocols, significantly reducing the time and effort required to add new data sources to the system.
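The pluggable protocol-engine idea can be sketched as an abstract interface plus a registry, so a new protocol is added without touching the collector core. All class, function, and field names here (ProtocolEngine, register_engine, "device_id", "ts") are illustrative assumptions, not the platform's actual API.

```python
"""Sketch of a pluggable protocol-engine interface for the Data Collector."""
import json
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict

class ProtocolEngine(ABC):
    """A protocol engine turns a raw payload into a record matching the unified schema."""

    @abstractmethod
    def parse(self, payload: bytes) -> Dict[str, Any]:
        ...

# Registry of available engines; new protocols are plugged in via the decorator.
ENGINES: Dict[str, type] = {}

def register_engine(name: str) -> Callable[[type], type]:
    def wrap(cls: type) -> type:
        ENGINES[name] = cls
        return cls
    return wrap

@register_engine("mqtt-json")
class MqttJsonEngine(ProtocolEngine):
    """Example engine: MQTT payloads carrying JSON-encoded robot samples."""

    def parse(self, payload: bytes) -> Dict[str, Any]:
        record = json.loads(payload.decode("utf-8"))
        # Enforce the unified schema: every record needs a device id and timestamp.
        assert {"device_id", "ts"} <= record.keys()
        return record

engine = ENGINES["mqtt-json"]()
rec = engine.parse(b'{"device_id": "robot-07", "ts": 1690000000, "current_a1": 1.42}')
```

Because each engine only has to emit schema-conformant records, the downstream cleaning and storage stages stay untouched when a new protocol is registered.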
Thirdly, we have developed a Data Cleaner module on the edge layer. This module performs data integrity and correctness verification, as well as temporal alignment processing, which is essential for further analysis. Meanwhile, it compresses data to reduce bandwidth usage between the edges and the cloud. These techniques ensure that the system can process and analyze data quickly and efficiently, enabling optimal performance in industrial settings.
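The three Data Cleaner duties can be illustrated with a minimal sketch: drop records that fail an integrity check, snap timestamps to a common 100 Hz grid, and compress the batch before upload. The function names, field names, and the choice of zlib are assumptions for illustration only.

```python
"""Minimal sketch of an edge-side cleaner: integrity check, temporal alignment, compression."""
import json
import zlib

def align(samples, period_ms=10):
    """Snap timestamps to a 10 ms (100 Hz) grid and drop malformed records."""
    cleaned = {}
    for s in samples:
        if "ts" not in s or "value" not in s:           # integrity/correctness check
            continue
        grid_ts = round(s["ts"] / period_ms) * period_ms  # temporal alignment
        cleaned[grid_ts] = s["value"]                     # later sample wins on collision
    return [{"ts": t, "value": v} for t, v in sorted(cleaned.items())]

def compress(records):
    """Compress a cleaned batch to cut edge-to-cloud bandwidth."""
    return zlib.compress(json.dumps(records).encode("utf-8"))

batch = align([{"ts": 1003, "value": 1.0}, {"ts": 1012, "value": 1.1}, {"bad": True}])
blob = compress(batch)
```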
Finally, we have implemented a message queue mechanism for communication between the cloud and the edges. The data are first saved in a Redis database, which makes the system more fault-tolerant and data processing more efficient and reliable.

Process Orchestration
Our low-code development environment is designed to simplify the implementation of deep learning algorithms in robotics for non-computer professionals. The environment leverages containerization technology, which allows us to encapsulate data pre-processing methods and deep learning algorithms into containers. These containers are portable and can be deployed across different environments, making it easier for users to access and use these components. To manage these containers, we use Kubernetes on the cloud, which provides a scalable and efficient framework for deploying, managing, and scaling containerized applications. This lets us orchestrate the processes involved in running deep learning algorithms so that they are executed efficiently and reliably.
Before developing a complete health monitoring application, we compress and package different algorithms into images and store them in the operator base, shown as Operators in Figure 1. This base also includes algorithms developed specifically for robots, which are introduced in the next section. To launch an application from the raw algorithms, the user and our system follow three main steps: building a workflow, creating Operators (Ops), and deploying an instance, of which only the first is performed by the user.
Users can define their workflows from the cloud through our Workflow Definition interface, a graphical tool based on Scalable Vector Graphics (SVG) that represents algorithms and data streams as nodes and the connections between them, as shown in Figure 2. Algorithms and data sources are drawn as nodes, and data flows are shown as the lines between them. Users can modify node information such as parameters, image addresses, and node attributes. The user-friendly interface lets users drag and drop Ops over a graphical display to build their workflow. Once the workflow is submitted, the Workflow Builder compiles it into a workflow description as a K8s Custom Resource Definition (CRD), which describes the whole data flow and each Op. The compiled workflow is then ready for deployment. The Workflow Builder then pulls the algorithm images into the running environment and executes the programs according to the CRD. At this point the algorithms become Ops, the basic computational units that perform custom tasks on data streams. It is worth noting that we design a user-transparent framework to provide a common environment for embedding the Docker images in Operators, which supports multi-threaded communication between upstream and downstream and saves state for stateful Ops.
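The compile step can be sketched as turning the drawn graph (nodes plus connections) into an ordered, CRD-like description. The field names (`apiVersion`, `kind`, `spec`, `ops`) mimic Kubernetes conventions but are illustrative assumptions, not the platform's real CRD schema.

```python
"""Sketch: compile a drag-and-drop workflow graph into an ordered CRD-like description."""
from graphlib import TopologicalSorter  # Python 3.9+

def compile_workflow(nodes, edges):
    """nodes: {op_name: image}; edges: (upstream, downstream) pairs."""
    deps = {n: set() for n in nodes}
    for up, down in edges:
        deps[down].add(up)
    order = list(TopologicalSorter(deps).static_order())  # execution order
    return {
        "apiVersion": "phir.example/v1",   # hypothetical group/version
        "kind": "Workflow",
        "spec": {
            "ops": [
                {"name": n, "image": nodes[n], "inputs": sorted(deps[n])}
                for n in order
            ],
        },
    }

crd = compile_workflow(
    nodes={"collector": "phir/collector:1.0",
           "cleaner": "phir/cleaner:1.0",
           "ae-dtw": "phir/ae-dtw:1.0"},
    edges=[("collector", "cleaner"), ("cleaner", "ae-dtw")],
)
```

The topological ordering guarantees each Op is listed after everything it consumes, which is what lets the controller launch pods in a dependency-safe sequence.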
To deploy an instance of the workflow, we employ the K8s CustomController, which acts as a vigilant overseer of newly created instances and handles resource management. When a deployment request is received, the CustomController interacts with the K8s API server to allocate pods and services. The Scheduler, a crucial component, employs an intelligent scheduling mechanism to determine the optimal placement of pods, ensuring the smooth and efficient execution of the workflow. Once the Scheduler has determined the best placement, the API server takes charge: it pulls the necessary Docker images and orchestrates their execution within the designated pods, ensuring that the required resources are provisioned and ready for use. Once the pods are successfully launched and operational, the CustomController receives an update on the state of the workflow, allowing it to maintain an accurate record of the workflow's progress and make any necessary adjustments or updates.

Health Condition Monitoring Algorithms
General-purpose anomaly detection algorithms do not transfer well to industrial robots because of characteristics such as complex and diverse abnormal signals, difficulty in capturing abnormal states, and long aging times. In this paper, we extract motion-independent features of IRs and then propose two algorithms, one to detect sudden anomalies (AE-DTW) and one to learn long-term aging patterns (Transformer-SD), which are introduced below.

AE-DTW
IR time series data are difficult for humans to label due to their high frequency and the complex patterns of the various abnormal types, such as mutation point anomalies, context anomalies, and subsequence anomalies, that appear in the collected datasets. Furthermore, data collected over different motions may disturb the extraction of health-related features. In this paper, we propose an unsupervised anomaly detection model based on autoencoder reconstruction error that captures the probability distribution of industrial robot data under normal conditions in latent space. To eliminate the influence of different motions, we extract motion-independent features from the time (amplitude fluctuation index), frequency, and time-frequency (signal-to-noise ratio) domains. In the reconstruction process, we introduce Dynamic Time Warping (DTW) into the reconstruction loss, which serves as a health indicator for industrial robots.
Anomaly patterns. The abnormal patterns in the electromechanical signals of the industrial robot body are very complex. Common abnormal patterns in time series data include mutation point anomalies, context anomalies, and subsequence anomalies, as shown in Figure 3. In addition, these abnormal signals may appear in the sensor signals of any of the various components of an industrial robot, which increases the complexity of the patterns. IRs generally consist of five parts: reducer, servo motor, connection device, sensors, and controller. The reducer may suffer from gear wear, gear pitting, and transmission shaft breakage, which may appear as mutation point, context, or subsequence anomalies in the time series data. Servo motors may experience stator failure, bearing failure, and torque reduction, which manifest in the data as distortion of electromechanical signals and signal values exceeding the rated range. The connection device may wear out, reducing the movement accuracy of the robot, which manifests as sequence anomalies or range changes in the timing data. Sensors may have faults such as component damage or poor contact, which manifest as point anomalies or context anomalies. The controller may have instruction errors or control output errors, which may appear as mutation point anomalies or context anomalies.
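The three named anomaly patterns can be made concrete on a toy synthetic signal: a spike outside the normal range (mutation point), a value that is in range but wrong for its phase (context), and a stretch whose shape is distorted (subsequence). The signal and injection points below are purely illustrative.

```python
"""Toy illustration of mutation point, context, and subsequence anomalies."""
import math

n = 400
# A clean periodic "electromechanical" signal with period 50.
signal = [math.sin(2 * math.pi * t / 50) for t in range(n)]

# Mutation point anomaly: a single out-of-range spike.
signal[100] += 5.0

# Context anomaly: a value that is normal in amplitude but wrong for its phase
# (the sine should be ~0 at t = 200).
signal[200] = 1.0

# Subsequence anomaly: a stretch whose frequency (shape) is distorted.
for t in range(300, 350):
    signal[t] = math.sin(2 * math.pi * t / 13)
```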
The model. As shown in Figure 4, the model discards unimportant components of the data and retains only the main characteristics of the signals by fusing the deep representations with the motion-independent features. This paper uses an AutoEncoder to reduce dimensionality and extract the representations; it is a nonlinear model that captures complex patterns better at lower computational complexity.
We use a dilated causal convolutional autoencoder, which consists of a residual layer and a nonlinear dimensionality reduction layer. The residual layer adopts the structure convolution → normalization → convolution → normalization, with the input and output of the residual block connected. The nonlinear dimensionality reduction layer adopts dilated causal convolution to increase the receptive field while eliminating the grid effect. The output of the last layer of the encoder is the low-dimensional feature. The decoder takes the output of the encoder as input and produces the reconstructed signal X ∈ R^(C×L). The decoder is mainly composed of a deconvolution layer, which increases the dimensionality of the low-dimensional representation, and a residual layer with the same structure as in the encoder. Finally, the reconstruction error between the input and the predicted signal is used as the health condition index, and whether the IR is abnormal is determined by comparing the offset of the reconstruction error between the monitoring data and the standard data.
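A quick back-of-the-envelope check shows why dilated causal convolutions are used here: with a fixed kernel size, the receptive field grows exponentially with depth. The kernel size (3) and dilation base (2) match the hyper-parameters reported later; the layer count in the example is an illustrative assumption.

```python
"""Receptive field of a stack of dilated causal convolutions."""

def receptive_field(n_layers, kernel=3, dilation_base=2):
    """Receptive field when layer l uses dilation dilation_base**l."""
    rf = 1
    for layer in range(n_layers):
        rf += (kernel - 1) * dilation_base ** layer
    return rf

# Three dilated layers already cover 15 time steps: 1 + 2*(1 + 2 + 4).
# A stack of plain (dilation-1) kernel-3 convolutions would need 7 layers
# for the same coverage.
rf3 = receptive_field(3)
```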
The loss function. A suitable distance metric must drive the probability of anomaly points p(x_t | x_{t−1}, …, x_{t−τ}) → 0 and likewise drive the probability of anomalous subsequences toward zero. The traditional Euclidean distance assumes a one-to-one correspondence between the prediction sequence and the target sequence and cannot effectively capture distortion anomalies of subsequences. Therefore, this paper uses DTW as the loss function of the reconstruction model; it contains two parts and can thus effectively distinguish the shape difference and the distortion difference between the prediction sequence and the target sequence.

Consider N industrial robot signals A = {x_i}_{i∈{1:N}}. For each input signal x_i = (x_i^1, …, x_i^n) ∈ R^(p×n), our model first compresses the input into the hidden space and then reconstructs it from the hidden state, obtaining ŷ_i = (ŷ_i^1, …, ŷ_i^n) ∈ R^(p×n). Define the real signal as y_i = (y_i^1, …, y_i^n) ∈ R^(p×n), where y_i = x_i. The DTW loss can then be expressed as

L_DTW(ŷ_i, y_i) = α L_shape + (1 − α) L_temporal,

where α ∈ [0, 1] balances the two terms, L_shape measures the shape difference between the predicted and the real IR signal, and L_temporal measures the distortion difference; both require the alignment matrix between the predicted and real signals. Define a warping path as A ⊂ {0, 1}^(n×n): if ŷ_i^h and y_i^j are aligned, then A_{h,j} = 1, otherwise A_{h,j} = 0. A valid warping path, drawn from the set A_{n,n}, starts at the upper-left corner (0, 0) of the alignment matrix, ends at the lower-right corner (n, n), and may only move in the directions →, ↓, ↘ at each step. Let Δ(ŷ_i, y_i) = [δ(ŷ_i^h, y_i^j)]_{h,j} denote the pairwise cost matrix between the predicted and real IR electromechanical signals, with δ(·, ·) the cost between ŷ_i^h and y_i^j. The dynamic time warping distance is then

DTW(ŷ_i, y_i) = min_{A ∈ A_{n,n}} ⟨A, Δ(ŷ_i, y_i)⟩,

where ⟨·, ·⟩ denotes the inner product of matrices. Following [37], we formulate the shape loss as the smoothed (soft) version of this distance:

L_shape = DTW_γ(ŷ_i, y_i) = −γ log( Σ_{A ∈ A_{n,n}} exp(−⟨A, Δ(ŷ_i, y_i)⟩ / γ) ).
We also include a timing distortion index. Let A* denote the optimal warping path between the predicted and the real IR signal:

A* = argmin_{A ∈ A_{n,n}} ⟨A, Δ(ŷ_i, y_i)⟩.

The distortion index of the industrial robot signal measures the difference between the optimal path A* and the first diagonal of the alignment matrix:

⟨A*, Ω⟩,

where Ω is a square matrix of size n × n with Ω(h, j) = (h − j)² / n², penalizing deviations of A* from the diagonal. However, the argmin operator in the above formula is not differentiable, so we use ∇_Δ DTW_γ(ŷ_i, y_i) in place of A*, which can be written as

∇_Δ DTW_γ(ŷ_i, y_i) = ( Σ_{A ∈ A_{n,n}} A exp(−⟨A, Δ(ŷ_i, y_i)⟩ / γ) ) / ( Σ_{A ∈ A_{n,n}} exp(−⟨A, Δ(ŷ_i, y_i)⟩ / γ) ).

To sum up, the DTW-based distortion loss of industrial robot signals can be expressed as

L_temporal = ⟨∇_Δ DTW_γ(ŷ_i, y_i), Ω⟩.

The DTW-based distortion loss penalizes the degree of distortion. In industrial robot health monitoring, it can capture cycle offset anomalies caused by bearing failures, sensor failures, and the aging and loosening of transmission belts, because such failures appear as timing distortions in the data. The final loss function is

L = α L_shape + (1 − α) L_temporal.
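The classic hard-min DTW recurrence underlying the shape term can be written in a few lines; it shows why DTW tolerates time-shifted patterns that Euclidean distance penalizes heavily. This is only for intuition: the trained model uses the smoothed (soft-DTW) variant so the loss is differentiable.

```python
"""Classic DTW between two 1-D sequences via dynamic programming."""

def dtw(a, b):
    """Optimal alignment cost with squared pointwise cost and →, ↓, ↘ moves."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Three moves allowed on the warping path.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A time-shifted copy of a pulse: DTW aligns it perfectly, while the
# point-by-point Euclidean cost penalizes the one-step shift.
x = [0, 0, 1, 2, 1, 0, 0, 0]
y = [0, 0, 0, 1, 2, 1, 0, 0]
```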

Transformer-SD
In practice, industrial robot health monitoring often focuses on the abnormal patterns of each component itself while ignoring the overall aging trend. To capture the long-term aging trend of the nonlinear process of the whole IR, we propose the Signal Decomposition Transformer (Transformer-SD) for IRs, inspired by Autoformer [38]. The main idea of Transformer-SD is to use the current signal of the servo motors to predict the IR axis position, since the transmission ratio decreases as the robot ages.
Although the self-attention-based Transformer can capture long-distance dependencies in industrial robot signals, two flaws remain. First, IR signals contain complex intrinsic patterns; in particular, it is difficult for the Transformer to capture the interrelationships and changes between the various components. Second, the space complexity increases quadratically with the input length, so directly modeling fine-grained long-term sequences hits a memory bottleneck, which seriously limits the practical application of the Transformer. In recent years, many studies have tried to solve this efficiency problem by replacing the original dense point-by-point attention with sparse self-attention. To address the above issues, this paper decomposes the signal with STL inside a component of the deep neural network and integrates it into a Transformer based on the sparse self-attention mechanism, enabling it to capture long-term dependencies effectively.
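The decomposition component can be sketched as a moving-average split: the moving average (with edge padding) gives the trend term, and the remainder is the periodic term, mirroring X_s = X − X_t. This is a standalone 1-D sketch; the window size is an illustrative assumption, and in the model the same split is applied inside the network.

```python
"""Sketch of a SeriesDecomp block: moving-average trend plus periodic remainder."""

def series_decomp(x, window=5):
    """Return (periodic, trend) terms of a 1-D series."""
    half = window // 2
    # Replicate-pad both ends so the output has the same length as the input.
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    trend = [sum(padded[i:i + window]) / window for i in range(len(x))]
    periodic = [a - b for a, b in zip(x, trend)]
    return periodic, trend

# A pure linear ramp is all trend: away from the padded edges the
# periodic term is exactly zero.
xs, xt = series_decomp([float(t) for t in range(20)])
```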
The basic structure of Transformer-SD is shown in Figure 5. Specifically, the length of the input 6-axis IR electromechanical signal is I, and the length of the predicted end-effector position sequence is O. To improve the ability to capture complex industrial robot electromechanical signal patterns, this paper introduces STL decomposition into the Transformer model, gradually decomposing the original industrial robot electromechanical signal X into a trend term X_t and a periodic term X_s. In the actual implementation, a moving average is used to filter out periodic fluctuations, and the remainder after subtracting the moving average is the periodic term of the signal. Given X ∈ R^(L×d), the trend term is X_t = AvgPool(Padding(X)) and the periodic term is X_s = X − X_t, where X_t, X_s ∈ R^(L×d). We write X_s, X_t = SeriesDecomp(X) for the whole process. The signal decomposition Transformer is divided into an encoder and a decoder. The input of the encoder is the 6-axis IR signal X_en ∈ R^(O×d) of length O after the prediction time point, and the inputs of the decoder are the periodic term X_des ∈ R^((3I+O)×d) and the trend term X_det ∈ R^((3I+O)×d) of the industrial robot end-effector position sequence, both of length (3I + O). Each decoder input contains two parts. The first part is the end-effector position sequence X_de of length 3I before the prediction time point t; since the Transformer captures long-range dependencies, this part consists of three time series segments taken at intervals of τ before t, and X_de provides information about the most recent period of the output sequence. The second part is a placeholder X_0 of length O; decomposing X_de and appending the placeholder to its periodic part gives X_des = Concat(X_de,s, X_0), and the trend input X_det is formed analogously.

Dataset

To comprehensively verify our algorithms, we organized the data collected from the 13 IRs connected to PHIR. We collected the current, speed, and position data of each axis of the IRs at a frequency of 100 Hz, forming a total of 18 dimensions. Since the working life of an industrial robot is 3–5 years, we sample 10 min of data every 7 days. According to the type of robot failure, the data are divided into two subsets: A contains data from robots with sudden failures, and B contains data from robots with long-term failures due to aging and wear. All abnormal data are placed in the test set. Details are given in Table 1.

Metric
In the experiments, this paper selects the best F1 score and the AUC value as evaluation indicators. F1 is a commonly used indicator in anomaly detection problems; it is the harmonic mean of precision and recall:

Precision = TP / (TP + FP),  Recall = TP / (TP + FN),

F1 = 2 · Precision · Recall / (Precision + Recall),

where TP (true positives) is the number of samples correctly predicted to be abnormal, FP (false positives) is the number of samples predicted to be abnormal but actually normal, and FN (false negatives) is the number of samples predicted to be normal but actually abnormal. AUC is the area under the receiver operating characteristic (ROC) curve of the model. The abscissa of the ROC curve is the false positive rate (FPR) and the ordinate is the true positive rate (TPR), computed as

FPR = FP / (FP + TN),  TPR = TP / (TP + FN),

where TN (true negatives) is the number of samples correctly predicted to be normal.
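The definitions above translate directly into a small computation over predicted and true labels (1 = abnormal). The function name and example labels are illustrative.

```python
"""F1 score computed directly from its definition."""

def f1_score(pred, true):
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, true))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, true))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, true))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# tp = 2, fp = 1, fn = 1, so precision = recall = 2/3 and F1 = 2/3.
score = f1_score(pred=[1, 1, 0, 0, 1], true=[1, 0, 0, 1, 1])
```

The "best F1" reported in the results is obtained by sweeping the anomaly-score threshold and keeping the threshold that maximizes this value.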

Baselines and Implementation Details
The proposed approaches are compared against the following sudden and long-term aging failure detection methods.
Sudden Failure Detection Baselines. Four commonly used sudden failure detection baselines are adopted: PCA, t-Stochastic Neighbor Embedding (t-SNE), Autoencoder (AE), and Autoencoder with Convolution (AE-C). PCA and t-SNE are implemented with the Scikit-learn library; they reduce the dimensionality of the 18-dimensional robot signal to 6 dimensions, and the reduced signal is then fed into a One-class SVM to detect anomalies. AE has six fully connected layers, and AE-C contains three convolutional layers and three deconvolutional layers. Both use Mean Squared Error (MSE) as the loss function.
Long-term Aging Failure Detection Baselines. Four long-term aging failure detection methods are used for comparison: the LSTM network, the Gated Recurrent Unit (GRU), the self-attention Transformer, and the Probabilistic Sparse Transformer (Transformer-PS) [39]. LSTM consists of an encoder and a decoder. The encoder contains three layers of LSTM units; the dimension of the hidden state is 64, which is aligned with the structure of the decoder. Dropout layers that deactivate neurons with a probability of 0.5 connect the layers of LSTM units. Teacher forcing is used when training the model: the decoder uses the real value as the input of the next time step with a probability of 0.5. The GRU model structure is similar to LSTM but uses GRU units internally.
For AE-DTW, the original time series is cut into equal-length samples by sliding windows with a window_size of 100 and a step of 25. Since Transformer-SD is prediction-based, it instead takes every 300 timestamps (3 s) as input and predicts the next 100 (1 s). The hyper-parameters of AE-DTW and Transformer-SD are shown in Table 2. The latent space of AE-DTW has 64 dimensions. The AE-DTW encoder and decoder have similar structures, with three convolutional and three deconvolutional layers, respectively. The convolutional layers have a kernel size of 3 and a stride of 1, and the deconvolutional layers have a kernel size of 5 and a stride of 2. The dilation coefficient of the convolution is 2, and the activation function is ReLU. The Transformer-SD encoder contains two layers of Transformer modules, and the decoder contains three. All models are built with PyTorch 1.7 and trained on an NVIDIA Tesla V100 GPU on our platform.
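The sliding-window segmentation described above is a one-liner; with window_size = 100 and step = 25 as reported, a series of length L yields (L − 100) // 25 + 1 overlapping samples. The series length in the example is illustrative.

```python
"""Sliding-window segmentation of a time series into equal-length samples."""

def sliding_windows(series, window_size=100, step=25):
    return [series[i:i + window_size]
            for i in range(0, len(series) - window_size + 1, step)]

# A 1000-point series yields (1000 - 100) // 25 + 1 = 37 windows,
# each overlapping its neighbor by 75 points.
wins = sliding_windows(list(range(1000)))
```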

Results
We report the affiliation F1-score and AUC in Table 3. From a dataset perspective, the F1 and AUC of AE-DTW on subset A reach 0.837 and 0.752, while on subset B the F1 and AUC of Transformer-SD reach 0.772 and 0.693, respectively. This shows that a single model is insufficient for detecting the various IR faults, which motivates the cooperation of multiple types of models. Regarding the sudden failure detection methods, three conclusions can be drawn. First, PCA and t-SNE perform poorly on both subsets, showing that shallow methods struggle to adapt to complex IR data. Second, the F1 of AE-DTW exceeds that of AE by more than 0.2, indicating that the causal convolution structure improves detection performance: causal convolution reduces the number of parameters and increases the depth of the network, which better captures the main components of the IR time series. Third, AE-DTW performs better than AE-C, indicating that the DTW loss function is more relevant to the nature of time series. From the view of the long-term aging failure detection methods, LSTM and GRU perform worse than the Transformer, which indicates the effectiveness of the self-attention mechanism. Moreover, Transformer-SD, with an AUC over 0.69, performs well, showing that combining the Transformer with STL decomposition is an effective way to extract long-term features of time series. Overall, the proposed AE-DTW and Transformer-SD outperform all baselines on both subsets, demonstrating the effectiveness and robustness of the sudden and long-term aging failure detection models.
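The intuition behind preferring a DTW-style loss over MSE is that DTW aligns series before comparing them, so a small time shift is not punished as heavily as pointwise error suggests. Below is the classic (non-differentiable) dynamic-programming DTW distance for illustration only; AE-DTW itself would need a differentiable DTW variant to serve as a training loss.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D series.
    D[i, j] holds the minimal cumulative squared cost of aligning
    a[:i] with b[:j], allowing time stretching and compression."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 60)
x = np.sin(t)
y = np.sin(t + 0.3)  # a time-shifted copy of the same waveform
```

Because the diagonal-only alignment reproduces the pointwise squared error, the DTW distance between `x` and `y` is never larger than their summed squared difference, and for shifted copies it is usually much smaller.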

Hyper-parameters Analysis
In this section, a sensitivity analysis on subsets A and B is performed to study three main parameters: window size Ws, window step Ts, and output length Ol. Figure 7a,b show the effect of Ws and Ts on the overall performance of AE-DTW, where the y-axis denotes the F1-score. For subset A, AE-DTW is sensitive to Ws, with a change amplitude exceeding 0.1, and F1 peaks at Ws = 100; thus, an appropriate window size Ws should be selected according to the depth of the model. AE-DTW achieves its best performance when Ts = 25, and it is worth noting that at Ws = 100 the model is less sensitive to the value of Ts. Figure 7c shows the results of varying the output length Ol of Transformer-SD between 20 and 140. The model performs best on subset B when Ol = 100, which suggests that increasing the output time length can improve performance; however, an overly large length harms performance once it exceeds the horizon over which the model can predict from the input data.

Conclusions
We developed PHIR, an IR health monitoring platform that combines IoT technology with data-driven methods. It collects data at high frequency in real time and provides users with a customized interface for deep-learning algorithm services, offering a solution for the deep mining of multi-source data in the industrial field. In addition, we propose two deep models: AE-DTW for sudden faults and Transformer-SD for long-term aging faults. We deployed the system in a real production environment, collected two years of operating data from industrial robots of various types and models, and established a dataset, which proves the practicability of our system. Experiments on this dataset demonstrate the effectiveness and advancement of the proposed models. We believe that our work can accelerate the application of deep learning algorithms in the industrial field, facilitate the integration of IT and OT, and contribute to the development of intelligent manufacturing. Because deep models are data-hungry, i.e., the more training data, the better the performance, future work will focus on two aspects. First, as more robots are connected and more data are collected, the concurrency capability and high availability of the PHIR system must be improved. Second, algorithm optimization should be carried out to reduce the pressure on data collection and enable more lightweight models to be deployed.

Figure 1 .
Figure 1. The framework of PHIR. The edge layer establishes direct connections to devices, ensuring privacy and real-time data processing. The cloud layer stores data and conducts complex training to ensure the performance of health condition monitoring.

Figure 2 .
Figure 2. The workflow definition interface, where nodes are data sources or processing steps and arrows represent the data flow. Take AE-DTW as an example: data from the data source flow to the normalization component and are then subjected to feature extraction in the time, frequency, and time-frequency domains. The features are then fed into the AE-DTW algorithm, and the results predicted by the model are stored.

Figure 3 .
Figure 3. Anomaly patterns. The red boxes mark anomalies.

Figure 4 .
Figure 4. The overall architecture of the proposed AE-DTW model, where the purple blocks are the original and reconstructed data, the blue blocks are learned features, and the yellow block is the loss.

Figure 5 .
Figure 5. The overall architecture of the proposed Transformer-SD model, where the green blocks are the self-attention layers, the blue blocks are the series decomposition layers, and the pink blocks are the feed-forward layers.

Figure 6 .
Figure 6. Application scenario of IRs in an assembly workshop. The key instruments, including the AGV, AGV controller, MSE, and our edge device, are pointed out with arrows.

Figure 7 .
Figure 7. Three sensitivity analysis experiments on subsets A and B, where (a,b) vary the hyper-parameters window size Ws and window step Ts of AE-DTW, and (c) varies the output length Ol of Transformer-SD.

Table 1 .
Statistics of the two sub-datasets used in experiments.

Table 3 .
F1-score and AUC of AE-DTW and Transformer-SD on sub-datasets A and B. The best results are shown in bold and the next best ones are underlined.