Integrating Physics and Data Driven Cyber-Physical System for Condition Monitoring of Critical Transmission Components in Smart Production Line

Abstract: There is currently no unified cyber-physical system framework that combines the Internet of Things, industrial big data, and deep learning algorithms for the condition monitoring of critical transmission components in a smart production line. In response, this study proposes a novel five-layer cyber-physical system framework for smart production lines based on the conceptualization of the layers. The architecture integrates physics-based and data-driven modeling. The smart connection layer collects and transmits data, the physical equation modeling layer converts low-value raw data into high-value feature information via signal processing, the machine learning modeling layer realizes condition prediction through a deep learning algorithm, and scientific decision-making and predictive maintenance are completed through a cognition layer and a configuration layer. Case studies on three critical transmission components (spindles, bearings, and gears) are carried out to validate the effectiveness of the proposed framework and hybrid model for condition monitoring. The prediction results on the three datasets show that the system successfully distinguishes conditions, and that the combination of short-time Fourier transform signal processing and a deep residual network outperforms the other models. The proposed framework and approach are scalable and generalizable and lay the foundation for the extension of the model.


Introduction
Cyber-physical systems (CPSs) are an indispensable part of intelligent manufacturing (IM) and Industry 4.0, which is gradually transforming the landscape of the global manufacturing industry. CPSs are multidisciplinary systems that combine computation, communication, and control technologies to conduct real-time measurements, data transmission, monitoring, decision making, feedback control, and other functions on widely distributed embedded computing systems [1,2]. CPSs are considered to be a major feature of the new industrial revolution [3]. The deployment of CPSs in Industry 4.0 has attracted the interest of many scholars and researchers [4]. Lee et al. [5] believe that the use of CPSs will move the manufacturing industry towards Industry 4.0; they proposed a unified 5C architecture as a guideline for deploying CPSs. This guideline shows how to use the feedback from the initial data collection through to analytics in the final decision-making stage. Zhang et al. [6] presented a four-level unified architecture for a cyber-physical production system based on a digital twin. Liu et al. [7] presented a three-level CPS framework for workshops and introduced a guideline for the connection of physical components; obtaining and preprocessing data; and visualizing results.

Table 1. The division of CPS layers.

Number of CPS Layers | Framework Layers
Five levels | [2,3,5,8-12]: 5C framework; [13]: machine, data analysis, optimization, design layer
Four levels | [6]: physical, network, virtual, application; [14]: connection, features, mining and modeling, decision
Three levels | [7]: physical connection, middleware, computation; [15]: physical resources, local server, cloud server
Not clearly delineated | [16-21]

Sophisticated machine learning (ML) based on industrial big data is a key component of CPSs that will enable us to reach a new phase of IM [22]. ML can extract the features of data collected by various sensors in a physical system and ultimately contribute to the integration of the physical system and cyber space. ML has been applied in industrial practice and has achieved impressive performance [23]. Shin et al. [8] proposed deploying a shop-floor CPS framework where a support vector machine was used as an ML technology for predicting the occurrence of anomalies. They designed and simulated this CPS framework based on ML for Industry 4.0. Ahmed et al. [3] proposed a real-time data-driven CPS with classification and regression ML algorithms implemented in the system. Wu et al. [21] proposed a novel ML algorithm for imbalanced data in a Prognostics and Health Management (PHM) CPS. Table 2 shows the application area and algorithm implementation of CPS. It can be seen from Tables 1 and 2 that the methods implemented by deep learning (DL) do not have a clear CPS architecture but simply involve the CPS field. Meanwhile, these methods only realize CPS applications based on data modeling, without considering the influence of physical modeling, and lack a framework to monitor the condition of critical transmission components (CTCs) in smart production lines (SPLs) from the perspective of CPS hybrid modeling.
With the development of IM and network collaborative manufacturing, data-driven SPLs now play an important role in ensuring the stable production of workshops, enterprises, and industrial chains [24][25][26][27]. The CTCs of an SPL, such as motor spindles, bearings, and gears, operate under variable conditions for a long time. Breakdowns of mechanical equipment in production lines are mainly caused by motors, bearings, and gears, which account for 27%, 41%, and 20% of failures, respectively [28]. Xiao et al. [29] identified motors as a vital component of the production line and showed that reliability analyses can ensure the reliable and secure operation of the SPL, enabling it to avoid economic losses and catastrophic accidents. Bampoula et al. [16] proposed a way to change from preventive maintenance to predictive maintenance in cyber-physical production systems and introduced a new DL model for estimating the remaining useful life (RUL) of the monitored equipment. A new automatic RUL prediction method for continuous production line equipment based on ML was proposed in [30]. Ayvaz et al. [31] developed a data-driven PHM approach for potential failures in production lines. Kang et al. [32] performed a systematic literature review of ML applications in production lines, where quality optimization, scheduling optimization, yield improvement, and product failure detection are the main research fields. DL technology has obtained significant achievements in SPL applications. There is great potential to realize the condition monitoring of CTCs by integrating DL technology into a CPS architecture.
Within the background of network collaborative manufacturing, the integration of cyber and physical resources of SPL is further deepened and the condition monitoring of CTC has become a hot area in academia and industry. Some research gaps could be identified after reviewing the literature: (1) The research focuses on quality control, production scheduling, yield improvement, etc. The research on the condition monitoring of CTC in SPL is limited, especially that based on the data of motor spindles in the production line of machine tools. (2) There is a lack of a unified CPS framework and algorithm implementation that combines emerging technologies such as Internet of Things, big data, DL, etc., for CTC condition monitoring and reliability analysis in the manufacturing process of SPL. (3) The existing framework does not consider the key technical characteristics of CPS, especially those of deploying a CPS framework from the perspective of hybrid modeling.
In this paper, we propose a theoretical CPS framework for CTC condition monitoring and reliability analysis, which is distinguished from existing solutions. Hybrid modeling, detailed algorithm implementation, and the integration of DL rather than shallow ML are the main characteristics of the CPS framework. An accurate mapping is established between the vibration signal, which is easy to collect, and the CTC condition, which is difficult to obtain. The effectiveness and predictive ability of the framework are validated with multiple DL algorithms and signal processing steps through experiments on three datasets. The framework and approach proposed in this paper can be generalized to other SPL big data scenarios.
The structure of this paper is organized as follows. The deployment and implementation of the CPS framework and the description of the layers are presented in Section 2. The proposed hybrid model is elaborated on in Section 3. In Section 4, three experiments are carried out with different experimental results and the findings are provided to demonstrate the effectiveness of the developed CPS framework. Some observations and future directions are summarized in Section 5. Finally, conclusions are given in Section 6.

CPS Architecture for CTC Condition Monitoring
A CPS is a computer-based technical system with a clear architecture and a sequential workflow that closely connects complex processes and information from physical reality with those from cyber space, providing computation, communication, and control. Inspired by the layer architecture design of CPS, industrial big data, the Internet of Things, and DL, this paper proposes a novel five-layer CPS framework which consists of the following layers: a smart connection layer, a physics equation-based modeling (PEM) layer, an ML-based modeling (MLM) layer, a cognition layer, and a configuration layer. The detailed architecture is shown in Figure 1, which presents the integration of physical components (e.g., motors, spindles, bearings, gears, and sensors) and cyber components (e.g., communication, computing, control).

Smart Connection Layer: Data acquisition is the main function of this layer. Sensors are the sensing elements of physical systems. Through reasonable multi-sensor position arrangement and type selection [33], various accurate and reliable signals, such as vibration, current, force, and rotation error signals, can be collected. The collected datasets are transmitted to a large-capacity data storage device and saved in the cloud. Two important problems must be considered at this layer. First, the type and specification of the sensor determine the validity of the data, that is, whether the data are reliable. Second, the location of the sensor affects the accuracy of the DL model, that is, whether the data can be used to make accurate predictions.

PEM Layer: CPS modeling based on physical equations is the core technology used in this layer. The raw data are transferred to this layer through fieldbus and/or industrial Ethernet. Raw data are heterogeneous, imperfect, and large-scale [34] and cannot be directly used in the next layer. Data preprocessing, including data fusion, data normalization, and feature extraction, is the main consideration at this level. Through this layer, low-value data can be transformed into high-value information. The detailed physical modeling workflow will be discussed in Section 3.2.
MLM Layer: The meaningful information obtained from the PEM Layer is transferred to this level to build a CPS data-driven model, and a workflow from the physical components to the cyber components is established. This layer needs to provide support for the development of DL algorithms, the construction of prediction models, and transfer learning and application implementation in a cyber virtual environment. Historical vibration data are used to develop and train the ML predictive model off-line, while real-time data are fed into the pretrained model to achieve online prediction. In practical applications, new datasets or tasks will be encountered; transfer learning can effectively avoid the need to train models from scratch in such cases. The essence of this method is to solve new but similar tasks by applying knowledge that has been learned previously.
Cognition Layer: The real-time condition monitoring data of CTC and the prediction results from the MLM Layer can be displayed to experts through visualization techniques such as histograms and confusion matrices. In addition, the meaningful prediction results can be further analyzed and mined to evaluate the real-time equipment state. This layer informs us of the optimal decisions to be taken.
Configuration Layer: The core purpose of this layer is to transform the decision information of the cognition layer into maintenance activities and realize supervisory control from physical components to cyber components and back to physical components. The required PHM actions, such as rotation error compensation, bearing replacement, and gear maintenance, can be taken. The evaluation results of the CTC condition monitoring and reliability analysis enable a transition from preventive maintenance activities to predictive ones.

Physical and Data Driven Hybrid Model of CPS
There are two main modeling methods for CPS: modeling based on physical equations (PE) and modeling based on data. PE modeling uses underlying physics relationships to derive its mathematical representation; the main advantages of the PE model are its interpretability and scalability. Interpretability means that the modeling process from input to output can be understood, that is, the PE model is a white-box model in which the causal relationship between parameters and variables is clear. Scalability is reflected in the fact that a complex CPS is composed of many physical subsystems, and any system of simple to moderate complexity can be established through the use of this method [35]. However, considering the complexity, uncertainty, and time-varying characteristics of CTC working conditions, the physical model is usually simplified [36] to a rough model that is an incomplete representation of the physical process of the real system. This process is time-consuming, expensive, and requires a high level of domain expertise for industrial applications.
The ML method based on data eliminates the limitation of relying on system dynamics knowledge and depends completely on data, meaning that it is suitable for modeling different types of systems. However, because the CTC is a complex mechanical system, its vibration signal usually contains a large amount of redundant information, and it is difficult to extract valuable state features without physical preprocessing. The use of appropriate data preprocessing improves the prediction accuracy [37]. Conversely, the misuse of data and the DL model may lead to large errors or even produce results that violate the laws of physics.
As shown in Figure 2, in order to fully utilize the advantages of PE and DL, a new hybrid CPS model based on physical preprocessing and data is proposed for CTC condition monitoring and reliability analysis. CPS modeling based on PE consists of three steps: raw vibration signal data fusion, data normalization, and feature extraction based on signal processing. Data fusion uses multi-sensor information, data normalization eliminates the influence of data distribution, and feature extraction obtains valuable features from the original data. Input data $X$ are transformed into $X'$ after physical preprocessing. The high-value features $X'$ from the previous step are divided into training sets and test sets and then sent into the DL model via CPS data modeling. Through model training and testing, the optimal DL model is selected for online deployment in the SPL. The model training loss $Y_{ML}$ is calculated using the cross entropy loss function, which is expressed as:

$$Y_{ML} = -\sum_{i=1}^{N} p_i(x) \log q_i(x) \quad (1)$$

where $N$ is the number of classes, $q_i(x)$ denotes the predicted probability that the input data $X$ belong to the $i$th class, and $p_i(x)$ denotes the real probability.
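As a minimal sketch of the cross entropy loss described above, the following pure-Python function evaluates the loss for one sample from a true distribution p and a predicted distribution q. The function name, the `eps` guard against log(0), and the 3-class example values are illustrative assumptions, not part of the original implementation.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy between the real distribution p and the predicted distribution q."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of classes")
    # eps (an assumption) guards against log(0) when a predicted probability is zero.
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))

# Hypothetical 3-class example: the true class is index 1 (one-hot p).
p_true = [0.0, 1.0, 0.0]
q_pred = [0.1, 0.8, 0.1]
loss = cross_entropy(p_true, q_pred)  # equals -log(0.8) for a one-hot p
```

With a one-hot real distribution, only the predicted probability of the true class contributes, so the loss falls to zero as that probability approaches 1.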

CPS Modeling Based on PE
Multi-sensor data fusion, as shown in Figure 2, can merge data to make them more informative. The data collected by CPSs are multi-source and heterogeneous. Normalization is therefore necessary in order to eliminate the limitations of data units and make them dimensionless, which is convenient for the comparison and weighting of different units or scales. Meanwhile, data normalization can accelerate the convergence of DL models [38]. As shown in Figure 2, the data distribution range of sensor 1 and sensor 2 is made more consistent through the use of data normalization. The normalization method is expressed as:

$$X^{*} = \frac{X - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}}$$

where $X$ represents the input sample, $\mathrm{Max}$ is the maximum value of the sample data, $\mathrm{Min}$ is the minimum value of the sample data, and $X^{*}$ is the normalized output.

In order to extract the meaningful state features from the data and reduce the difficulty of DL training, it is necessary to perform signal processing, which is realized by the PE modeling of the raw data. This processing also meets the requirement of 2D convolutional neural networks (CNNs) for two-dimensional matrix inputs. Short-time Fourier transform (STFT) and wavelet packet decomposition (WPD) are able to extract the time-frequency characteristics of data. In addition, this paper also evaluates the accuracy of data matrix transformation (DMT) without signal processing.
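The min-max normalization described above can be sketched with NumPy as follows. The function name, the handling of a constant signal, and the two sensor arrays are hypothetical and serve only to illustrate how channels with different units end up on the same [0, 1] scale.

```python
import numpy as np

def min_max_normalize(x):
    """Scale samples to [0, 1] using (X - Min) / (Max - Min)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Constant signal: the formula is undefined, so zeros are returned (an assumption).
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

# Two hypothetical sensor channels with very different ranges and units.
sensor_1 = np.array([-0.02, 0.00, 0.01, 0.03])
sensor_2 = np.array([120.0, 150.0, 135.0, 180.0])
norm_1 = min_max_normalize(sensor_1)
norm_2 = min_max_normalize(sensor_2)
```

After normalization both channels span the same range, which is what allows them to be compared or fused without one dominating the other.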
STFT: As shown in Figure 3a, through the spectrum function of MATLAB, a section of the time sequence signal can be directly transformed into a time-frequency heat map through STFT. The horizontal axis represents time and the vertical axis represents frequency, while different colors represent different values. The principle of STFT is to multiply the original signal by a window function and carry out a segment-by-segment Fourier transform of the original signal by moving the window function. The window functions commonly used by STFT include rectangular, Hanning, and Gaussian windows. The equation of STFT is expressed as:

$$STFT(t, \omega) = \int_{-\infty}^{+\infty} x(\tau)\, w(\tau - t)\, e^{-\mathrm{i}\omega\tau}\, d\tau$$

where $x(\tau)$ is the original signal, $t$ and $\tau$ denote the time, $\omega$ denotes the frequency, $w(\tau - t)$ is a window function whose center is at time $t$, and $STFT(t, \omega)$ is the time-frequency matrix after STFT. Supposing that the length of a signal sample is 1024, through a Hanning window STFT with a length of 64, the output data are a 33 × 33 time-frequency matrix.

WPD: WPD decomposes the signal level by level into sub-bands. The decomposition is expressed as:

$$C_{j+1,2n}(k) = \sum_{m} h(m - 2k)\, C_{j,n}(m), \qquad C_{j+1,2n+1}(k) = \sum_{m} g(m - 2k)\, C_{j,n}(m)$$

where $j$ and $n$ represent the number of WPD layers and sub-bands, respectively; $C_{j,n}$ denotes the wavelet coefficient sequence of the $n$th node in the $j$th layer; $C_{0,0}$ is the original signal; and $h$ and $g$ denote the high-pass filter and low-pass filter, respectively. The equation between $h$ and $g$ is given as:

$$g(k) = (-1)^{k}\, h(1 - k)$$

In this paper, DB25 is used as the wavelet basis function and a signal with a length of 1024 is transformed into a 64 × 64 wavelet coefficient matrix through 6-level WPD. A brief illustration of WPD is shown in Figure 4. The wavelet coefficients of different sub-bands can be obtained by multi-level WPD.

DMT: The raw data are reshaped from a one-dimensional time sequence signal to a two-dimensional signal without signal processing. A simple illustration of DMT is shown in Figure 5, where the numbers 0 to 1023 denote the time sequence signal data points and each blue square represents a value of the matrix. Supposing that the length of a signal sample is 1024, through DMT, the output data are made into a 32 × 32 matrix.
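The matrix sizes quoted above for STFT and DMT can be checked with a short NumPy sketch. This is not the MATLAB routine used in the paper: the hop of 32 points (50% overlap) and the zero padding of half a window at each end are assumptions chosen because they reproduce the 33 × 33 output for a 1024-point sample with a 64-point Hanning window. The test signal is synthetic.

```python
import numpy as np

def stft_magnitude(signal, win_len=64, hop=32):
    """Magnitude STFT with a Hanning window; zero-pads half a window at each end."""
    signal = np.asarray(signal, dtype=float)
    padded = np.pad(signal, win_len // 2)          # centre the first and last frames
    window = np.hanning(win_len)
    n_frames = (len(padded) - win_len) // hop + 1
    frames = np.stack([padded[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    # rfft keeps the win_len // 2 + 1 non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T   # shape: (frequency, time)

def dmt(signal, side=32):
    """Data matrix transformation: reshape a 1D signal row by row into a square matrix."""
    return np.asarray(signal).reshape(side, side)

# Synthetic 1024-point signal (a 100 Hz tone at an assumed 1 kHz sampling rate).
t = np.arange(1024) / 1000.0
x = np.sin(2 * np.pi * 100.0 * t)
tf_matrix = stft_magnitude(x)   # 33 frequency bins by 33 time frames
dmt_matrix = dmt(x)             # 32 by 32, no signal processing applied
```

The contrast between the two functions mirrors the comparison made in the experiments: the STFT output separates the signal by frequency over time, whereas DMT only rearranges the raw samples.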

CPS Modeling Based on DL
As the most important branch of ML, DL has been widely used in computer vision [39,40] and natural language processing [41,42]. We tested the performance of four categories of representative models that are based on CNN, including 5-layer CNN, LeNet, multi-scale CNN (MSCNN), and residual networks (ResNet).
CNN has great advantages in processing matrix data. Sparse connections and weight sharing are its main characteristics. As shown in Figures 6 and 7, CNNs generally include five parts: an input layer, convolution layer, pooling layer, full connection layer, and output layer. In these figures, BN denotes batch normalization, ReLU the rectified linear unit activation function, Conv a convolution layer, GAP a global average pooling layer, MP a max pooling layer, and AMP an adaptive max pooling layer; green cubes with different shades of color represent convolution operations with different convolution kernel sizes. LeNet is a CNN structure with a small number of channels which was first applied for image recognition [43]. Wu et al. [44] applied LeNet for bearing fault diagnosis. The 5-layer CNN is a medium-sized CNN structure with a final output channel size of 128 [38].

The size of the convolution kernel is vital in CNN, as different convolution kernel sizes can extract features of different scales. The design idea of MSCNN comes from the Inception network structure [45], which can extract multi-scale features through parallel convolution branches. The MSCNN contains three scales, and each scale consists of different kernel sizes, meaning that various global features and local features of the time-frequency matrix can be extracted. Finally, the extracted features are concatenated together for classification. The combination of local and global feature extraction in MSCNN has been used in gearbox condition monitoring [46] and bearing fault diagnosis [47].

In CNN, as the number of network layers increases, the phenomena of gradient disappearance and gradient explosion will occur, which makes it difficult to train the network. Deep ResNet, as shown in Figure 7, is an improved variant of CNN that uses identity shortcuts to ease the difficulty of training [48]. ResNet includes a series of residual building units (RBUs); an RBU can be composed of BN, ReLU, and Conv.
The output of each RBU is expressed as:

$$Y = F(X) + X$$

where $X$ denotes the input of the RBU, $F(X)$ denotes the output of the convolution operation, and $Y$ represents the output of the RBU. Considering the WPD input as an example, the detailed structure and feature size of each layer are shown in Table 3, where 3 × 3, 5 × 5, and 7 × 7 denote Conv+BN+ReLU with kernel sizes of 3, 5, and 7, and ×2 means that the same RBU block is appended two times in sequence. The output sizes of each layer, such as 8, 62, 62, are the number of channels, the height, and the width. In the spindle datasets, the first MP was removed.
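The identity shortcut of an RBU can be illustrated with a deliberately simplified NumPy sketch. It follows the standard residual form, in which the shortcut adds the input X to the convolution output F(X). As assumptions for illustration, F is reduced to a ReLU followed by one single-channel 3 × 3 convolution, BN is omitted, and the helper names and random inputs are hypothetical; it is not the network of Table 3.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv3x3_same(x, kernel):
    """Single-channel 3x3 convolution (cross-correlation) with zero padding."""
    padded = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def residual_unit(x, kernel):
    """Y = F(X) + X, with F simplified to ReLU followed by one 3x3 convolution."""
    f_x = conv3x3_same(relu(x), kernel)   # F(X): the learned residual branch
    return f_x + x                        # identity shortcut

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
y_zero = residual_unit(x, np.zeros((3, 3)))            # F(X) = 0, so Y equals X
y_rand = residual_unit(x, rng.standard_normal((3, 3)))
```

When the residual branch contributes nothing, the unit passes its input through unchanged, which is why stacking many such units does not by itself degrade the signal or its gradient.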

CPS Framework Implementation and Experimentation
Spindle motors, bearings, and gears are three CTCs of SPL equipment. Component failure will seriously affect the normal operation of the SPL and cause huge economic losses [29,49-51]. Therefore, in this paper, the prediction of spindle motor rotation error and the fault diagnosis of bearings and gears based on two public datasets are used to verify the ability of the proposed CPS framework to tackle the condition monitoring problems of SPL.

Case 1: The Prediction of Spindle Motor Rotation Error
As complex systems integrating mechanical, electrical, hydraulic, and pneumatic aspects, spindle motors are CTCs of the SPL. The rotation error of a spindle is closely related to the geometric error, surface quality, and roughness of the workpiece, and is an important index of the reliability of machine tools [52,53]. Therefore, the accurate prediction of spindle rotation error is of great significance for improving machining precision and efficiency. The existing rotation error measurement methods are usually implemented with the aid of a standard ball under idle conditions [54][55][56], which makes it difficult to reflect the real error in the actual machining process. Deriving a rotation error prediction from a physical dynamics model is complicated and inaccurate [36,57], so it is difficult to achieve real-time condition monitoring and real-time rotation error compensation in this way. The spindle rotation error can be affected by the speed [55] and wear state [36,57]. Fortunately, the spindle vibration data contain information related to the speed [58] and wear state [59]. Therefore, it is possible to predict the rotation error through vibration data.
As shown in Figure 8, the spindle reliability experiment platform of Tsinghua University was established based on CPS. A loading experiment was carried out through the spindle load spectrum [60] to ensure that the spindle load was close to that of actual machining. The spindle rotation error was collected by the spindle check machine capability tester every 10 h, and the wear test was carried out during the remaining time.

Sensors: The vibration sensors were installed on the base, bearing, and spindle to collect vibration signals. The force sensor collected the load force to realize the closed-loop control of the load spectrum; the speed sensor measured the spindle speed to realize closed-loop control; and the eddy current sensor was used to detect the spindle displacement. The rotation error could be obtained by a spindle check machine capability tester. Table 4 describes the key components of the smart connection layer, including their specification, communication mode, and function. These constitute the smart connection layer of the CPS.

Pneumatic loading unit (AirTac-100): The PXIe-1082 controls the pneumatic loading unit through an analog output channel.

PXIe-1082: This data acquisition and control unit is the core component of the system and is mainly used to realize the loading control of the air pump, multi-sensor data acquisition, data transmission to the analysis platform, spindle drive, and feedback control. The configuration layer of the CPS is realized in this part.
Data analysis platform: This part is based on MATLAB and Python. The acquisition and control unit sends the data to the analysis platform to support the PEM layer, MLM layer, and cognition layer of CPS, which is the core part of the data processing and rotation error prediction.
After the data acquisition, it is necessary to preprocess the vibration signal and the corresponding rotation error. The spindle speed ranges from 1000 r/min to 4000 r/min; the experiment shows that the variation range of the rotation error is 5-14.5 µm. After discretization, the rotation error data are rounded to the nearest 0.5 µm. Therefore, there are 20 types of rotation error and the number of classes is set to 20. Two datasets of each class, for a total of 40 datasets, were selected for the experiment; each dataset contained 200,000 × 3 (×3 means 3 sensors) raw data points. The first 70% of the data points were selected as the training data, while the last 30% were used as testing data. In this paper, the sample length was 1024 data points and the shift length [61] was 320 data points. Hence, the total number of samples in each class was 870 and the total number of training samples was 20 × 870. Data augmentation was not used in the testing set, and the total number of testing samples was 20 × 116.
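The sample counts quoted above follow from simple sliding-window arithmetic, which the sketch below reproduces. The function and constant names are hypothetical. The use of non-overlapping windows (shift equal to the sample length) for the testing set is an assumption inferred from the statement that no data augmentation was applied there; it reproduces the stated 116 testing samples per class.

```python
def count_windows(n_points, window, shift):
    """Number of full windows of length `window` taken every `shift` points."""
    if n_points < window:
        return 0
    return (n_points - window) // shift + 1

def error_to_class(error_um, low=5.0, step=0.5):
    """Map a rotation error (in µm) to a class index on a 0.5 µm grid starting at 5 µm.

    Exact ties follow Python's round-half-to-even rule (an implementation detail
    of this sketch, not something specified in the paper).
    """
    return int(round((error_um - low) / step))

POINTS_PER_DATASET = 200_000
DATASETS_PER_CLASS = 2
train_points = POINTS_PER_DATASET * 70 // 100        # first 70% of each dataset
test_points = POINTS_PER_DATASET - train_points      # last 30% of each dataset

# Training: overlapping windows (length 1024, shift 320) act as data augmentation.
train_per_class = DATASETS_PER_CLASS * count_windows(train_points, 1024, 320)
# Testing: non-overlapping windows (assumed shift of 1024).
test_per_class = DATASETS_PER_CLASS * count_windows(test_points, 1024, 1024)
n_classes = error_to_class(14.5) + 1
```

Each 140,000-point training segment yields 435 overlapping windows and each 60,000-point testing segment yields 58 non-overlapping ones, giving 870 and 116 samples per class across the two datasets.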

Case 2: The Fault Diagnosis of Bearing
The bearing datasets contained 12 vibration sub-datasets, and the number of classes was 12. Vibration signals were collected at the three speeds of 600 r/min, 800 r/min, and 1000 r/min. Each working condition contained one normal state and three fault modes, namely rolling element, outer ring, and inner ring faults. Detailed descriptions of the bearing datasets are shown in Table 5. The detailed description of the experimental platform can be found in reference [62]. After data preprocessing, the total number of training samples was 7038 and the total number of testing samples was 1764.

Case 3: The Fault Diagnosis of Gear
The gear datasets contained 20 vibration sub-datasets and the number of classes was 20. Vibration signals were collected in two kinds of working conditions. Each working condition contained two normal states and eight fault modes. Detailed information on the gear datasets is displayed in Table 6. A detailed description of the experimental platform can be found in reference [63]. In particular, for these gear drivetrain diagnostics simulator datasets, Gaussian noise with a signal-to-noise ratio of 1 dB was added to the vibration signal [64]. After data preprocessing, the total number of training samples was 16,380 and the total number of testing samples was 4100.
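A minimal sketch of adding Gaussian noise at a prescribed signal-to-noise ratio is given below, assuming the usual power-ratio definition of SNR in decibels. The function name, the synthetic sine wave, and the sampling rate are hypothetical; the procedure in [64] may differ in detail.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so that the signal-to-noise ratio is about snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved for the noise power.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(0)
# Synthetic stand-in for a vibration signal: 50 Hz tone, assumed 10 kHz sampling rate.
clean = np.sin(2 * np.pi * 50.0 * np.arange(100_000) / 10_000.0)
noisy = add_gaussian_noise(clean, snr_db=1.0, rng=rng)
measured_snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

At 1 dB the noise power is only about 20% below the signal power, so the added noise is nearly as strong as the signal itself.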

Experiment and Result Analysis
This part focuses on the performance of different DL methods rather than the setting of the hyperparameters. In the experiment, different methods on the same dataset used the same hyperparameters. The learning rate was 0.001 for the spindle datasets. For the bearing and gear datasets, the learning rate was 0.001 from epoch 0 to 29, 0.0001 from epoch 30 to 59, and 0.00001 in the last 40 epochs. The mini-batch size was 64 and the maximum number of epochs was 100. In the process of model training, Adam was used as the optimizer. Momentum is an important parameter of Adam and can accelerate the training process; it was set to 0.9 following the setup seen in [65].
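The stepwise learning rate used for the bearing and gear datasets can be written as a small function of the epoch index. The sketch below is plain Python with a hypothetical function name and does not reproduce the original training script.

```python
def learning_rate(epoch, base_lr=1e-3):
    """Step schedule: base_lr for epochs 0-29, base_lr/10 for 30-59, base_lr/100 for 60-99."""
    if epoch < 30:
        return base_lr
    if epoch < 60:
        return base_lr * 0.1
    return base_lr * 0.01

# Learning rate for each of the 100 training epochs.
schedule = [learning_rate(epoch) for epoch in range(100)]
```

In PyTorch, a schedule of this shape is commonly expressed with `torch.optim.lr_scheduler.MultiStepLR` using milestones at epochs 30 and 60 and a decay factor of 0.1, although the text does not state which mechanism was used.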
The experiments were conducted on a system with an i7 CPU @3.80 GHz and an NVIDIA GeForce RTX 2060 SUPER. Python was used as the programming environment with PyTorch. In the experiment, the maximum accuracy across all the epochs was chosen as the testing accuracy. We evaluated the accuracy when using different signal processing procedures and DL algorithms. In order to reduce the impact of randomness, five trials were carried out for each experiment. The average performance of these methods, including LeNet, CNN, MSCNN, ResNet10, ResNet18, long short-term memory network (LSTM), and bidirectional LSTM (BiLSTM) [40], is shown in Tables 7-9 and Figures 9-11, in which the horizontal axis represents the combination of different DL algorithms and signal processing, the vertical axis represents the average accuracy, the histogram and black numbers represent the average accuracy value, and the red box plot shows the dispersion over the five trials. The combination of STFT and ResNet obtained the best accuracy values on all three datasets: 92.08%, 95.12%, and 94.56%, respectively.

Figure 11. The results of the gear datasets.

As can be seen from Tables 7-9 and Figures 9-11, STFT always obtains the best accuracy on the three datasets, and the overall average accuracy follows the order STFT > WPD > DMT. STFT and WPD can be used to obtain the time-frequency characteristics of the data, and their accuracy is significantly higher than that of DMT. This result demonstrates the necessity of the proposed PEM layer. The accuracy of STFT is higher than that of WPD because choosing the optimal wavelet basis function and wavelet decomposition level is challenging. For example, in the experiment, it was found that the accuracy of the CNN on the spindle datasets was 72.24% for five-level wavelet decomposition under a DB1 wavelet basis function and 88.21% for six-level wavelet decomposition under a DB25 wavelet basis function, which represents an improvement of 15.97 percentage points.
At the same time, we can see that ResNet always obtains the best accuracy. The order of accuracy of the proposed hybrid modeling on the three datasets with STFT is ResNet > MSCNN > BiLSTM > CNN > LeNet. Compared with the other methods, the accuracy of LeNet is the lowest. As LeNet is the shallowest network and has the fewest convolution channels, the features contained in the signal cannot be extracted completely. CNN has more channels than LeNet and can extract more feature information, so the accuracy of CNN is greater than that of LeNet. The BiLSTM designed in reference [38] can extract not only spatial feature information through convolution but also temporal information, so the accuracy of BiLSTM is greater than that of CNN. BiLSTM can extract bidirectional temporal features, and its accuracy is greater than that of LSTM. MSCNN can extract multi-scale features, which significantly improves the feature extraction ability, and the performance of MSCNN is better than that of BiLSTM, CNN, and LeNet. As the network depth increases, ResNet can overcome the phenomena of gradient disappearance and explosion, which makes the network easy to train; thus, ResNet has the best accuracy. On the spindle and bearing datasets, ResNet18 performs better than ResNet10. On the gear datasets, ResNet10 performs better than ResNet18. This may be due to slight overfitting on the gear datasets.
Through the above analysis, the method proposed in this paper successfully monitors the condition of CTC in SPL. The experimental results of STFT, WPD, DMT and different DL algorithms show the necessity of CPS hybrid modeling and the feasibility of algorithm implementation.
To better understand the classification performance of these models for each label, Figure 12 shows the confusion matrices of LeNet, CNN, LSTM, BiLSTM, MSCNN, and ResNet18 on the bearing datasets with STFT signal processing. Figure 12 shows that almost all categories are diagnosed more accurately by ResNet18 than by the other models, except for label 4, where ResNet18 is slightly less accurate than MSCNN.
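Per-label comparisons of this kind can be read directly from a confusion matrix. The following minimal sketch computes per-class recall and overall accuracy; the three-class counts are hypothetical and are not taken from Figure 12, and rows are assumed to hold true labels with columns holding predictions.

```python
def per_class_recall(confusion):
    """Recall per label from a confusion matrix (rows = true, columns = predicted)."""
    recalls = []
    for label, row in enumerate(confusion):
        total = sum(row)  # number of samples whose true label is `label`
        recalls.append(row[label] / total if total else 0.0)
    return recalls

# Hypothetical 3-class counts for illustration only.
confusion = [
    [48, 2, 0],
    [3, 45, 2],
    [0, 5, 45],
]
recalls = per_class_recall(confusion)

# Overall accuracy: correctly classified samples (diagonal) over all samples.
correct = sum(confusion[i][i] for i in range(len(confusion)))
overall_accuracy = correct / sum(map(sum, confusion))
```

Comparing the recall of one label across models exposes cases such as the one noted above, where a model with the best overall accuracy is slightly weaker on a single label.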

Discussions and Future Works
(1) Modeling Based on PE. Vibration data are characterized by high dimensionality, nonlinearity, and diversity. Signal processing is the core of the PEM layer. Other signal processing methods, including empirical mode decomposition, local mean decomposition, and the Hilbert-Huang transform, can extract discriminative features from different domains, and these feature extraction methods are worth studying. In the PEM layer, two problems need to be considered: how to extract valuable features from noisy data and how to extract a sufficient number of features from imbalanced data. CTC often operate in harsh working environments, and the collected vibration signals often contain irregular noise. Feature information can easily be buried by strong background noise. At the same time, the distribution of samples across categories was found to be imbalanced in our experiment. In the context of CPS hybrid modeling, considerable progress remains to be made in feature extraction from noisy and imbalanced data.
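As a small illustration of the imbalance problem, the sketch below quantifies class imbalance and derives inverse-frequency class weights, one common mitigation. This is an assumption for illustration only and is not a method used in this paper; the label names and counts are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so they contribute more to a
    weighted training loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical label distribution, not taken from the experimental datasets.
labels = ["normal"] * 80 + ["inner_fault"] * 15 + ["outer_fault"] * 5

counts = Counter(labels)
imbalance_ratio = max(counts.values()) / min(counts.values())
weights = inverse_frequency_weights(labels)
```

Such weights could be supplied to a weighted loss function during training, although whether this helps on the datasets considered here would need to be verified experimentally.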
(2) Modeling Based on Data. Whether the data-driven CPS framework can be successfully deployed in the SPL depends on the quality of the datasets used. The more labeled multi-sensor data are available, the better the accuracy will be. In practice, the scarcity of labeled data limits the application of DL in the CPS. The other problem is limited data. A major challenge in this CPS framework is obtaining a sufficient amount of training data, which is a time-consuming, laborious, and costly process. Semi-supervised or unsupervised DL for sparsely labeled data and few-shot learning with a meta-learning paradigm for limited data are techniques that have been widely used in the field of computer vision. These methods are worth exploring to address the above problems.
(3) Integrating digital twin with CPS. CPS improves the communication between physical and cyber space. In a digital twin, a high-fidelity digital copy of the physical system is built from big data and physical models to provide services for CPS monitoring, analysis, decision making, and feedback to physical entities. How to integrate digital twins with CPS for the condition monitoring of CTC in SPL is an interesting research direction to explore in the future.
(4) Infusing 5G into CPS. Various industrial Ethernet and fieldbus technologies are still the main data transmission technologies used in CPSs. Some wireless transmission technologies, such as Bluetooth and ZigBee, have also been applied to CPS. 5G promotes the development of wireless communication technology. Its main features are enhanced mobile broadband, low latency, high reliability, and massive machine-type communication. 5G can effectively meet the needs of CPS for large-scale data acquisition and sensing, precise control, and remote processing. How to deploy 5G in all layers of CPS, especially in the smart connection layer, should be addressed in future research.

Conclusions
In this paper, CPS hybrid modeling was investigated and a novel five-layer architecture, which included a PEM layer and an MLM layer, was proposed for CTC condition monitoring in SPL. The CPS framework integrated various DL-based algorithms. First, the multi-sensor data from the CTC monitored in the SPL were collected via a smart connection layer; then, a large amount of low-value data was transformed into high-value feature information through the PEM layer. After this, the feature information was used to train and evaluate the DL model through the MLM layer, and visualization techniques helped us to make more effective and scientific decisions via the cognition layer. Finally, the decision information was transformed into predictive maintenance activities through a configuration layer. Therefore, the CPS framework was able to realize a closed-loop workflow and reduce the potential for failures that could impact SPL operations. The effectiveness and predictive power of the developed CPS framework, hybrid modeling, and algorithm implementation were verified by the DL and signal processing methods. The proposed framework and approach could be generalized to predict the RUL, product quality, etc. from scratch by integrating additional sensors and training a DL model, which would lay the foundation for framework extensions.

Data Availability Statement: All data generated or analyzed during this study are included in this article.

Conflicts of Interest: The authors declare no conflict of interest.