Pruning Quantized Unsupervised Meta-Learning DegradingNet Solution for Industrial Equipment and Semiconductor Process Anomaly Detection and Prediction

Machine- and deep-learning methods are used in industrial prognostics and health management (PHM) for semiconductor process and equipment anomaly detection, enabling proactive equipment maintenance and preventing process interruptions and equipment downtime. This study proposes a Pruning Quantized Unsupervised Meta-learning DegradingNet Solution (PQUM-DNS) for fast training and retraining of anomaly detection and prediction models for new equipment or processes with limited data. Experimental simulations were conducted on real data from a factory chiller host motor, the Paderborn current and vibration open dataset, and the SECOM semiconductor open dataset, and the results were averaged across these datasets. Compared with conventional deep autoencoders, PQUM-DNS reduces the average amount of data required for rapid training and retraining by about 75% while achieving a similar AUC. Holt–Winters forecasting of the degradation degree achieves an average RMSE of 0.037, and pruning and quantization reduce the model size by about 60%, enabling deployment on edge devices such as the Raspberry Pi. These results make PQUM-DNS well suited for intelligent equipment management and maintenance in industrial applications.


Introduction
Research related to prognostics and health management (PHM) indicates that anomaly detection and prediction are important approaches to monitoring equipment faults and semiconductor process abnormalities. However, this type of detection often relies on subjective assessments by operators with prior experience. Automating the detection of equipment faults or semiconductor process anomalies is essential for reliable predictive maintenance and can potentially eliminate the need for manual monitoring. Moreover, interconnected intelligent monitoring systems play a crucial role in Industry 4.0, which focuses on artificial intelligence (AI)-driven factory automation.
The related techniques include dense autoencoders (AEs) [11], convolutional AEs [12], and pretrained convolutional neural networks [13]. Although these deep-learning approaches perform excellently in anomaly detection, their adoption in real factory settings remains limited. One major reason is the high computational resource requirements of many deep-learning-driven anomaly detection methods. These methods also often lack practical considerations, ignoring factors such as deployment across multiple machines or production lines, model retraining, optimization, and system integration. Consequently, their applicability in real factory environments is hindered.
Many studies have been conducted in the field of PHM. Pradeep et al. proposed using machine-learning techniques to predict wafer defects with a random forest classifier, achieving an accuracy of over 93.62% [14]. This predictive maintenance approach enhanced semiconductor manufacturing productivity. Nuhu et al. introduced synthetic data generation techniques combined with two missing-value imputation methods and feature selection techniques [15]. When paired with the proposed machine-learning (ML) methods, this approach achieved an accuracy ranging from 99.5% to 100%. Mao et al. introduced a novel deep AE (DAE) method that fused discriminative information with a gradient descent optimization approach [16]. This technique improved the numerical stability of the model when training data were limited. Abbasi et al. presented a series of highly compact deep convolutional AE architectures that reduced model size while maintaining detection accuracy comparable to that of architectures with over four million parameters [17]. Givnan et al. proposed an ML method for modeling and detecting anomalies during the operation of rotating machinery; the approach learned and generalized based on fault severity to generate threshold values for anomaly detection [18].
A DAE model specifically designed for factory scenarios involving chillers was introduced that effectively distinguished between normal and abnormal vibration signals based on reconstruction differences [19]. In addition, meta-learning was employed to improve the accuracy of models for new sensors with limited vibration data; for such a new sensor model, accuracy increased by about 33.50%. However, this method is mainly oriented toward anomaly detection; it does not yet consider model retraining, anomaly prediction, lightweight models, edge computing, or integration with a complete intelligent management system.
Considering the aforementioned issues, this study proposes a Pruning Quantized Unsupervised Meta-learning DegradingNet Solution (PQUM-DNS) to address the real-world conditions of practical factories. This approach integrates five key features based on the actual needs of factories, as illustrated in Figure 1. It rapidly establishes models for new machines or production lines with limited data by leveraging meta-learning and unsupervised learning through AEs, thereby achieving the anomaly detection and prediction objectives.
(3) Meta-learning Adaptive Model Retraining (Sections 3.3 and 4.3): Machine-specific models are adaptively retrained using meta-learning, quickly adjusting to the slowly changing characteristics of each machine over time and enabling long-term anomaly detection and prediction.
(4) Lightweight AI Models (Sections 3.4 and 4.4): The proposed pruning and quantization compression significantly reduces model size and conserves computational resources.
(5) Edge Device Computation (Sections 3.5 and 4.5): Replacing traditional industrial PC (IPC)-based AI inference engines with embedded Raspberry Pi systems enables lightweight deployment, saves resources, reduces cost, and makes large-scale deployment feasible.

Related Study
The following fundamental hardware, software techniques, and specifications are commonly used in semiconductor process and industrial scenarios for intelligent equipment management.

Vibration Signal Data Acquisition
(1) Device Sensors: One study collected real-time vibration data from equipment in an actual factory using an Advantech WISE-2410 LoRaWAN wireless sensor that integrates an ARM™ Cortex-M4 processor, a LoRa transceiver, a three-axis accelerometer, and a temperature sensor [20]. It operates within a temperature range of −20 °C to 85 °C and is powered via USB.
(2) Vibration Data Feature Transformation: The raw vibration signals received from the sensors undergo feature transformation. The transformed features are critical for determining the vibration state and mainly include velocity root mean square (RMS), acceleration RMS, acceleration peak, displacement kurtosis, displacement crest factor, displacement skewness, displacement peak-to-peak, and displacement deviation.
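The feature list above can be illustrated with a short NumPy sketch. It assumes a raw single-axis acceleration record and derives velocity and displacement by naive cumulative integration with mean removal; the function name `vibration_features` and this integration scheme are hypothetical, and commercial sensors such as the WISE-2410 compute their values on-device with their own filtering.

```python
import numpy as np


def _rms(x: np.ndarray) -> float:
    """Root mean square of a 1-D signal."""
    return float(np.sqrt(np.mean(x ** 2)))


def vibration_features(accel, fs: float) -> dict:
    """Hypothetical feature transformation for one acceleration record.

    accel: raw acceleration samples (single axis); fs: sampling rate in Hz.
    Velocity and displacement come from naive cumulative integration, an
    assumption made for illustration rather than a calibrated sensor pipeline.
    """
    dt = 1.0 / fs
    a = np.asarray(accel, dtype=float)
    a = a - a.mean()                   # remove DC offset
    vel = np.cumsum(a) * dt
    vel = vel - vel.mean()             # suppress integration drift
    disp = np.cumsum(vel) * dt
    disp = disp - disp.mean()          # zero-mean, so moments below are central
    d_std = disp.std()
    return {
        "velocity_rms": _rms(vel),
        "acceleration_rms": _rms(a),
        "acceleration_peak": float(np.max(np.abs(a))),
        "displacement_kurtosis": float(np.mean(disp ** 4) / d_std ** 4),
        "displacement_crest_factor": float(np.max(np.abs(disp)) / _rms(disp)),
        "displacement_skewness": float(np.mean(disp ** 3) / d_std ** 3),
        "displacement_peak_to_peak": float(disp.max() - disp.min()),
        "displacement_deviation": float(d_std),
    }


# Usage: a 10 Hz unit sine sampled at 1 kHz for 1 s
fs = 1000.0
t = np.arange(1000) / fs
features = vibration_features(np.sin(2 * np.pi * 10 * t), fs)
```

For a pure sine, the acceleration RMS is close to 1/√2 and the acceleration peak is close to 1, which makes the sketch easy to sanity-check.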

Vibration Signal Data of Chiller
Chillers are primarily used in air-conditioning systems. Chiller motors drive compressors and facilitate heat exchange. In this study, vibration sensors were installed on a chiller motor to measure vibration, as shown in Figure 2. Key vibration characteristics representing actual field measurements in the factory were obtained through feature transformation [19]. This study refers to ISO 10816 [22], as shown in Figure 3, to classify the vibration data of the chiller host motor as normal or abnormal. Results may vary depending on the size of the equipment on which the sensor is installed.

Paderborn University Bearing Dataset
An experimental condition-monitoring bearing dataset based on vibration and motor current signals from Paderborn University, Germany, was used, as shown in Figure 4 [23]. The experimental data were generated by installing different types of damaged ball bearings in the bearing test module. The setup included healthy bearings as well as bearings with outer-ring and inner-ring damage, with 1920 and 1600 entries for the outer- and inner-ring damage cases, respectively. The most common bearing damage analysis [24] uses motor current signals (MCSs), converting time-domain signals into frequency-domain signals to observe the spectral differences between normal and abnormal bearings, as shown in Figure 5.
Feature engineering was applied to extract features from the raw data, including common statistical features, signal-factor-related features, fast Fourier transform (FFT) features, power spectral density (PSD)-related features, and wavelet packet decomposition (WPD)-related features, as listed in Table 2.
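The frequency-domain comparison and the FFT- and PSD-related features described above can be sketched with NumPy alone. This is an illustrative example under stated assumptions: a single mean-removed record, a one-sided amplitude spectrum, and a one-sided periodogram as the PSD estimate (Welch averaging, e.g. `scipy.signal.welch`, is often preferred for noisy signals but is omitted here). The function name and the four chosen features are hypothetical, not the feature set of Table 2.

```python
import numpy as np


def spectral_features(signal, fs: float) -> dict:
    """Hypothetical FFT/PSD features for one current or vibration record."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = len(x)
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    amplitude = np.abs(spec) * 2.0 / n        # one-sided amplitude spectrum
    psd = np.abs(spec) ** 2 / (fs * n)        # two-sided periodogram values...
    if n % 2 == 0:
        psd[1:-1] *= 2.0                      # ...folded to one side (DC and Nyquist kept single)
    else:
        psd[1:] *= 2.0
    peak = int(np.argmax(amplitude))
    return {
        "dominant_freq_hz": float(freqs[peak]),
        "dominant_amplitude": float(amplitude[peak]),
        "spectral_centroid_hz": float(np.sum(freqs * psd) / np.sum(psd)),
        "total_power": float(np.sum(psd) * fs / n),   # Parseval: equals the variance
    }


# Usage: 50 Hz component plus a weaker 120 Hz component, 1 s at 1 kHz
fs = 1000.0
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 50 * t) + 0.3 * np.sin(2 * np.pi * 120 * t)
feats = spectral_features(x, fs)
```

Comparing such features between healthy and damaged bearings is one way to quantify the spectral differences mentioned above.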

SECOM Semiconductor Analysis Dataset
The semiconductor manufacturing process utilizes semiconductor wafers as substrates and processes them through a series of steps.The main steps include: cleaning the wafer, depositing the film, cleaning after film formation, exposure, development, etching, inserting impurities, generating semiconductor properties, activation, assembly, and packaging.At each stage, sensors measure relevant parameters, including film thickness, size, resistance, temperature, etc.Through data analysis and machine learning, predictive maintenance, fault detection, process monitoring, and yield improvement are performed.These methods can also extract useful information from limited event records.Large amounts of data help solve predictive maintenance issues and build fault detection and diagnosis models.Through timely fault detection and diagnosis, downtime is reduced, costs are lowered, and product quality is enhanced [25].
Data were collected from a complex modern SECOM semiconductor manufacturing process that was under consistent surveillance through signals/variables collected from sensors and process measurement points. The dataset contains 1567 examples with 591 features, including 104 failures. Because of the large number of features, random forests were used to select the 16 most important features for subsequent analysis, as shown in Figure 6 [26][27][28]. In the SECOM dataset, the target value "−1" corresponds to a pass and "1" corresponds to a fail.
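The random-forest feature selection above can be sketched with scikit-learn's impurity-based `feature_importances_`. This is a minimal sketch on synthetic data standing in for SECOM; the forest size, seed, and function name are assumptions, and real SECOM data contain missing values that must be removed or imputed before fitting.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_top_features(X, y, k=16, seed=0):
    """Rank features by random-forest importance and return the top-k column indices.

    Assumes X has no missing values (SECOM needs cleaning or imputation first).
    """
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    ranking = np.argsort(forest.feature_importances_)[::-1]   # most important first
    return ranking[:k]


# Synthetic stand-in: 40 sensor features; label -1 = pass, 1 = fail
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 40))
y = np.where(X[:, 3] + X[:, 7] > 1.0, 1, -1)   # only features 3 and 7 matter
top16 = select_top_features(X, y)
```

Only columns 3 and 7 determine the synthetic label, so both should appear among the selected 16.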

Data Preprocessing
In practical scenarios, data often suffer from issues such as incomplete or missing values, noise, and outliers, which can disrupt the proper functioning of models. Hence, data preprocessing is essential for adjusting the data before analytical algorithms are applied and for preventing inaccurate judgments caused by flawed data. In this study, records with missing values were deleted, and statistical analysis was used to identify and eliminate values lying more than three standard deviations from the mean. This produced cleaner data, which facilitated subsequent analyses.
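The preprocessing described above (missing-value deletion followed by a three-standard-deviation filter) might look like the following pandas sketch. Dropping a whole record when any single feature is out of range is an assumption; the original text does not say whether removal is per value or per record.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, n_std: float = 3.0) -> pd.DataFrame:
    """Drop rows with missing values, then drop rows where any feature lies more
    than n_std sample standard deviations from that feature's mean.

    Constant columns (std = 0) yield NaN z-scores and would remove every row,
    so they should be dropped beforehand.
    """
    df = df.dropna()
    z = (df - df.mean()) / df.std()
    return df[(z.abs() <= n_std).all(axis=1)]


raw = pd.DataFrame({"velocity_rms": [float(v) for v in range(30)] + [1000.0, np.nan]})
clean = preprocess(raw)
```

In this toy frame, the NaN row is deleted first and the value 1000.0 lies more than three standard deviations from the mean, so 30 records remain.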

Equipment Process Degradation Level
The algorithm employed was an AE, which is a technique based on unsupervised learning.This approach involves computing the root mean square error (RMSE) by comparing the reconstructed output values of the model with the numerical input values.The calculated RMSE serves as an indicator of the equipment process degradation level, where smaller and larger values indicate healthier and poorer states, respectively [7,8].
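For a record with $n$ features, the degradation level is $\mathrm{RMSE}(X, X') = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - x'_i)^2}$, where $X'$ is the AE reconstruction of $X$. The short sketch below computes it per record; the function name is illustrative.

```python
import numpy as np


def degradation_index(x, x_rec):
    """Per-record RMSE between inputs x and AE reconstructions x_rec.

    Both arguments have shape (records, features); a single record may be passed
    as a 1-D array. Larger values indicate a poorer (more degraded) state.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    x_rec = np.atleast_2d(np.asarray(x_rec, dtype=float))
    return np.sqrt(np.mean((x - x_rec) ** 2, axis=1))


# A perfectly reconstructed record scores 0; one error of 2 over 4 features scores 1
scores = degradation_index([[1, 2, 3, 4], [1, 2, 3, 4]], [[1, 2, 3, 4], [1, 2, 3, 6]])
```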

Equipment Process Data Storage
A real-time PostgreSQL database was used to store the data collected from the sensors, including sensor names, equipment names, sensor registration times, and operational data, and to provide data access for AI model training and inference. The inference results were also stored, and the database offered interfaces through which other platforms could access the required data.
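A possible table layout for the stored items above is sketched below. To keep the example self-contained it swaps in Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL; all table names, column names, and sample values are hypothetical, not the schema used in the study.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for PostgreSQL in this sketch.
SCHEMA = """
CREATE TABLE sensor (
    sensor_name    TEXT PRIMARY KEY,
    equipment_name TEXT NOT NULL,
    registered_at  TEXT NOT NULL
);
CREATE TABLE reading (
    id          INTEGER PRIMARY KEY,
    sensor_name TEXT NOT NULL REFERENCES sensor(sensor_name),
    ts          TEXT NOT NULL,
    features    TEXT NOT NULL            -- e.g. JSON-encoded feature vector
);
CREATE TABLE inference (
    id                INTEGER PRIMARY KEY,
    sensor_name       TEXT NOT NULL REFERENCES sensor(sensor_name),
    ts                TEXT NOT NULL,
    degradation_index REAL NOT NULL,
    is_forecast       INTEGER NOT NULL   -- 0 = detection result, 1 = prediction
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO sensor VALUES (?, ?, ?)",
             ("vib-01", "chiller-motor-A", "2024-01-01T00:00:00"))
conn.execute("INSERT INTO inference (sensor_name, ts, degradation_index, is_forecast) "
             "VALUES (?, ?, ?, ?)", ("vib-01", "2024-01-02T00:00:00", 0.042, 0))

# Another platform (e.g. a dashboard) could read the latest result like this
latest = conn.execute(
    "SELECT s.equipment_name, i.degradation_index FROM inference i "
    "JOIN sensor s USING (sensor_name) ORDER BY i.ts DESC LIMIT 1"
).fetchone()
```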

Area under Curve
Area under the curve (AUC) is the area under the receiver operating characteristic (ROC) curve. AUC is a widely used ML evaluation metric for assessing the performance of binary or multi-class classifiers [6,29]. Values range from 0 to 1, with a higher AUC indicating better classifier performance. A notable advantage of AUC is that it does not depend on any single decision threshold; it summarizes performance across all thresholds. This property makes AUC particularly robust in scenarios with imbalanced datasets.
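Threshold independence follows from an equivalent definition: for binary labels, AUC equals the probability that a randomly chosen anomalous record receives a higher score than a randomly chosen normal one (the Mann–Whitney statistic, which is also what `sklearn.metrics.roc_auc_score` returns for binary labels). The pairwise sketch below is for illustration only; its O(P·N) cost makes it unsuitable for large datasets.

```python
def auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 = anomalous, 0 = normal; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]                  # e.g. degradation indices
print(auc(labels, scores))                      # 0.75
print(auc(labels, [s ** 3 for s in scores]))    # 0.75: monotone rescaling, same AUC
```

Because only the ordering of scores matters, any monotone transformation of the degradation index (or any choice of alarm threshold) leaves the AUC unchanged.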

Intelligent Equipment Management System
The proposed PQUM-DNS was integrated into an intelligent equipment management system for practical field applications. As shown in Figure 8, signals were first collected from the device sensors and then passed through feature extraction and data preprocessing. Feature extraction converted the raw vibration values into multiple key parameters related to machine health. Data preprocessing eliminated empty and abnormal values, retaining only the normal values required for unsupervised learning. The processed data were stored in a PostgreSQL database until sufficient data had accumulated (e.g., 3000 records; adjustable). The AE and Holt-Winters algorithms were then used to train the anomaly detection and prediction models, respectively. Inference with these models produced results indicating the degree of equipment or process degradation. The inference results were stored in the database, and the anomaly detection and prediction outcomes were sent to the visual dashboard of the intelligent management system, giving users insight into machine conditions. If the anomaly detection or prediction results exceeded the AI degradation threshold of the equipment or process, the system sent alert notifications to the relevant personnel. For retraining with a small amount of data, PQUM-DNS first trains the Pretrain and Metatrain models once; each new anomaly detection model is then obtained by fine-tuning with a small amount of new data.
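The accumulate, train, infer, and alert loop described above can be sketched as a small state machine. Everything here is hypothetical: `DegradationMonitor` and `train_mean_model` are illustrative names, and the toy scorer (RMSE distance to the mean record) stands in for the AE, whose training would be injected through `train_fn`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

Record = Sequence[float]
Scorer = Callable[[Record], float]


@dataclass
class DegradationMonitor:
    """Hypothetical accumulate-train-infer-alert loop for one machine."""
    train_fn: Callable[[List[Record]], Scorer]   # e.g. fits an AE, returns its RMSE scorer
    threshold: float                             # AI degradation threshold
    min_records: int = 3000                      # adjustable amount of accumulated data
    buffer: List[Record] = field(default_factory=list)
    scorer: Optional[Scorer] = None

    def ingest(self, record: Record) -> Optional[bool]:
        """Return None while data are still accumulating, else True if an alert is raised."""
        if self.scorer is None:
            self.buffer.append(record)
            if len(self.buffer) >= self.min_records:
                self.scorer = self.train_fn(self.buffer)
            return None
        return self.scorer(record) > self.threshold


def train_mean_model(records: List[Record]) -> Scorer:
    """Toy stand-in for AE training: score = RMSE distance to the mean record."""
    n = len(records)
    mean = [sum(col) / n for col in zip(*records)]
    return lambda r: (sum((a - b) ** 2 for a, b in zip(r, mean)) / len(mean)) ** 0.5


monitor = DegradationMonitor(train_mean_model, threshold=0.5, min_records=3)
for _ in range(3):
    monitor.ingest([1.0, 1.0])   # accumulate until the model can be trained
```

After three records the model is trained; a record equal to the training mean then scores 0 (no alert), while `[5.0, 5.0]` scores 4 and raises an alert.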

Meta-Learning for Rapid Training of Multi-Machine Models for Anomaly Detection and Prediction
The rapid training of meta-learning multi-machine models for anomaly detection and prediction used in the PQUM-DNS was introduced in a previous study [19].This method utilizes abundant data from numerous machines, trains the Metatrain model using the AE + meta-learning approach, and fine-tunes the Metatrain model with a small amount of data from new machines.This process yields a model adapted to a new machine, facilitating the inference for anomaly detection.In this study, the anomaly detection results were combined with prediction models to forecast future anomalies based on past anomaly detection outcomes.The following provides a brief introduction to the techniques employed.
(1) Meta-learning: Meta-learning is a technique that enables machine-learning systems to adapt swiftly to new tasks or environments [4]. Traditional machine-learning algorithms often require large amounts of labeled data to train models, and substantial new data must be collected and labeled for retraining when a new task arises. In contrast, the goal of meta-learning is to train a "learner" that can rapidly learn new tasks from a small amount of labeled data. This approach typically relies on prior experience with numerous similar tasks and transfers this experience to new tasks. The tasks are assumed to be drawn from a task distribution:
$$\mathcal{T}_i \sim p(\mathcal{T}),$$
where $\mathcal{T}_i$ is a (meta-learning) task.
An illustration of the meta-learning method is shown in Figure 9 [31]. Various meta-learning models have been proposed for deep learning; they are generally categorized as learning good weight initializations, meta-models that generate the parameters of other models, and learning transferable optimizers. Model-agnostic meta-learning (MAML) belongs to the first category: it learns a good weight initialization that enables fast adaptation to new tasks, allowing rapid convergence when fine-tuning on small training samples [32].
(2) Anomaly Detection: The AI degradation level index was used to detect anomalies, and the model was established using an unsupervised AE algorithm, as shown in Figure 10. The model is constructed from normal data, and the AE computes the root mean square error (RMSE) between the input and output data, referred to as the reconstruction error. Here, the reconstruction error is defined as the AI degradation level index. A smaller value indicates closer agreement between the model input and output, that is, a better reconstruction capability and a higher likelihood that the equipment or process is normal. In practical applications, suitable threshold values for anomaly determination are defined based on the conditions of the equipment or processes. The training objective is expressed as:
$$\text{Minimize}\ \mathrm{RMSE}(X, X'),$$
where $X$ is the model input and $X'$ is its reconstruction.

(3) Anomaly Prediction: Time-series algorithms use historical data to predict future trends. In this study, historical records of the health status of semiconductor manufacturing processes and machine equipment are used to predict their future health status. The proposed anomaly detection models can only detect anomalies in current and past data; therefore, an additional anomaly prediction model is required for future anomaly states. Various commonly used prediction models were compared on the historical AI degradation level index obtained from anomaly detection, and the Holt-Winters algorithm was finally adopted as the anomaly prediction model [7].
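The unsupervised anomaly-detection procedure in item (2) can be sketched with scikit-learn by training a bottlenecked `MLPRegressor` to reproduce its own input, which makes it a small dense AE. The synthetic "normal" data, the (16, 2, 16) architecture, and the 99th-percentile threshold rule are assumptions for illustration; the study defines thresholds from equipment and process conditions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical normal operating data: 8 correlated features driven by 2 latent factors
latent = rng.standard_normal((500, 2))
mixing = rng.standard_normal((2, 8))
X_normal = latent @ mixing + 0.05 * rng.standard_normal((500, 8))

# Training on (X, X) with a 2-unit bottleneck turns the MLP into a dense AE
ae = MLPRegressor(hidden_layer_sizes=(16, 2, 16), activation="tanh",
                  max_iter=2000, random_state=0)
ae.fit(X_normal, X_normal)


def reconstruction_rmse(model, X):
    """AI degradation level index: RMSE between each record and its reconstruction."""
    return np.sqrt(np.mean((X - model.predict(X)) ** 2, axis=1))


# Assumed rule: alert above the 99th percentile of the index on normal data
threshold = np.percentile(reconstruction_rmse(ae, X_normal), 99)
X_shifted = X_normal[:20] + 5.0        # hypothetical abnormal records
alerts = reconstruction_rmse(ae, X_shifted) > threshold
```

Because the AE only learned to reconstruct the normal data manifold, the shifted records reconstruct poorly and their index rises above the threshold.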
Various anomaly prediction algorithms are introduced below, and an overview is given in Table 3.

Table 3. Overview of anomaly prediction algorithms.
Autoregressive (AR): Uses the historical data of the variable itself to predict its own future values; autoregression requires the series to be stationary.
Moving Average (MA): A simple smoothing forecasting technique that successively averages a fixed number of terms of the time series as time passes, reflecting the long-term trend.
Autoregressive integrated moving average (ARIMA): For non-stationary time series; the originally non-stationary series is made stationary by repeated differencing.
Seasonal ARIMA (SARIMA): An ARIMA time-series forecasting method extended with seasonal periodicity.

(a) Simple Exponential Smoothing (SES): This algorithm is used when the data show no clear trend or seasonal pattern [33,34]. The forecast is a weighted average of past observations whose weights decrease exponentially with age, so the most recent observation receives the largest weight and the oldest the smallest. This is expressed as:
$$\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots,$$
where $\hat{y}_{T+1|T}$ denotes the one-step-ahead forecast for time $T+1$, $y_T$ denotes the most recent observation, and $0 \le \alpha \le 1$ denotes the smoothing parameter.
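The weighted-average form of SES is usually computed recursively, because repeatedly applying $\ell \leftarrow \alpha y_t + (1-\alpha)\ell$ unrolls into exactly those exponentially decaying weights. In this minimal sketch, starting the level at the first observation is an assumed initialization.

```python
def ses_forecast(y, alpha):
    """One-step-ahead simple exponential smoothing forecast.

    The level starts at the first observation (an assumed initialization) and is
    updated as level = alpha * y_t + (1 - alpha) * level.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1.0 - alpha) * level
    return level


print(ses_forecast([3.0, 5.0, 4.0], alpha=0.5))   # 4.0
```

Unrolled, the result is 0.5·4 + 0.25·5 + 0.25·3 = 4, matching the weighted-average form (with the last weight absorbing the initial level).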
(b) Holt (Double Exponential Smoothing Method): The Holt double exponential smoothing method extends simple exponential smoothing to capture trends in the data [9]. It is suitable for linearly trending series without seasonal patterns and consists of one forecast equation and two smoothing equations for the level and trend components ($l_t$, $b_t$):
$$\hat{y}_{t+h|t} = l_t + h b_t,$$
$$l_t = \alpha y_t + (1-\alpha)(l_{t-1} + b_{t-1}),$$
$$b_t = \beta^*(l_t - l_{t-1}) + (1-\beta^*) b_{t-1},$$
where $y_t$ and $l_t$ denote the observed value and level at time $t$, respectively, and $b_t$, $h$, $\alpha$, and $\beta^*$ denote the trend at time $t$, the forecast horizon, the weight for the level ($0 \le \alpha \le 1$), and the weight for the trend ($0 \le \beta^* \le 1$), respectively.
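Holt's three equations translate directly into a loop. This sketch assumes the simple initialization $l_0 = y_0$, $b_0 = y_1 - y_0$; libraries such as statsmodels offer estimated initial states instead.

```python
def holt_forecast(y, alpha, beta, h):
    """h-step-ahead forecast with Holt's linear (double exponential) smoothing.

    Initialization l_0 = y_0 and b_0 = y_1 - y_0 is a simple assumption.
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1.0 - alpha) * (level + trend)          # level equation
        trend = beta * (level - prev_level) + (1.0 - beta) * trend    # trend equation
    return level + h * trend                                          # forecast equation


forecast = holt_forecast([1.0, 2.0, 3.0, 4.0, 5.0], alpha=0.5, beta=0.3, h=2)
```

On a perfectly linear series the level and trend stay on the line, so the two-step-ahead forecast is 7 (up to floating-point rounding).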
(c) Holt-Winters Forecasting (Triple Exponential Smoothing): Holt-Winters forecasting, also known as triple exponential smoothing, predicts time-series data that exhibit both trend and seasonality. It considers three components: the level $l_t$, trend $b_t$, and seasonal component $s_t$, and is effective for forecasting seasonal time series. There are two variations of this method: the additive and multiplicative models.
In the additive model, the components are expressed as:
$$\hat{y}_{t+h|t} = l_t + h b_t + s_{t+h-m(k+1)},$$
$$l_t = \alpha (y_t - s_{t-m}) + (1-\alpha)(l_{t-1} + b_{t-1}),$$
$$b_t = \beta^*(l_t - l_{t-1}) + (1-\beta^*) b_{t-1},$$
$$s_t = \gamma (y_t - l_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m}.$$
In the multiplicative model, the components are expressed as:
$$\hat{y}_{t+h|t} = (l_t + h b_t)\, s_{t+h-m(k+1)},$$
$$l_t = \alpha \frac{y_t}{s_{t-m}} + (1-\alpha)(l_{t-1} + b_{t-1}),$$
$$b_t = \beta^*(l_t - l_{t-1}) + (1-\beta^*) b_{t-1},$$
$$s_t = \gamma \frac{y_t}{l_{t-1} + b_{t-1}} + (1-\gamma) s_{t-m},$$
where $s_t$, $k$, $m$, $\alpha$, $\beta^*$, and $\gamma$ denote the seasonal component at time $t$, the integer part of $(h-1)/m$, the number of periods per seasonal cycle (e.g., four for quarterly data), the level smoothing parameter, the trend smoothing parameter, and the seasonal smoothing parameter ($0 \le \gamma \le 1$), respectively.
In the additive model, each forecast is the sum of the level, trend, and seasonal components, which suits series whose seasonal fluctuations are roughly constant in size. A multiplicative model is preferred when seasonal variations change in proportion to the level of the series.
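The additive Holt-Winters recursions above can be sketched in plain Python. The initialization (level from the first season's mean, trend from the change between the first two seasonal means, seasonal indices from the first season) is a common heuristic assumed here, not necessarily the one used in the study; `statsmodels.tsa.holtwinters.ExponentialSmoothing` is a maintained alternative that also estimates the smoothing parameters.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecasts for h = 1..horizon.

    Assumed initialization: level = mean of the first season, trend = change in
    seasonal means divided by m (0 if fewer than two seasons), seasonal indices =
    first season minus the initial level.
    """
    n = len(y)
    if n < m:
        raise ValueError("need at least one full season")
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) / m - level) / m if n >= 2 * m else 0.0
    seasonals = [y[i] - level for i in range(m)]       # seasonals[t] holds s_t
    for t in range(m, n):
        s_old = seasonals[t - m]
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonals.append(gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_old)
    forecasts = []
    for h in range(1, horizon + 1):
        k = (h - 1) // m
        # The last observed index is n - 1, so s_{t+h-m(k+1)} sits at n - 1 + h - m(k+1)
        forecasts.append(level + h * trend + seasonals[n - 1 + h - m * (k + 1)])
    return forecasts


pattern = [1.0, 3.0, 2.0, 4.0]
forecast = holt_winters_additive(pattern * 3, m=4, alpha=0.5, beta=0.1, gamma=0.1, horizon=4)
```

For a purely seasonal series with no trend, the forecast for the next season reproduces the repeating pattern.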

(d) Autoregressive Model
The autoregressive (AR) model is a statistical method for analyzing time-series data that predicts the future value of a variable from its own historical data [35]. AR is an evolution of linear regression analysis: instead of analyzing the relationship between a parameter x and a dependent variable y, it analyzes the relationship between x and its own past values. This is expressed as:
$$y_t = C + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t,$$
where $y_t$, $C$, $p$, $\phi_i$, and $\varepsilon_t$ denote the stationary time series, constant term, autoregressive order, non-zero autocorrelation coefficients, and independent error term, respectively.
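Because the AR equation is linear in $C$ and $\phi_i$, its coefficients can be estimated by ordinary least squares on lagged copies of the series. The sketch below does this with NumPy and checks it on a noise-free AR(1) series; the function names are illustrative, and production code would typically use a statistics library with order selection and diagnostics.

```python
import numpy as np


def fit_ar(y, p):
    """Least-squares estimate of C and phi_1..phi_p in y_t = C + sum_i phi_i y_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    lags = [y[p - i:n - i] for i in range(1, p + 1)]   # column i holds y_{t-i}
    X = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[0], coef[1:]


def ar_forecast(y, c, phi):
    """One-step-ahead forecast from the last p observations."""
    recent = np.asarray(y[-len(phi):], dtype=float)[::-1]   # y_T, y_{T-1}, ..., y_{T-p+1}
    return float(c + phi @ recent)


# Noise-free AR(1) series y_t = 0.5 + 0.8 y_{t-1}, starting from 0
series = [0.0]
for _ in range(29):
    series.append(0.5 + 0.8 * series[-1])
c_hat, phi_hat = fit_ar(series, p=1)
```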
(e) Moving Average: Moving average (MA) is a simple smoothing prediction technique for time-series data that averages a fixed number of terms to reflect long-term trends [36]. When time-series data are influenced by periodic and random variations that cause large fluctuations, the development trend is difficult to discern; using MAs can eliminate these influences and reveal the direction and trend of events. The MA model of order $q$ is expressed as:
$$y_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i},$$
where $y_t$, $\mu$, $q$, and $\theta_i$ denote the stationary time series, mean of the sequence, moving average order, and non-zero autocorrelation coefficients, respectively, and $\varepsilon_t$ denotes the independent error term.
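The smoothing use of moving averages described above can be sketched as a trailing mean. Note that this is the smoothing technique, not estimation of the MA(q) coefficients $\theta_i$, which requires iterative fitting; the function name and window size are illustrative.

```python
import numpy as np


def trailing_moving_average(y, window):
    """Mean of each run of `window` consecutive observations (len(y) - window + 1 values).

    The last value can serve as a naive flat forecast of the long-term level.
    """
    if window < 1 or window > len(y):
        raise ValueError("window must be between 1 and len(y)")
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(y, dtype=float), kernel, mode="valid")


smoothed = trailing_moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```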
(f) Autoregressive Integrated Moving Average: The autoregressive integrated moving average (ARIMA) model is an evolution of the AR, MA, and autoregressive moving average models. It is used to analyze non-stationary time-series data, whose mean and variance change over time, by transforming them into stationary data through differencing [37]. Differencing the data yields a new stationary time series, from which a suitable probabilistic model representing the dependence between time and data can be derived. ARIMA is written as ARIMA(p, d, q), where p, d, and q denote the autoregressive, differencing, and moving average orders, respectively. Furthermore:
$$y'_t = C + \sum_{i=1}^{p} \phi_i y'_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} + \varepsilon_t,$$
where $y'_t$ denotes the series after $d$ rounds of differencing, and $\phi_i$ and $\theta_i$ denote non-zero autocorrelation coefficients.
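The "integrated" step of ARIMA is repeated first differencing, $(1-B)^d y_t$. This short NumPy sketch shows how a series with a quadratic trend becomes constant-mean after two differences; the function name is illustrative.

```python
import numpy as np


def difference(y, d=1):
    """Apply the first difference (1 - B) d times: the 'I' step of ARIMA(p, d, q)."""
    out = np.asarray(y, dtype=float)
    for _ in range(d):
        out = np.diff(out)
    return out


quadratic = [t ** 2 for t in range(6)]   # 0, 1, 4, 9, 16, 25: the mean keeps changing
once = difference(quadratic, d=1)        # 1, 3, 5, 7, 9: still trending
twice = difference(quadratic, d=2)       # 2, 2, 2, 2: constant mean
```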

(g) Seasonal Autoregressive Integrated Moving Average
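The seasonal autoregressive integrated moving average (SARIMA) model incorporates seasonal factors into the ARIMA model [38]. The SARIMA model is generally denoted as SARIMA(p, d, q)(P, D, Q)_s, where s, P, D, and Q denote the seasonal period and the seasonal autoregressive, seasonal differencing, and seasonal moving average orders, respectively. This is expressed as:
$$\Big(1 - \sum_{i=1}^{p} \phi_i B^{i}\Big)\Big(1 - \sum_{i=1}^{P} \Phi_i B^{is}\Big)(1 - B)^{d}(1 - B^{s})^{D} y_t = \Big(1 + \sum_{i=1}^{q} \theta_i B^{i}\Big)\Big(1 + \sum_{i=1}^{Q} \Theta_i B^{is}\Big)\varepsilon_t,$$
where $B$ denotes the lag operator ($B y_t = y_{t-1}$), and $\Phi_i$ and $\Theta_i$ denote non-zero constants (the seasonal autoregressive and seasonal moving average coefficients, respectively).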
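The seasonal differencing factor $(1 - B^s)^D$ in SARIMA subtracts the value one season earlier. The sketch below applies it to a hypothetical series with period 4 on top of a linear trend; the seasonal pattern cancels and only the trend's per-season increment remains.

```python
import numpy as np


def seasonal_difference(y, s, D=1):
    """Apply (1 - B^s) D times, i.e. y_t - y_{t-s}, removing a repeating seasonal pattern."""
    out = np.asarray(y, dtype=float)
    for _ in range(D):
        out = out[s:] - out[:-s]
    return out


# Period-4 seasonal pattern on top of a linear trend
y = [t + [0.0, 2.0, -1.0, 1.0][t % 4] for t in range(12)]
seasonal = seasonal_difference(y, s=4)   # every value equals 4.0 (the trend over one period)
```

A further regular difference (d = 1) would then remove the remaining constant level.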

Meta-Learning Adaptive Model Retraining
Meta-learning was employed to develop an adaptive method for retraining machine-specific models [4]. This approach enables models to adjust quickly to the slow changes observed in each machine over time, thereby achieving long-term anomaly detection. By combining AEs with meta-learning, the concept of rapidly training models for different machines was extended to different time segments of the same machine.
The data were segmented into three intervals based on chronological order: Pretrain (older and long-running machine data), Metatrain (newer and long-running machine data), and Fine-tune (latest operational data).Leveraging the principles of meta-learning, Pretrain and Metatrain data were used to train a generalized anomaly detection model, whereas Fine-tune adapted the model to the most recent machine conditions.This approach automatically trains models that adapt to data changes over time.
Model retraining was based on the operational conditions of the factory to maintain the effectiveness of the anomaly detection model.The historical data were also divided into Pretrain, Metatrain, and Fine-tune segments by utilizing seven days of equipment operation data, as shown in Figure 11.This process led to the training of an anomaly detection model that could assess the degree of equipment degradation, as shown in Figure 12.Additionally, predictive algorithms were applied to forecast equipment degradation over the next seven days.The ultimate goal was to achieve continuous automatic updates for anomaly detection and prediction.
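The chronological segmentation into Pretrain, Metatrain, and Fine-tune data can be sketched as below. The exact boundaries in Figure 11 are not given in the text, so the rule used here (Fine-tune is the most recent seven days, Metatrain the seven days before that, Pretrain everything older) and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def chronological_split(records, now, window=timedelta(days=7)):
    """Split (timestamp, features) records into Pretrain / Metatrain / Fine-tune.

    Hypothetical rule: Fine-tune = the most recent `window`, Metatrain = the window
    before it, Pretrain = everything older.
    """
    fine_start, meta_start = now - window, now - 2 * window
    pretrain = [r for r in records if r[0] <= meta_start]
    metatrain = [r for r in records if meta_start < r[0] <= fine_start]
    finetune = [r for r in records if fine_start < r[0] <= now]
    return pretrain, metatrain, finetune


now = datetime(2024, 1, 22)
records = [(now - timedelta(days=i), {"velocity_rms": 1.0}) for i in range(21)]
pre, meta, fine = chronological_split(records, now)
```

With one record per day over 21 days, each segment receives seven records; rerunning the split as new data arrive gives the continuously updated Fine-tune set.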

Lightweight AI Model
Because some neuron weights may become small or negligible during retraining of the proposed model's neural network, a pruning- and quantization-based meta-learning anomaly detection model was introduced. This approach significantly reduces model size and increases computational speed.
(1) Model Pruning: Deep-learning neural network models often contain redundant parameters, with many neuron weights close to zero. Model pruning removes these weights while preserving the expressive capability of the model. It retains the essential weights and parameters and reduces the number of connections between neural network layers, as shown in Figure 13 [39]. This decreases the number of parameters involved in the calculations and thereby lowers the computation requirements. While maintaining model performance, pruning reduces storage space, lowers computational cost, and accelerates training. Pruning algorithms typically employ a three-stage pipeline: training, pruning, and fine-tuning. The weight adjustment process in this pipeline is shown in Figure 14. The model weights are first trained; pruning then removes weights approaching zero; finally, the model is fine-tuned so that the remaining weights approximate the performance of the original model. Iterating this process keeps the performance of the pruned model aligned with that of the original model.

(2) Model Quantization: Quantization reduces the bit precision used to represent model parameters, which are typically 32-bit floating-point (float32) numbers [40]. This yields smaller models and faster computation. Model quantization approximates the continuous values (or the large number of possible discrete values) of floating-point model weights with a limited set of discrete values, usually 8-bit integers (int8), at the cost of a small loss in inference accuracy, as shown in Figure 15 [40]. Using a lower-bit data type to approximate finite-range floating-point data reduces model size and memory consumption and speeds up inference. The calculations are expressed as:
$$Q = \mathrm{round}\left(\frac{R}{s}\right) + z, \qquad R = s\,(Q - z),$$
where $R$, $Q$, $z$, and $s$ denote the real floating-point value, the fixed-point value after quantization, the fixed-point value to which the floating-point value 0 is mapped (the zero point), and the scale, i.e., the smallest step that can be represented after fixed-point quantization, respectively.

In the proposed model, pruning is applied through weight sparsity: weights close to zero are removed from the original model, and the model is then retrained to recover its performance. Quantization is then combined with pruning to convert the weights from float32 to int8, significantly reducing the model size and increasing the computational speed.
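The two compression steps can be sketched in NumPy on a single hypothetical weight matrix: unstructured magnitude pruning followed by affine int8 quantization using $Q = \mathrm{round}(R/s) + z$ and $R = s(Q - z)$. This is an illustration under assumptions (per-tensor scale, a range forced to include 0 so that pruned weights stay exactly zero, no fine-tuning step); it is not the study's toolchain and does not reproduce its reported size reduction, which depends on the full model file format.

```python
import numpy as np


def magnitude_prune(w, sparsity):
    """Unstructured magnitude pruning: zero the `sparsity` fraction of smallest-|w| weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    cutoff = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) > cutoff, w, 0.0)


def quantize_int8(r):
    """Per-tensor affine int8 quantization Q = round(R / s) + z; returns (Q, s, z)."""
    qmin, qmax = -128, 127
    rmin, rmax = min(float(r.min()), 0.0), max(float(r.max()), 0.0)   # range includes 0
    s = (rmax - rmin) / (qmax - qmin) or 1.0      # guard against an all-zero tensor
    z = int(round(qmin - rmin / s))               # zero point: the integer 0.0 maps to
    q = np.clip(np.round(r / s) + z, qmin, qmax).astype(np.int8)
    return q, s, z


def dequantize(q, s, z):
    """Recover approximate real values R = s (Q - z)."""
    return s * (q.astype(np.float64) - z)


rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(64, 64))    # hypothetical float weights of one layer
pruned = magnitude_prune(weights, sparsity=0.5)
q, s, z = quantize_int8(pruned)
restored = dequantize(q, s, z)
```

Each stored weight shrinks from 4 bytes (float32) to 1 byte (int8), the reconstruction error stays within one quantization step `s`, and pruned weights map exactly to the zero point, so sparsity survives quantization.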

Edge Device Computing
The pruned and quantized models are deployed on embedded systems, such as a Raspberry Pi, replacing traditional IPCs.This lightweight approach conserves resources, reduces costs, and facilitates a large-scale deployment.Edge devices can analyze data and promptly alert onsite maintenance personnel in industrial scenarios where certain equipment is located in hard-to-reach locations [10,18].This enables real-time responsiveness to the equipment conditions, allowing immediate intervention and preventing downtime.
Benefits of edge computing are as follows:
• Provides rapid, real-time awareness of equipment conditions, enabling onsite personnel to detect anomalies promptly and take immediate action.
• Relieves bandwidth pressure between the cloud and the edge because edge devices only need to send inference results back to the control centers.
• Addresses cybersecurity concerns by reducing exposure to network attacks that could lead to factory shutdowns.
• Reduces energy consumption because lightweight edge-computing models conserve power.

Intelligent Equipment Management System
The system automates vibration signal sensing, data transmission, data preprocessing, model training, and retraining, and presents the results visually through dashboards. It uses AI degradation level values to detect anomalies and sends warning notifications so that managers and onsite personnel can respond in a timely manner. A comparison between this system and traditional methods is presented in Table 4. With traditional methods, a manager is notified by onsite personnel only after an abnormality occurs in the equipment or the production line stops, so the situation is not handled in a timely manner.

Table 4. Comparison between the proposed system and traditional methods.

Meta-learning anomaly detection and prediction
Proposed system: Meta-learning is applied to quickly train AI models that automatically detect and predict anomalies in new equipment, enabling preventive equipment maintenance.
Traditional method: New machine models require a large amount of training data, personnel must check the equipment condition periodically, and preventive maintenance cannot be performed in advance.

Meta-learning adaptive model retraining
Proposed system: Meta-learning quickly adapts models to machine characteristics that change slowly over time, keeping the models up to date over long periods.
Traditional method: AI analysts retrain the model only after an abnormality occurs in it, which is labor-intensive and increases the risk to the equipment.

Lightweight quantized AI models
Proposed system: Dramatically reduces model size and increases computation speed.
Traditional method: Larger models require more storage space and run more slowly.

Edge computing
Proposed system: Lightweight, resource-saving, and low-cost, enabling large-scale deployment.
Traditional method: Larger PC computing devices are bulky, heavy, costly, energy-intensive, and immobile, and are difficult to deploy on a large scale.

Meta-Learning for Rapid Training of Multi-Machine Models for Anomaly Detection and Prediction
For the sake of versatility, meta-learning was applied to multi-machine anomaly detection and prediction models on different datasets: factory chiller vibration data, the publicly available Paderborn bearing vibration and current dataset, and the SECOM semiconductor dataset.

(1) Rapid Training of Multi-Machine Models for Anomaly Detection
The proposed PQUM-DNS method was compared with the general DAE method. When training models for new machines, PQUM-DNS achieved AUC values similar to those of the DAE method while using far less data. This is because PQUM-DNS uses meta-learning to train a versatile model applicable to various conditions (the Metatrain model); training a model for a new machine therefore requires only a small amount of data for fine-tuning, yielding a rapidly adaptable anomaly detection model. Test results for the chiller vibration, SECOM, and Paderborn current and vibration datasets are presented in Table 5 and Figure 16. Compared with the DAE, PQUM-DNS reduced the data required for training new machine models by approximately 75% on average, with an AUC decrease of only 0.35%.

(2) Anomaly Prediction: PQUM-DNS detects machine degradation levels through anomaly detection and forecasts these levels using various prediction algorithms. Seven prediction algorithms were compared, and the one with the lowest RMSE between predicted and actual values was selected. The performance of these algorithms on the different datasets is presented in Table 6 and Figure 17. The Holt-Winters algorithm performed best, with the lowest RMSE of approximately 0.037, and was therefore chosen for anomaly prediction in PQUM-DNS.

Meta-Learning Adaptive Model Retraining
An adaptive method based on meta-learning is employed to retrain the machine model. This enables the model to adapt quickly to gradual changes over time, thereby supporting long-term anomaly detection. PQUM-DNS segments data from the same machine chronologically and fine-tunes the model using the latest data, yielding a model suited to the machine's current condition. Unlike the general DAE method, which requires retraining with all data, PQUM-DNS significantly reduces the amount of data needed for retraining, as shown in Table 7. This is because PQUM-DNS has already trained a versatile meta-learning model on past data (a meta-trained model), which enables efficient fine-tuning with a small amount of new machine data.
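The meta-train-then-fine-tune pattern can be sketched with Reptile, a first-order meta-learning algorithm, applied to a tiny linear autoencoder whose reconstruction error serves as the anomaly score. This section does not state which meta-learning algorithm PQUM-DNS uses, so Reptile, the linear autoencoder, all hyperparameters, and the helper names (`reptile`, `sgd`, `make_machine`, `anomaly_score`) are assumptions. The synthetic "machines" share a low-dimensional structure so that a meta-trained initialization can be fine-tuned with only a few samples.

```python
import numpy as np

D, K = 8, 2  # feature dimension and bottleneck size (illustrative assumptions)


def init_params(rng, scale=0.3):
    return {"We": rng.normal(0, scale, (D, K)), "Wd": rng.normal(0, scale, (K, D))}


def loss_and_grads(p, X):
    """Mean squared reconstruction error of a linear autoencoder, plus gradients."""
    Z = X @ p["We"]                 # encode
    E = Z @ p["Wd"] - X             # reconstruction residual
    dXh = 2 * E / len(X)            # d(loss)/d(reconstruction)
    grads = {"Wd": Z.T @ dXh, "We": X.T @ (dXh @ p["Wd"].T)}
    return float(np.sum(E ** 2) / len(X)), grads


def sgd(p, X, steps, lr):
    """Full-batch gradient descent on one machine's data; returns new params."""
    p = {k: v.copy() for k, v in p.items()}
    for _ in range(steps):
        _, g = loss_and_grads(p, X)
        for k in p:
            p[k] -= lr * g[k]
    return p


def reptile(tasks, iters=300, inner_steps=5, inner_lr=0.05, meta_lr=0.5, seed=0):
    """Reptile (assumed stand-in): nudge the shared init toward weights adapted to a sampled machine."""
    rng = np.random.default_rng(seed)
    p = init_params(rng)
    for _ in range(iters):
        adapted = sgd(p, tasks[rng.integers(len(tasks))], inner_steps, inner_lr)
        for k in p:
            p[k] += meta_lr * (adapted[k] - p[k])
    return p


def anomaly_score(p, X):
    """Per-sample reconstruction error; larger means more anomalous."""
    X = np.atleast_2d(X)
    return np.sum((X @ p["We"] @ p["Wd"] - X) ** 2, axis=1)


def make_machine(rng, base, n, jitter=0.1, noise=0.05):
    """Hypothetical machine: data near a slightly rotated copy of a shared subspace."""
    B, _ = np.linalg.qr(base + jitter * rng.normal(size=base.shape))
    return rng.normal(size=(n, K)) @ B.T + noise * rng.normal(size=(n, D))


rng = np.random.default_rng(1)
base = rng.normal(size=(D, K))
past_machines = [make_machine(rng, base, 200) for _ in range(5)]
meta_init = reptile(past_machines)                      # meta-train on past machines
new_machine = make_machine(rng, base, 20)               # only 20 samples from a new machine
tuned = sgd(meta_init, new_machine, steps=20, lr=0.05)  # brief fine-tune on the latest data
print("loss before/after fine-tuning:",
      loss_and_grads(meta_init, new_machine)[0], loss_and_grads(tuned, new_machine)[0])
```

For adaptive retraining, the same `sgd` call would simply be repeated on each newest chronological segment, starting from the meta-trained weights rather than from scratch.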

Lightweight AI Model
PQUM-DNS drastically reduces the model size while maintaining AUC performance similar to that of the non-lightweight DAE model. This is achieved by pruning, that is, removing near-zero weights from the original DAE-trained model, and by quantization, which converts the model's data format from float32 to int8. Applied to the various datasets, this lightweight approach reduces the model size by approximately 60% while keeping AUC performance at similar levels, as shown in Table 8 and Figure 18.
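The two compression steps can be sketched in NumPy as magnitude pruning followed by symmetric per-tensor int8 quantization. This is an assumed scheme for illustration only: the sparsity level, the per-tensor scale, the random stand-in weight matrix, and the helper names are hypothetical, and the retraining that usually follows pruning (the three-step pipeline of Figure 14) is omitted.

```python
import numpy as np


def magnitude_prune(w, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitudes."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0, w).astype(w.dtype)


def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * np.float32(scale)


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 32)).astype(np.float32)  # stand-in for one DAE layer
pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(pruned)
print("zeros after pruning:", int(np.sum(pruned == 0)))
print("float32 bytes:", w.nbytes, "int8 bytes:", q.nbytes)  # 8192 vs 2048
print("max abs error:", float(np.max(np.abs(dequantize(q, scale) - pruned))))
```

The dense int8 array is a quarter the size of the float32 one; how much the pruned zeros add to the savings depends on the sparse storage format used when the model is saved.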

Edge Device Computing
With its reduced, compressed model size, PQUM-DNS is well suited to lightweight edge-computing devices. It is therefore deployed on embedded systems, such as a Raspberry Pi, in place of traditional industrial PCs (IPCs). This substitution reduces size and weight, conserves resources, lowers costs, and supports large-scale deployment.

Conclusions
This study proposes a new PQUM-DNS model, an intelligent equipment management system that combines pruning, quantization, meta-learning, anomaly detection and prediction using AEs, adaptive model retraining, and edge inference. The system reduces manual labor, provides fault notifications, prevents downtime, decreases the computational resources required by the model, accelerates model inference, and enables edge inference.
The system is suitable for various factory scenarios, types of machine equipment, and process states. Compared with general DAEs, it achieves a similar AUC while reducing the training data by approximately 75%. The average RMSE of the predicted degradation degree is 0.037 for Holt-Winters, retraining is conducted with 75% less data at similar AUC performance, and the model size is reduced by approximately 60% through pruning and quantization. The proposed system can be deployed on lightweight edge devices, such as a Raspberry Pi, enabling real-time anomaly detection and prediction. The system demonstrates superior performance, thereby enabling intelligent equipment management and maintenance.

Figure 1. Overview of the proposed Intelligent Equipment Management System, with the related techniques highlighted in this paper.

(1) Intelligent Equipment Management System (Sections 3.1 and 4.1): This system includes automated methods for vibration signal sensing, data transmission, data preprocessing, model training and retraining, anomaly detection, and prediction. Results are visualized on dashboards, and alert notifications are sent to onsite personnel and managers to facilitate timely problem resolution.
(2) Meta-learning for Rapid Training of Anomaly Detection and Prediction Models across Multiple Machines (Sections 3.2 and 4.2)

Figure 2. Vibration sensor (in the red square) installed on the motor of a chiller machine.

Figure 6. Extracting 16 key features from the SECOM dataset's 591 features using random forests.

Figure 7 introduces the statistical principles related to the AUC [30].

Figure 7. The results obtained from negative samples (left curve) overlap with those obtained from positive samples (right curve). Moving the result cutoff value (vertical bar) in the direction of the green arrow decreases the rate of false positives (FP) at the cost of raising the number of false negatives (FN), or vice versa (TP = True Positives, TPR = True Positive Rate, FPR = False Positive Rate, TN = True Negatives).
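The cutoff trade-off in Figure 7 can be made concrete by sweeping the cutoff over every observed anomaly score and integrating the resulting ROC curve with the trapezoidal rule. A minimal sketch, assuming that higher scores mean "more anomalous" and that anomalies are the positive class; the function names and the four-sample example are hypothetical.

```python
import numpy as np


def roc_points(scores, labels):
    """FPR/TPR at every cutoff; a sample is flagged anomalous when score >= cutoff."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)            # True = anomaly (positive class)
    cutoffs = np.r_[np.inf, np.unique(scores)[::-1]]   # strictest cutoff first
    pos, neg = scores[labels], scores[~labels]
    tpr = np.array([(pos >= c).mean() for c in cutoffs])
    fpr = np.array([(neg >= c).mean() for c in cutoffs])
    return fpr, tpr


def auc(scores, labels):
    """Area under the ROC curve via the trapezoidal rule."""
    fpr, tpr = roc_points(scores, labels)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


def confusion_at(scores, labels, cutoff):
    """TP, FP, TN, FN counts for one cutoff position (the vertical bar in Figure 7)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    flagged = scores >= cutoff
    return {"TP": int(np.sum(flagged & labels)), "FP": int(np.sum(flagged & ~labels)),
            "TN": int(np.sum(~flagged & ~labels)), "FN": int(np.sum(~flagged & labels))}


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))                # 0.75
print(confusion_at(scores, labels, 0.3))  # {'TP': 2, 'FP': 1, 'TN': 1, 'FN': 0}
print(confusion_at(scores, labels, 0.5))  # {'TP': 1, 'FP': 0, 'TN': 2, 'FN': 1}
```

Raising the cutoff from 0.3 to 0.5 removes the false positive but turns one true positive into a false negative, which is exactly the trade-off the figure describes.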

Figure 8. Flowchart of the proposed intelligent equipment management system.

Figure 9. Illustration of the meta-learning method.

Figure 11. Seven days of historical data segmented into Pretrain, Metatrain, and Fine-tune sets for training an anomaly detection model to detect equipment degradation.
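The chronological segmentation in Figure 11 can be sketched as assigning each timestamp in the seven-day window to contiguous Pretrain, Metatrain, and Fine-tune segments. The 4/2/1-day split, the hourly sampling, and the function name are hypothetical; the figure caption does not give the proportions.

```python
from datetime import datetime, timedelta

# Hypothetical split of the seven-day history (proportions are an assumption).
SEGMENTS = (("pretrain", 4), ("metatrain", 2), ("finetune", 1))


def split_history(timestamps, end, segments=SEGMENTS):
    """Map each timestamp in the window before `end` to a chronological segment.

    Returns {segment name: [indices into timestamps]}; timestamps outside the
    window are ignored.
    """
    start = end - timedelta(days=sum(days for _, days in segments))
    bounds = []
    for name, days in segments:
        bounds.append((name, start, start + timedelta(days=days)))
        start += timedelta(days=days)
    out = {name: [] for name, _ in segments}
    for i, ts in enumerate(timestamps):
        for name, lo, hi in bounds:
            if lo <= ts < hi:
                out[name].append(i)
                break
    return out


end = datetime(2024, 1, 8)
hourly = [end - timedelta(hours=h) for h in range(168, 0, -1)]  # 7 days, oldest first
parts = split_history(hourly, end)
print({name: len(idx) for name, idx in parts.items()})  # {'pretrain': 96, 'metatrain': 48, 'finetune': 24}
```

A time-interval retraining cycle would call the same function again with `end` advanced by the retraining interval, so the Fine-tune segment always holds the most recent data.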

Figure 12. Automatic model retraining based on time intervals, with the retrained model used for predicting anomalies in the next seven days.

Figure 13. Model pruning retains important weights and parameters while reducing the number of connections between neural network layers.

Figure 14. Weight adjustment process in the three-step training pipeline for pruning.

Figure 16. Comparison of PQUM-DNS and DAE training data.

Figure 17. Comparison of PQUM-DNS prediction algorithm performance on different datasets.

Figure 18. Comparison of PQUM-DNS and DAE model sizes for different data types.

Table 1. Key time- and frequency-domain physical and statistical feature values of vibration signals transformed by MCM.
Machine condition monitoring (MCM) was used to convert vibration signals into 44 key time- and frequency-domain physical and statistical feature values, as listed in Table 1 [21].
Clearance factor, coefficient of variation, crest factor, frequency, impulse factor, kurtosis, local max, local min, max in range, max, mean, median, min, percentile, peak to peak, RMS, shape factor, skewness, standard deviation, variance, X of max, X of min
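Several of the Table 1 features can be computed directly from a raw vibration segment. The sketch below uses common textbook definitions (for example, Pearson non-excess kurtosis, and the largest non-DC FFT bin as the "frequency" feature), which may differ in detail from the MCM implementation cited in [21]; the function name is hypothetical and only a subset of the listed features is covered.

```python
import numpy as np


def vibration_features(x, fs):
    """A subset of the Table 1 features for one vibration segment sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    abs_x = np.abs(x)
    rms = np.sqrt(np.mean(x ** 2))
    peak = abs_x.max()
    # Frequency feature: location of the largest non-DC spectral peak (assumed definition).
    spectrum = np.abs(np.fft.rfft(x - mean))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return {
        "mean": mean, "median": float(np.median(x)), "std": std, "variance": std ** 2,
        "max": x.max(), "min": x.min(), "peak_to_peak": x.max() - x.min(), "rms": rms,
        "crest_factor": peak / rms,
        "shape_factor": rms / abs_x.mean(),
        "impulse_factor": peak / abs_x.mean(),
        "clearance_factor": peak / np.mean(np.sqrt(abs_x)) ** 2,
        "skewness": np.mean((x - mean) ** 3) / std ** 3,
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,  # Pearson (non-excess)
        "frequency_hz": float(freqs[np.argmax(spectrum)]),
    }


fs = 1000
t = np.arange(fs) / fs
feats = vibration_features(np.sin(2 * np.pi * 5 * t), fs)  # 1 s of a 5 Hz tone
print(feats["crest_factor"], feats["kurtosis"], feats["frequency_hz"])
```

For a pure sine, the crest factor is the square root of 2 and the kurtosis is 1.5; impulsive bearing faults typically push both upward, which is why such features are informative for anomaly detection.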

Table 2. Feature values of vibration and current features after feature extraction.

Table 3. Overview of time-series algorithms.

Table 4. Comparison of intelligent equipment management system and traditional approaches.

Table 5. Comparison of PQUM-DNS and DAE training performance for different data types.

Table 6. Comparison of PQUM-DNS prediction algorithm performance on different datasets.

Table 7. Comparison of PQUM-DNS and DAE retraining performance for different data types.

Table 8. Comparison of PQUM-DNS and DAE model sizes for different data types.