Preventing Forklift Front-End Failures: Predicting the Weight Centers of Heavy Objects, Remaining Useful Life Prediction under Abnormal Conditions, and Failure Diagnosis Based on Alarm Rules

This paper addresses the critical challenge of preventing front-end failures in forklifts through diagnosis of the weight centers of carried loads, accurate prediction of the remaining useful life (RUL), and efficient fault diagnosis based on alarm rules. The study’s significance lies in offering a comprehensive approach to enhancing forklift operational reliability. To achieve this goal, acceleration signals from the forklift’s front-end were collected and processed. Time-domain statistical features were extracted from one-second windows and subsequently refined through an exponentially weighted moving average to mitigate noise. Data augmentation techniques, including AWGN and LSTM autoencoders, were employed. Based on the augmented data, random forest and LightGBM classification models were developed for the weight centers of heavy objects carried by a forklift. Additionally, contextual diagnosis was performed by applying exponentially weighted moving averages to the classification probabilities of the machine learning models. The results indicated that the random forest achieved an accuracy of 0.9563, while LightGBM achieved an accuracy of 0.9566. Acceleration data were also collected in experiments designed to predict forklift failure and RUL, particularly under repeated forklift use with the centers of the carried heavy objects skewed to the right. Time-domain statistical features of the acceleration signals were extracted as variables using a 20 s window. Subsequently, logistic regression and random forest models were employed to classify the failure stages of the forklifts. The macro F1 scores obtained were 0.9790 and 0.9220 for logistic regression and random forest, respectively. Moreover, the random forest probabilities for each stage were combined and averaged to generate a degradation curve and determine the failure threshold.
The coefficient of the exponential function was calculated using the least squares method on the degradation curve, and an RUL prediction model was developed to predict the failure point. Furthermore, the SHAP algorithm was utilized to identify significant features for classifying the stages. Fault diagnosis using alarm rules was conducted by establishing a threshold derived from the significant features within the normal stage.


Introduction
This paper focuses on the challenge of mitigating durability degradation and accurately predicting failures in forklifts operating under demanding conditions. As classified by the American Society of Mechanical Engineers (ASME), forklifts, powered vehicles used for various material handling tasks, often encounter durability and performance issues while maneuvering heavy loads across logistics warehouses, construction sites, factories, and similar environments. These issues escalate service and maintenance costs, introduce safety hazards, and potentially endanger human lives. To address these challenges, the integration of prognostics and health management (PHM) technologies becomes essential. PHM, a comprehensive framework leveraging sensors, aims to assess system health, diagnose anomalies, and predict remaining useful life [1]. By encompassing condition monitoring, assessment, fault diagnosis, and prediction, PHM optimizes decision making for condition-based maintenance [2]. It necessitates the application of a systematic methodology to select appropriate feature engineering techniques and algorithms tailored to the specific context [3]. Notably, the study by Meng et al. [4] presented a comprehensive overview of research trends in PHM, particularly focusing on lithium-ion batteries. Categorizing prediction approaches into physics-based, data-driven, and hybrid categories, their work provides valuable insights into the diverse avenues within the field.
Recognizing the limitations of traditional reliability analysis, which relies on mean time to failure data and failure probability distributions, contemporary efforts are directed towards harnessing sensor-based PHM technologies to overcome these challenges. Previous studies in reliability and fault diagnosis have predominantly focused on statistical analyses under average load conditions or known failure scenarios [5][6][7][8][9]. However, the advent of the internet of things (IoT) is driving the transition to sensor-based PHM in a variety of domains as asset management evolves into a new phase [10]. Physics-based PHM formalizes the complex behavior of equipment and systems to help diagnose and predict failures when mechanical systems exceed predefined physical thresholds. In contrast, sensor-based methodologies have attracted considerable attention due to their powerful representation and automated feature learning capabilities, which allow them to cope proficiently with complex troubleshooting scenarios [11]. However, sensor-based PHM faces several challenges. Data collected from multiple sensors are prone to noise during acquisition and transmission. Since each type of failure produces different failure signals, signals and failures generally may not correspond one-to-one [12]. The limitations of directly monitoring raw signals in deep-learning-based PHM are recognized, necessitating a transformation process termed feature engineering [12,13]. This process encompasses noise filtration, statistical feature extraction, frequency conversion, and context-dependent conditional reduction, thereby enhancing the data's meaningfulness and applicability.
Within the data-driven and deep-learning-based PHM framework, considerable research is dedicated to fault diagnosis and the prediction of remaining useful life. While PHM technologies aim for high fault detection accuracy and predictive capabilities, the practicality of models is also contingent on achieving a lightweight design. Ding et al. [14] introduced a lightweight multiscale convolutional network tailored for bearing fault diagnosis in edge computing scenarios, specifically targeting train bogie bearings. To complement model-based lightweighting efforts, feature-based lightweighting is pursued through dimension reduction. Lee et al. [3] emphasized the importance of effective feature engineering and data dimensionality reduction in PHM.
Addressing the challenge of imbalanced data situations, Zhang et al. [15] proposed an integrated multi-task intelligent bearing fault diagnosis method, leveraging representation learning under unbalanced sample conditions. This approach is particularly relevant given the common scenario of skewed sample ratios between fault and normal data. Furthermore, the scarcity of fault data arising in distinct environments necessitates innovative solutions. To overcome this data scarcity challenge, techniques such as data augmentation via generative adversarial networks (GANs) [16] and sensor signal transformation [17] have been advanced. Numerous studies have contributed comprehensive insights into remaining useful life (RUL) prediction within the PHM process [18][19][20][21][22][23][24][25][26][27][28]. RUL prediction encompasses four pivotal technical stages: data collection, construction of health indices, health state segmentation, and RUL prediction [18]. These stages synergistically form a comprehensive framework for accurate prediction.
The Industrial Truck Association (ITA) classifies forklifts into eight distinct product families, ranging from Class I to Class VIII. These classifications are based on specific usage characteristics and structural differentiations. For the purpose of this investigation, a PHM study was undertaken, focusing on the front-end structure responsible for lifting heavy loads. This structural component serves as the core element within ITA Class I-type electric counterbalance forklifts. Durability evaluation of forklifts typically involves testing under average loads and standardized conditions, conforming to established standards such as ISO 3691-1, ANSI/ITSDF B56.1, EN 1726-1, and ASME B56.1. However, the applicability of these conventional statistical approaches can be limited in non-standard operational settings where anomalies can arise. In real-world work environments, forklifts often encounter abnormal loading conditions that can result in rollovers or structural failures. Moreover, when latent strength deficiencies accumulate during assembly, the risk of structural failure becomes significantly elevated.
Among the various safety concerns linked with forklifts, the most significant hazard is the occurrence of vehicle rollovers. These incidents commonly occur when maneuvering heavy objects that exceed the forklift's designated capacity or when handling unbalanced or improperly centered loads. Furthermore, the repetitive use of forklifts while carrying unbalanced heavy loads can lead to performance degradation over time. This cumulative deterioration can eventually culminate in safety accidents. This particular failure mode operates outside the established standard and stands as an outlier load condition. It resides beyond the boundaries of average reliability ranges and is, therefore, not explicitly accounted for in typical design considerations. Paradoxically, these abnormal failures frequently occur within real-world operational environments. This inherent discrepancy necessitates the development of abnormal failure prevention technologies. These technologies should extend beyond the confines of existing reliability-based lifetime management approaches. Specifically, there is a need for a PHM technique that leverages machine learning and deep learning methodologies, utilizing sensor data to detect changes in forklift operational states. This enables the accurate classification and prediction of impending failures. Consequently, this study introduces an effective PHM approach designed to mitigate safety accidents stemming from abnormal forklift usage conditions. By addressing these anomalies, the aim is to enhance forklift durability, thereby curbing maintenance costs and fostering a value chain that prioritizes accident prevention.
This paper provides a systematic presentation of forklift failure diagnosis and prediction, aiming to ensure the reliability of the forklift front-end, as illustrated in Figure 1. The following overview outlines the content of each section.

• In Section 2, vibration data were collected under weight-imbalance conditions, and a subsequent feature engineering process was executed to facilitate a comprehensive classification of the centers of heavy objects. The primary objective of this classification study was to diagnose weight imbalance, a critical factor influencing forklift durability. By addressing this imbalance, the study aimed to proactively prevent factors leading to durability degradation in forklifts, thereby minimizing the risk of vehicle rollover.

• Continuing in Section 3, vibration and sound data were collected from forklifts operating under abnormal weight-imbalance conditions. Employing another feature engineering process alongside health stage classification, this section further delves into the realm of anomaly detection and classification.

• Section 4 is devoted to conducting remaining useful life (RUL) analysis and an alarm-rule-based fault diagnosis, serving the overarching goal of enhancing operational maintenance. The RUL prediction was based on the probability model established in Section 3, enabling accurate predictions. Additionally, significant features were extracted from both the classifiers and the features previously generated in Section 3. This information was then employed for an early front-end failure diagnosis through an alarm-rule-based approach.

• Lastly, Section 5 encapsulates the findings and conclusions, summarizing the contributions derived from the investigation.

Diagnosing and Classifying the Weight Center of Heavy Objects Carried by Forklifts
This section addresses the diagnosis of weight imbalances, which have a direct impact on the overall durability of forklifts. To achieve this, a structured approach was followed that included several essential steps: data preprocessing, feature engineering, data augmentation, and careful selection and evaluation of appropriate machine learning models. The overall goal was to develop a comprehensive method to detect the center of gravity, as shown in Figure 2.


Experimental Data Acquisition and Feature Engineering

Vibration (acceleration) data were acquired from the forklift's front-end structure, and a process was developed to diagnose and classify the weight center of heavy objects carried by forklifts. The front-end structure of the forklift consists of a mast, backrest, carriage, and forks, as shown in Figure 3, and the acceleration signals were measured from the outer beam of the mast. In the measurement experiment for data acquisition, the weight center of heavy objects carried by the forklift was measured in three configurations: center, left, and right. The condition segments of the dataset were classified and organized into center, left, and right according to each center of gravity condition. Two embedded devices (one on the left and one on the right) were attached to the front-end structure of the forklift truck to measure the vibration acceleration in three axes (x, y, z) at a sampling rate of 500 Hz, as shown in Figure 4. Considering the load conditions under which the forklift operates, the operating environments of the two datasets (datasets 1 and 2) were simulated in the experiments while maintaining a state that included ground noise. In dataset 1 (driving mode only), the vehicle was loaded with a 3200 kg weight and traveled 80 m at maximum speed, as shown on the left in Figure 5. All measurements were taken for approximately 20 min for each of the center, left, and right condition segments to eliminate data imbalances between condition segments. In dataset 2 (complex mode), the forklift made a round trip of 80 m, as shown on the right in Figure 5, with lifting, lowering, and back-and-forth tilting tasks added at the end of the trip. In dataset 2, approximately 32 min of data were acquired, which is 12 min longer than in dataset 1. Similarly, approximately 32 min of data were acquired for each of the center, left, and right condition segments to eliminate data imbalance.
Six acceleration signals (2 sensors × (x, y, z) accelerations) were included in datasets 1 and 2. In dataset 2, 960,000 data points were collected per acceleration signal (500 Hz × 32 min × 60 s/min). Given the large data size, which could lead to inefficient analysis, this study referred to previous studies [27,28] to handle the data. Eight features were extracted from each window at one-second intervals: min, max, peak to peak, mean (abs), RMS (root mean square), variance, kurtosis, and skewness. In addition, four features were added by combining the max, RMS, and mean (abs) features: crest factor, shape factor, impulse factor, and margin factor, as listed in Table 1.

Table 1. Description of features (crest factor, shape factor, impulse factor, and margin factor).

Feature: Dimension
Crest factor: max/RMS
Shape factor: RMS/mean (abs)
Impulse factor: max/mean (abs)
Margin factor: max/mean((abs)²)

Through the above process, 12 features were extracted, and the measured data were compressed for efficient analysis. The data compression process transformed the dimensionality of the data from 960,000 × 6 × 1 to 1920 × 6 × 12. The number of feature vectors was thus reduced, while the number of features grew twelvefold, yielding 72 features per one-second feature vector (6 signals × 12 features). The aim was to enable effective data processing and facilitate the diagnosis of the weight center of heavy objects at a one-second interval. The feature vectors were then scaled to the range 0 to 1 using min-max normalization. Finally, an exponentially weighted moving average (EWMA) with a window size of 2 to 3 s was used to smooth the noise signals of the generated features and minimize outliers in the feature data. Moving averages average out the effects of past data, and exponentially weighted moving averages have the advantage of attenuating these effects exponentially; they are therefore used for tasks such as time series forecasting and noise reduction [29,30]. The EWMA adjusts the value of alpha (α) based on the window size, as shown in Equation (1). As described in Equation (3), this method smooths out noise in the feature vector to minimize outliers while also reducing the influence of past vectors.

The parameters employed within Equations (1) through (3) are listed below:
• α: the weight coefficient of the exponentially weighted moving average.
• window size: the number of data points used by the EWMA.
• y_t: the exponentially weighted moving average value at the current time t.
• x_t: the input data value at the current time t.
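As a concrete sketch, the one-second windowed feature extraction described in this section can be written as follows. The feature names follow Table 1, but the implementation details (window handling, the margin-factor reading) are assumptions, not the authors' code:

```python
import numpy as np

FS = 500  # sampling rate in Hz, per the experimental setup


def _skewness(x):
    mu, sd = np.mean(x), np.std(x)
    return np.mean((x - mu) ** 3) / sd ** 3


def _kurtosis(x):
    mu, sd = np.mean(x), np.std(x)
    return np.mean((x - mu) ** 4) / sd ** 4


def window_features(w):
    """Compute the 12 time-domain features for one one-second window."""
    abs_mean = np.mean(np.abs(w))
    rms = np.sqrt(np.mean(w ** 2))
    mx = np.max(w)
    return {
        "min": np.min(w),
        "max": mx,
        "peak_to_peak": mx - np.min(w),
        "mean_abs": abs_mean,
        "rms": rms,
        "variance": np.var(w),
        "kurtosis": _kurtosis(w),
        "skewness": _skewness(w),
        # Derived features from Table 1. The margin factor follows the table
        # as printed (max/mean((abs)^2)); some references instead use the
        # squared mean of sqrt(|x|).
        "crest_factor": mx / rms,
        "shape_factor": rms / abs_mean,
        "impulse_factor": mx / abs_mean,
        "margin_factor": mx / np.mean(np.abs(w) ** 2),
    }


def extract_features(signal, fs=FS):
    """Split a signal into non-overlapping one-second windows and featurize each."""
    return [window_features(signal[i * fs:(i + 1) * fs])
            for i in range(len(signal) // fs)]


# Two seconds of an alternating +/-1 signal -> two 12-feature vectors.
signal = np.tile(np.array([1.0, -1.0]), FS)
features = extract_features(signal)
```

Applied to a 32-minute signal at 500 Hz, this yields the 1920 feature vectors per signal described above.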
The exponentially weighted moving average was applied to take advantage of its ability to minimize outliers in the feature vector. Figure 6 shows the 'min' feature extracted from the x-acceleration signal using the EWMA technique. Tables 2 and 3 contrast features extracted from the dataset before and after the application of EWMA. They showcase the initial data state alongside the effects of EWMA, including smoothing and value adjustments.
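Since Equations (1) through (3) are not reproduced in this excerpt, the following sketch uses the common span-based EWMA recursion as an assumption for how α is derived from the window size:

```python
import numpy as np


def ewma(x, window_size):
    """Exponentially weighted moving average; alpha is derived from the window size."""
    alpha = 2.0 / (window_size + 1.0)  # span-based weight (assumed form of Eq. (1))
    y = np.empty(len(x), dtype=float)
    y[0] = x[0]  # initialize with the first observation
    for t in range(1, len(x)):
        # y_t = alpha * x_t + (1 - alpha) * y_{t-1}: the current input dominates
        # while the influence of past values decays exponentially
        y[t] = alpha * x[t] + (1.0 - alpha) * y[t - 1]
    return y


# Smoothing damps the isolated outlier at index 3 while tracking the signal.
feature = np.array([0.10, 0.12, 0.11, 0.90, 0.12, 0.11, 0.10])
smoothed = ewma(feature, window_size=3)
```

With a window size of 2 to 3 s, as used in this study, the outlier's contribution is roughly halved at the point it occurs and decays quickly afterwards.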
Through the preceding feature engineering process, as shown in Table 4, 3699 feature vectors were generated from dataset 1 (driving mode only), measured for approximately 20 min, and 5755 feature vectors were derived from dataset 2 (complex mode), measured for approximately 32 min. In total, 9454 feature vectors were obtained, which were divided into training and test datasets at a 7:3 ratio. The training and test datasets comprised 6617 and 2837 samples, respectively, with each feature vector containing 72 features. The training dataset was used to develop the machine learning classifier models, and their performance was checked on the test dataset. The number of feature vectors in the training dataset was also augmented to minimize overfitting during the training process. Data augmentation was performed using additive white Gaussian noise (AWGN) and a long short-term memory autoencoder (LSTM AE), which expanded the training dataset to a maximum of 19,851 samples (Table 5). AWGN was applied by referring to prior studies [31], with the target signal-to-noise ratio (SNR) set to 20 dB. An additional dataset could be generated by mixing noise with the original data, as shown in Figure 7.
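The 7:3 split described above can be reproduced with a simple index shuffle; the exact splitting procedure is an assumption, but int(9454 × 0.7) = 6617 matches the reported training-set size:

```python
import numpy as np


def train_test_split(n_samples, train_ratio=0.7, rng=None):
    """Shuffle sample indices and split them at the given ratio (sketch)."""
    rng = np.random.default_rng(0) if rng is None else rng
    idx = rng.permutation(n_samples)
    n_train = int(n_samples * train_ratio)
    return idx[:n_train], idx[n_train:]


# 9454 feature vectors -> 6617 training and 2837 test samples.
train_idx, test_idx = train_test_split(9454)
```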
AWGN is a method of adding noise with a Gaussian distribution to the input or output signal of a system. The SNR serves as a scale that quantifies the ratio between the signal's strength and the noise level; a higher SNR value corresponds to a more robust signal, reducing the relative impact of noise. AWGN based on the SNR can be expressed as follows [31]:
• N(t): the noise of the signal.
• A: the magnitude of the input or output signal.
• σ: the standard deviation of the noise, calculated from the SNR value.
• w(t): the white noise, which follows a Gaussian distribution with a mean of 0 and a variance of 1.
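A minimal sketch of SNR-targeted AWGN augmentation, assuming the standard power-ratio definition SNR(dB) = 10·log10(P_signal/P_noise) rather than the paper's exact equation:

```python
import numpy as np


def add_awgn(signal, snr_db=20.0, rng=None):
    """Add white Gaussian noise scaled so the result has the target SNR in dB."""
    rng = np.random.default_rng(0) if rng is None else rng
    signal_power = np.mean(signal ** 2)
    # Noise power follows from SNR(dB) = 10 * log10(signal_power / noise_power).
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    w = rng.standard_normal(len(signal))  # w(t): zero-mean, unit-variance noise
    return signal + np.sqrt(noise_power) * w


# Augment a clean sinusoid at the study's target SNR of 20 dB.
clean = np.sin(np.linspace(0.0, 20.0 * np.pi, 10000))
noisy = add_awgn(clean, snr_db=20.0)
measured_snr = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Each pass with a different random seed yields a new noisy copy of the training data, which is how the augmentation set in Figure 7 could be generated.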
The autoencoder is a neural network that can use unlabeled training data to learn a code that efficiently represents the input data. This type of coding is useful for dimensionality reduction because it typically has much lower dimensionality than the input. In particular, it works as a powerful feature extractor that can be used for the unsupervised pre-training of deep neural networks. An autoencoder consists of an encoder that converts the input to an internal representation and a decoder that converts the internal representation back to an output [32]. The output is called a reconstruction because the autoencoder reconstructs the input. This study used the mean square error (MSE) as the reconstruction loss during training. LSTM, an artificial recurrent neural network, was designed to address the vanishing gradients in traditional recurrent neural networks (RNNs) [33]. As the number of hidden layers in a neural network and the number of nodes in each layer increase, the last layers are trained while the initial layers are not.

This long-term dependency problem arises from the vanishing gradient problem, where the gradients tend to converge to zero during gradient propagation, particularly when the data length increases during the training phase of RNNs [34,35]. Unlike traditional RNNs, LSTM can effectively overcome the vanishing gradient problem by incorporating long- and short-term state values in the learning process, enabling successful learning even with long training durations [36,37]. In this study, an LSTM autoencoder consisting of two LSTM layers was implemented because time series data were used, as shown in Figure 8. One layer is used as the encoder and the other as the decoder [38]. Furthermore, a repeat vector was used in the decoder part to restore the compressed representation to the original input sequence: the repeat-vector operation repeats the compressed latent-space representation to match the sequence length, allowing the decoder to use the compressed representation multiple times to reconstruct the original input sequence. Using the LSTM autoencoder, the original feature vectors were used as training input, and the output vectors were used as the augmentation dataset. The output dataset was generated and replicated to minimize the MSE, resulting in a dataset with characteristics and patterns similar to the input dataset, as shown in Figure 9. The equations and parameter descriptions for the LSTM autoencoder are as follows [32,33].
• x_t: the input time series data.
• x̂_t: the output (reconstructed time series data) of the decoder.
• h and h': the hidden states of the LSTM.
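The role of the repeat vector described above can be illustrated with a minimal NumPy sketch; the sequence length and latent dimension here are hypothetical stand-ins, not values from the study:

```python
import numpy as np

# Hypothetical shapes: the encoder compresses a window of feature vectors
# into one latent vector; the repeat vector tiles that latent vector once
# per output time step so the decoder can reuse it at every step.
seq_len, latent_dim = 10, 4

rng = np.random.default_rng(0)
latent = rng.normal(size=(latent_dim,))  # compressed representation from the encoder

# repeat vector: give the decoder the same latent vector at every time step
repeated = np.tile(latent, (seq_len, 1))

print(repeated.shape)  # (10, 4)
```

Every row of `repeated` is identical, which is exactly what allows the decoder LSTM to unfold the single compressed representation back into a full-length sequence.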

Result of Classification
To compare the accuracy of failure prediction, the selected classification algorithms were random forest [39] and LightGBM [40]. To enhance their performance, Bayesian optimization was employed for hyperparameter tuning. Random forest is an ensemble classifier rooted in decision trees. A decision tree builds a tree-like model from the input variables, growing the tree by identifying the optimal splitting rule at each branch: it partitions the data into multiple child nodes by evaluating specific conditions at each node. However, a single decision tree is vulnerable to overfitting, which can hinder its ability to generalize well; the random forest algorithm is applied to alleviate this concern. The splitting criteria are typically determined by metrics such as information gain (IG) or the Gini index (Gini). These metrics measure the impurity of the class distribution at each node, and the split that yields the largest reduction in impurity between the parent node and its child nodes is selected [39,40].
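The impurity-based split selection described above can be sketched as follows; the toy labels are purely illustrative, not data from the experiments:

```python
import numpy as np

def gini(labels):
    """Gini impurity of the class distribution at a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def impurity_decrease(parent, left, right):
    """Reduction in Gini impurity achieved by splitting parent into left/right."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

# toy two-class node: a perfect split removes all impurity
parent = np.array([0, 0, 0, 1, 1, 1])
left, right = parent[:3], parent[3:]
print(impurity_decrease(parent, left, right))  # 0.5
```

A candidate split with a larger impurity decrease is preferred; in this perfect split the parent impurity (0.5) is removed entirely.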
• D_p: the data of the parent node.

Random forest stands out as a notable ensemble learning technique, leveraging the power of decision trees. The methodology of random forest is structured around the collaborative efforts of multiple decision trees, which are subsequently aggregated to yield prediction results. The workflow of random forest unfolds as follows:
1. Bootstrap sample creation: the process starts by randomly selecting instances from the input data to create a bootstrap sample, a distinct dataset consisting of data instances randomly drawn from the original input dataset.
2. Multiple decision tree generation: next, numerous decision trees are generated, each stemming from its own bootstrap sample. These decision trees are built with a random element, ensuring their diversity and independence.
3. Data prediction by decision trees: each of the generated decision trees is then used to predict the input data. This prediction is carried out individually for every decision tree in the ensemble.
4. Aggregation of predictions: the prediction results obtained from the individual decision trees are aggregated, either by averaging the predictions or by majority voting, depending on the task. The aggregated outcome serves as the final prediction of the random forest model.
By following these steps, random forest harnesses the collective insights of multiple decision trees, effectively enhancing prediction accuracy and generalization capability. Random forest addresses overfitting by generating multiple models from different data subsets; this diversification improves robustness against noise and uncertainty. The optimal number of decision trees and the splitting criteria can be adjusted through hyperparameter tuning; typically, hyperparameters such as the splitting criterion and the tree depth are set. The prediction function of random forest is as follows, where T is the number of generated decision trees and f_t(x) represents the prediction function of the t-th decision tree:

ŷ(x) = (1/T) Σ_{t=1}^{T} f_t(x)
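The four steps above can be sketched with scikit-learn decision trees; the toy two-class data is a hypothetical stand-in for the forklift feature vectors:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# hypothetical two-class data standing in for the forklift feature vectors
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

T = 25                                                     # number of trees
trees = []
for seed in range(T):
    idx = rng.integers(0, len(X), size=len(X))             # 1. bootstrap sample
    tree = DecisionTreeClassifier(max_features="sqrt",     # 2. randomized tree
                                  random_state=seed)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

votes = np.stack([t.predict(X) for t in trees])            # 3. per-tree predictions
pred = (votes.mean(axis=0) > 0.5).astype(int)              # 4. majority vote
print((pred == y).mean())
```

In practice `sklearn.ensemble.RandomForestClassifier` wraps all four steps, but spelling them out shows where the randomness (bootstrap rows, feature subsets) enters.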
LightGBM is a machine learning model based on the gradient boosting decision tree (GBDT) algorithm. Gradient boosting improves the prediction model iteratively, with each new model compensating for the errors of the previous one; multiple decision trees can therefore be combined into a more robust model that minimizes overfitting. The working mechanism of LightGBM is similar to the conventional GBDT algorithm, but it uses a leaf-wise approach during the splitting process (Figure 10). This approach allows LightGBM to produce more unbalanced trees than the traditional level-wise approach, resulting in improved predictive performance. Furthermore, LightGBM can perform splitting using only a subset of the data, ensuring faster processing for large-scale datasets. However, depending on its leaf-wise characteristics and hyperparameter settings, LightGBM may produce deeper trees [41]. While this can improve the prediction accuracy on the training data, it may lower the accuracy on new data because of overfitting. In the case of LightGBM, the aim was to minimize the overfitting problem through the feature augmentation conducted previously. The objective function and parameter descriptions for LightGBM are as follows [42]:

Obj^(t) = Σ_{i=1}^{N} l(y_i, ŷ_i^(t−1) + f_t(x_i)) + Ω(f_t),  where Ω(f_t) = γT + (1/2)λ‖w‖²

• l(y_i, ŷ_i^(t−1) + f_t(x_i)): the loss function used in the objective function.
• y_i: the actual value for the i-th data point.
• ŷ_i^(t−1): the prediction from the previous time step (t − 1).
• f_t(x_i): the prediction of the t-th tree for the i-th data point x_i.
• Ω(f_t): a term used to regulate the complexity of the tree.
• T: the number of leaf nodes in the tree.
• γ: the cost parameter associated with the number of leaf nodes.
• λ: a coefficient that regulates the weight of leaf nodes.
• w: the weights assigned to the tree nodes.
Bayesian optimization is a method for finding the optimal solution that maximizes an arbitrary objective function. This optimization technique can be applied to any function for which observations can be obtained and is particularly useful for optimizing black-box functions with high evaluation cost and unknown shape [43]. Therefore, Bayesian optimization is used mainly as a hyperparameter optimization method for machine learning models, taking advantage of these characteristics [44]. The optimized hyperparameters were derived through Bayesian optimization, as shown in Table 6. In Bayesian optimization, the aim is to identify the hyperparameter combination x* that minimizes or maximizes the objective function f(x). The objective function typically takes a form similar to Equation (15), where η(x) represents the actual value of the objective function for the hyperparameter combination x and ε(x) the accompanying noise:

f(x) = η(x) + ε(x)  (15)

Bayesian optimization involves experimentation with various hyperparameter combinations while modeling both the genuine objective function value η(x) and the noise ε(x). Through iterative processes, the next hyperparameter combination to explore is forecast, advancing the optimization process.
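The explore-then-refine loop of Bayesian optimization can be sketched with a Gaussian process surrogate and an upper-confidence-bound acquisition; the one-dimensional objective and all settings below are hypothetical illustrations, not the study's actual tuning setup:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):
    """Stand-in for the expensive black-box f(x); maximum at x = 2."""
    return -(x - 2.0) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(3, 1))          # initial random observations
y = objective(X).ravel()
grid = np.linspace(0.0, 5.0, 200).reshape(-1, 1)

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    gp.fit(X, y)                                 # surrogate model of f(x)
    mu, sigma = gp.predict(grid, return_std=True)
    ucb = mu + 2.0 * sigma                       # acquisition: mean + exploration bonus
    x_next = grid[[int(np.argmax(ucb))]]         # next point to evaluate
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

best_x = float(X[np.argmax(y), 0])
print(round(best_x, 2))
```

Each iteration refits the surrogate and evaluates the true objective only at the most promising point, which is why the approach suits expensive hyperparameter searches.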
Because forklifts move continuously, the vibration data have the characteristics of time series data. The time series data and the state changes in the condition segment (the center of a heavy object carried by the forklift) do not depend on the state at a single point in time but on past values. Therefore, the EWMA was applied to the classification probabilities of the condition segment, so that the segment could be diagnosed contextually using a moving average rather than only the probability at a single instant, as shown in Table 7. Contextual diagnosis in machine learning is a technique of diagnosing by considering the context of the given data [45]. This provides a deeper understanding than simply analyzing and predicting data patterns: it considers the data before and after, rather than individual data points, to help make an accurate diagnosis. In addition, it is used effectively for outlier detection in time series [46] and partial data [47]. This study attempted to minimize the effect of noise, such as outliers, by using the exponentially weighted moving average for contextual diagnosis, and examined the effect of the window size on the moving average. Figure 11 presents the learning and prediction process flow. As a result of contextual diagnosis through the exponentially weighted moving average, the classification probability of each condition segment predicted by machine learning changes, as listed in Table 7.
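The exponentially weighted moving average used for this contextual diagnosis can be sketched as follows; the probability series and alpha value are hypothetical illustrations:

```python
import numpy as np

def ewma(series, alpha):
    """EWMA recursion: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    out = np.empty(len(series), dtype=float)
    out[0] = series[0]
    for t in range(1, len(series)):
        out[t] = alpha * series[t] + (1 - alpha) * out[t - 1]
    return out

# hypothetical per-second "center" classification probabilities with
# a single outlier window at t = 3
p_center = np.array([0.9, 0.9, 0.9, 0.1, 0.9, 0.9])
smoothed = ewma(p_center, alpha=0.3)
print(smoothed.round(3))
```

The outlier at t = 3 drops the raw probability to 0.1, but the smoothed value stays above 0.5, so the "center" diagnosis is preserved, which is the noise-suppression effect described above.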
Forklifts carry and transport unbalanced heavy objects during continuous movement or operation, and the centers of heavy objects do not fluctuate on a one-second basis. Therefore, a two to three second window was used to calculate the moving-average probability. As a result, when the condition segment was diagnosed as "center", applying a moving average to the probabilities in certain outlier segments diagnosed as "left" or "right" resulted in lower values influenced by past data. Figures 12 and 13 present these probabilities as graphs as a function of time. From the observed results, the centers of heavy objects carried by the forklift were diagnosed more accurately by the generated classifier.

Because the data imbalance was minimized, the performance was compared using the accuracy score of 24 classifiers, including random forest and LightGBM (Table 8). First, the accuracy increased in all cases as the window size of the exponentially weighted moving average for smoothing the feature-vector time series data increased and the alpha value decreased. The classification probabilities of the condition segment predicted by machine learning were then subjected to the moving average to achieve a contextual diagnosis. In all cases, the classification accuracy of the condition segment increased gradually as the size of the moving-average window increased and the alpha decreased. On the other hand, although the data augmentation minimized the overfitting of the model, it resulted in the same or slightly lower accuracy. Applying the moving average to the probabilities of the LightGBM model resulted in an overall increase in the scores (cases 13-24). This trend can be seen in the probability graphs diagnosing each condition segment in Figures 12 and 13. However, LightGBM, with its leaf-wise growth strategy, tended to overfit with increasing tree depth and often produced probabilities highly skewed towards 0 or 1. Therefore, it was difficult to observe performance improvement from the combination of the augmented dataset and contextual diagnosis for this model (cases 13, 16, 19, and 22).
Compared to LightGBM, random forest exhibited relatively less overfitting and showed gradual fluctuations in probabilities corresponding to changes in the time series. This trend can be observed in Figure 13. The diagnosis of the condition segments was not sigmoidal but rather smooth and gradual because the probability was calculated by averaging the voting values of multiple randomly generated decision trees. These characteristics appeared in the classifiers utilizing the dataset augmented with AWGN and the LSTM autoencoder. When applying the AWGN- and LSTM-autoencoder-augmented dataset together with contextual diagnosis in cases 10-12, the probability moving average yielded a score 0.0331 (3.31 percentage points) higher than in cases 1-3. On the other hand, the random forest score was lower than that of LightGBM in all cases when the moving average was not applied (i.e., without contextual diagnosis).
By applying machine learning and contextual diagnosis, the diagnosis of the centers of heavy objects carried by forklifts was performed by the condition segment during the process of lifting, tilting, and moving heavy objects on uneven ground using a forklift.As a result, the random forest model (case 12) achieved a maximum accuracy of 0.9563, while the lightGBM model (case 24) achieved a maximum accuracy of 0.9566.

Abnormal Lifting Weight Stage Classification
In Section 3, vibration and sound data were acquired from forklifts operating under repeated abnormal weight lifting conditions.In addition, feature engineering and health stage classification of the data were performed following a systematic PHM procedure as depicted in Figure 14.

Experimental Data Acquisition and Feature Engineering
Safety accidents and failure situations were induced by the continuous lifting of unbalanced heavy objects in a laboratory environment, and data measurement and condition diagnosis were conducted for these situations. During the data measurement process, the forklift repeatedly lifted and lowered a 1500 kg load, unbalanced to the right, at a consistent speed every 20 s for five hours. The forklift remained stationary throughout this process and did not perform any driving activity. In addition, four three-axis (x, y, z) acceleration sensors were mounted on the left and right sides of the front-end structure to collect vibration data (Figure 15). Repeated acceleration tests were performed until a failure condition occurred, at which point the forklift swayed and a loud noise was generated in the structure. In an actual operating environment, it is difficult to diagnose the condition and predict the lifespan from sound because of ambient noise. In this study, a microphone (sampling rate 51.2 kHz) was therefore installed at the driver's position in an anechoic chamber environment to eliminate ambient noise, and the noise generated by the forklift was measured at the same time; labeling was performed using these sound data. The sound data were used to classify and label the failure stages into three stages: normal, failure entry, and failure.

The dataset obtained in the experiment contains 12 acceleration signals (four sensors × three axes (x, y, z)). For each signal of one sensor, 512 (Hz) × 300 (min) × 60 (s/min) acceleration data points were measured, resulting in 9,216,000 data points. In addition, 921,600,000 (51,200 Hz × 300 min × 60 s/min) sound data points were collected. As shown in Figure 16, the sound signal plotted in the time domain made it difficult to track the state changes. By observing the frequency changes over time using the short-time Fourier transform (STFT), it was possible to determine whether the forklift was operating over time; however, state changes and detailed differences were difficult to compare.

The data size was large, and it was difficult to observe the distinctive state changes when analyzing the raw signals. To compress the data and address these difficulties, twelve features were extracted from the time-domain signal for every 20 s window: min, max, peak to peak, mean (abs), RMS, variance, kurtosis, skewness, crest factor, shape factor, impulse factor, and margin factor. Based on this, the dimensions of the acceleration-signal and sound-signal data were reduced to 900 × 12 × 12 and 900 × 1 × 12, respectively. As reported in Section 2, an exponentially weighted moving average (alpha 0.5) was applied to the reduced dataset to minimize the noise in the data. After feature engineering, the CART (classification and regression tree) algorithm [40] was used to classify the failure stages over time using the sound features. The CART algorithm is a decision tree algorithm for classification and regression that evaluates the importance of each variable based on the input data and prioritizes the important variables to produce a decision tree. The CART algorithm was used to derive 12 decision trees, one per feature, with pruning applied to prevent overfitting, and to classify the status into three levels (Figure 17). Significant branching points could be derived from eight of the 12 decision trees, and the failure stage was classified based on the average value of the branching points derived from these eight trees (Table 9). Based on the classification results, breakpoint 1 was determined to be at 1.18 h and breakpoint 2 at 1.77 h, which were used to distinguish the labeling of the data (Table 10). Furthermore, a review of the recorded forklift experimental video showed that beyond breakpoint 1 the forklift exhibited early failure symptoms (beginning to shake), and it demonstrated a complete failure state at breakpoint 2.
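The windowed time-domain feature extraction can be sketched as follows; only a subset of the 12 features is shown, and the sine-wave window is a hypothetical signal, not measured forklift data:

```python
import numpy as np

def time_domain_features(window):
    """A subset of the 12 time-domain features for one signal window."""
    w = np.asarray(window, dtype=float)
    abs_mean = np.mean(np.abs(w))
    rms = np.sqrt(np.mean(w ** 2))
    peak = np.max(np.abs(w))
    return {
        "min": float(w.min()),
        "max": float(w.max()),
        "peak_to_peak": float(w.max() - w.min()),
        "mean_abs": float(abs_mean),
        "rms": float(rms),
        "variance": float(w.var()),
        "crest_factor": float(peak / rms),        # peak relative to RMS
        "impulse_factor": float(peak / abs_mean), # peak relative to |mean|
    }

# one hypothetical 20 s window at 512 Hz (10,240 samples) of a pure sine wave
fs, win_s = 512, 20
t = np.arange(fs * win_s) / fs
feats = time_domain_features(np.sin(2 * np.pi * 5 * t))
print(round(feats["crest_factor"], 3))  # ~1.414 for a sine wave
```

Applying such a function to every 20 s window of each of the 12 acceleration signals yields exactly the kind of compressed 900 × 12 × 12 feature tensor described above.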
The 900 generated data points were divided into 630 for training and 270 for testing. Given that the dataset is imbalanced, the performance was evaluated using the F1 score during the validation process. The F1 score is a metric that evaluates the accuracy of a model and is calculated as the harmonic mean of precision and recall. When data are unbalanced, a model can achieve high overall accuracy simply by predicting the majority class well while performing poorly on the minority classes; relying solely on accuracy can therefore give a misleading impression of the model's performance. The F1 score considers both precision and recall, which makes it more suitable for dealing with imbalanced classification problems [48-50]. Therefore, when the data are unbalanced, the F1 score is a more accurate way to evaluate model performance, and it was used to evaluate the performance on this unbalanced dataset.
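The gap between accuracy and the macro F1 score on imbalanced labels can be demonstrated with a small sketch; the three-stage labels below are hypothetical, not the experimental data:

```python
import numpy as np
from sklearn.metrics import f1_score

# hypothetical three-stage labels (0 = normal, 1 = failure entry, 2 = failure)
# with the "normal" class deliberately over-represented
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 2, 0])

accuracy = float((y_true == y_pred).mean())
macro_f1 = float(f1_score(y_true, y_pred, average="macro"))
print(accuracy, round(macro_f1, 4))  # 0.8 vs ~0.7667
```

The macro average weights every class equally, so the errors on the two small failure classes pull the F1 score below the accuracy, which is why the F1 score (macro) was the reported metric.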

Stage Classification Result
In the previous steps, labels were assigned to instances using the sound features. Leveraging these labels, logistic regression and random forest classifiers were trained. The dataset was divided into two distinct cases: case 1 used solely the vibration features (144 dimensions), whereas case 2 integrated both the vibration and sound features (156 dimensions). While the sound features were primarily introduced for labeling purposes, their inclusion in training prompted the additional case to assess their impact on classifier performance. In both cases, logistic regression and random forest were selected as the machine learning models, with the details of random forest elaborated upon in Section 2.2. For multiclass classification, logistic regression employs the SoftMax function alongside the cross-entropy loss. In this context, the linear discriminant function h_k(x) is defined as follows:

h_k(x) = w_k^T x + b_k  (16)

where w_k and b_k denote the weights and bias for each class k, respectively, while x_i represents the i-th input variable. p(y = j | x_i), the probability of class j predicted by the SoftMax function, is as follows [32]:

p(y = j | x_i) = exp(h_j(x_i)) / Σ_{k=1}^{K} exp(h_k(x_i))

The cross-entropy loss function, J(θ), is calculated as follows [32]:

J(θ) = −(1/N) Σ_{i=1}^{N} Σ_{j=1}^{K} y_ij log(ŷ_ij)

• y_ij: the binary value indicating whether class j is the correct target for the i-th data point.

• ŷ_ij: the predicted probability of class j for the i-th input data point.
• N: the total number of data points.
• K: the total number of classes.
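The SoftMax and cross-entropy definitions above can be sketched directly in NumPy; the class scores and one-hot targets are hypothetical:

```python
import numpy as np

def softmax(h):
    """Row-wise SoftMax over class scores h_k(x_i)."""
    z = h - h.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(Y, P):
    """J(theta) = -(1/N) * sum_i sum_j y_ij * log(p_ij)."""
    return float(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))

# hypothetical class scores for N = 2 samples and K = 3 classes
H = np.array([[2.0, 0.5, 0.1],
              [0.2, 0.1, 3.0]])
Y = np.array([[1.0, 0.0, 0.0],   # one-hot targets y_ij
              [0.0, 0.0, 1.0]])

P = softmax(H)
print(P.sum(axis=1))             # each row sums to 1
print(round(cross_entropy(Y, P), 4))
```

Because each row of `P` is a valid probability distribution, the loss decreases as the predicted probability of the correct class approaches 1, which is the training signal for the logistic regression classifier.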
LightGBM, used in the previous section, is prone to overfitting when the dataset is small. Because this dataset consists of only 900 feature vectors, LightGBM was deemed inappropriate and was not used in this part of the study. No additional hyperparameter tuning was conducted; only the default parameters provided by the scikit-learn module in Python were used. Based on the validation results for the test data shown in Table 11, the logistic regression classifier achieved an F1 score (macro) of 0.9599 for case 1 and 0.9790 for case 2. In the case of random forest, the F1 scores (macro) for cases 1 and 2 were 0.9116 and 0.9220, respectively. Additionally, Figure 18 depicts a confusion matrix, where the x-axis shows the actual (ground truth) class labels and the y-axis shows the predicted class labels generated by the model. This confirms that it is possible to classify the forklifts as healthy or aging under repeated unbalanced load conditions based on acceleration signals.
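The default-parameter training setup described above can be sketched as follows; the synthetic data generated here is a hypothetical stand-in for the 900-sample, three-stage feature matrix, so the scores are illustrative only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for the 900-sample, three-stage feature matrix
X, y = make_classification(n_samples=900, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=0)):
    model.fit(X_tr, y_tr)                 # default scikit-learn parameters, no tuning
    scores[type(model).__name__] = f1_score(
        y_te, model.predict(X_te), average="macro")

print({k: round(v, 4) for k, v in scores.items()})
```

Keeping the defaults mirrors the no-tuning choice in the text and makes the comparison between the two classifiers a like-for-like one.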

RUL Prediction and Fault Diagnosis with Alarm Rule for Abnormal Lifting
Numerous studies [18-28] have comprehensively explored RUL prediction aligned with fault diagnosis. Zhang et al. [19] introduced an inventive approach by integrating spatial and temporal features in parallel using a hybrid neural network. This model combined a 1D convolutional neural network (CNN) with a bidirectional gated recurrent unit (BiGRU). Furthermore, they assessed the limitations of RUL prediction using CNN, LSTM, and the transformer algorithm on aircraft turbofan engine data. To overcome these limitations, they introduced the integrated multi-head dual sparse self-attention network (IMDSSN), an architecture incorporating ProbSparse self-attention network (MPSN) and LogSparse self-attention network (MLSN) components [20]. Other strategies in RUL prediction span a broad spectrum. Pham et al. [21] proposed RUL prediction for a methane compressor, employing flow system identification, proportional hazard modeling, and support vector machines. Loutas et al. [22] introduced an ε-support vector machine-based approach for estimating rolling-bearing RUL. Gugulothu et al. [23] developed Embed-RUL, addressing the differing patterns in embeddings between healthy and degraded machines. This technique employs a sequence-to-sequence model based on RNNs to generate embeddings for multivariate time series subsequences. Hinchi et al. [24] introduced a method for bearing RUL prediction centered on a convolutional LSTM network. Niu et al. [25] proposed an RUL prediction technique utilizing a 1D-CNN LSTM network. Jiang et al. [26] presented a time series multiple channel convolutional neural network (TSMC-CNN) integrating CNN with LSTM and attention mechanisms. They further integrated TSMC-CNN with an attention-based long short-term memory (ALSTM) network for bearing RUL prediction. Saidi et al.
[28] presented RUL prediction using support vector regression (SVR). These regression-based RUL predictions require well-structured feature vectors and substantial RUL data collection. To address this challenge, several studies have focused on performance enhancement, employing ensemble techniques and a range of model-based approaches, including CNN, LSTM, transformers, attention mechanisms, and SVR. While many works have explored convolutional LSTM networks and sequence-to-sequence models, Lei et al. [18] systematically outlined four technical processes, namely data collection, health index construction, health stage segmentation, and RUL prediction, with a dedicated focus on RUL.
Considering the algorithms proposed in the literature and the model of Ley et al. [18], the present study combines appropriate algorithms in an integrated manner, adapted to the unique demands of abnormal lifting scenarios. This customized integration helps the model capture latent patterns and interactions arising from unbalanced load conditions, leading to enhanced RUL predictions. This section demonstrates real-time active diagnosis using the life model equation, even with limited data, thereby enhancing the practicality of the proposed approaches. As Section 3 discussed the classification of health stages, Section 4 applies the dataset and classifiers to perform RUL prediction and fault diagnosis using alarm rules, thereby establishing a foundation for condition-based management [51]. Moreover, alarm rules are devised utilizing key features extracted from the health stage classifier and designated as health indices. Figure 19 summarizes the RUL calculation process and the methods used.

Life Prediction Model and RUL Verification
In the case of logistic regression, as shown in Figure 20, the classification probabilities of the classifier were derived as either 0 or 1, making it difficult to track progressive changes in the state. Therefore, it was not feasible to adopt degradation curves for logistic regression. On the other hand, as shown in Section 3, when the classification probabilities of the entire dataset belonging to each stage were visualized using a random forest classifier, a gradual change in state was observed in the graph (Figure 21). This represents the degradation state of the forklift using classification probabilities, and the probabilities of Stages 1 and 3 were combined and averaged to generate the degradation curve (Figure 22). After analyzing the degradation curve, Stage 3 was reached after approximately 106 min, and the failure threshold was set at a probability of 0.7. The model for RUL was constructed using an exponential function equation, as follows.
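As a concrete illustration, the mapping from per-window stage probabilities to a degradation curve can be sketched in Python. The combination rule below (averaging the loss of Stage-1 probability with the gain of Stage-3 probability) is one plausible reading of "combined and averaged", not the paper's exact formula, and the EWMA smoothing constant is an assumed value:

```python
import numpy as np

def ewma(x, alpha=0.1):
    # exponentially weighted moving average for noise smoothing
    out = np.empty(len(x))
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
    return out

def degradation_curve(stage_probs, alpha=0.1):
    """stage_probs: (n_windows, 3) array of P(Stage 1..3) per window,
    e.g. from RandomForestClassifier.predict_proba.
    Degradation index: average of the loss of Stage-1 (healthy)
    probability and the gain of Stage-3 (failure) probability."""
    p1, p3 = stage_probs[:, 0], stage_probs[:, 2]
    raw = ((1.0 - p1) + p3) / 2.0   # 0 = healthy, 1 = failed
    return ewma(raw, alpha)

def failure_index(curve, threshold=0.7):
    # first window at which the smoothed curve crosses the threshold
    above = np.nonzero(curve >= threshold)[0]
    return int(above[0]) if above.size else None
```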
Exponential coefficients were determined using the method of least squares, focusing on data collected within the 30 min prior to the diagnostic time. The least squares method is a statistical tool used in regression analysis to determine model parameters that minimize the discrepancy (error) between the observed data and the values expected by the model. The following equations and parameters illustrate the implementation of the least squares method with the exponential function of Equation (20) for a given dataset.
Dataset: ((x_1, y_1), (x_2, y_2), ..., (x_n, y_n))
• y_i: the observed value of the dependent variable at the ith point.
• x_i: the value of the independent variable at time t_i.
Based on these characteristics, the exponential function continued to change with time, and the time at which the y-value of the exponential function reached 0.7 was calculated to estimate the RUL. During the analysis of the RUL, a confidence interval was included by considering Y(0.9x_t) to Y(1.1x_t) around the predicted time point. The analysis showed that the early data predicted the failure rather early, as shown in Figure 23. The gap between the actual and the predicted RUL decreased gradually as the failure point approached (Figure 24).
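A minimal sketch of this fitting step follows. Since Equation (20) is not reproduced here, the sketch assumes the two-parameter form y = a·exp(b·t), dropping the offset c so that the least-squares fit reduces to a closed-form log-linear regression; the 30 min fitting window, the 0.7 threshold, and the ±10% band follow the text:

```python
import numpy as np

def fit_exponential(t, y):
    """Least-squares fit of y = a * exp(b * t) via a log-linear
    transform (the offset term c of the paper's Equation (20) is
    dropped here so the fit stays closed-form)."""
    b, log_a = np.polyfit(t, np.log(y), 1)
    return np.exp(log_a), b

def predict_rul(t, y, threshold=0.7, window_min=30.0):
    """Fit the model on the last `window_min` minutes of the degradation
    curve and return the predicted RUL with a +/-10% band on the
    predicted failure time."""
    mask = t >= t[-1] - window_min
    a, b = fit_exponential(t[mask], y[mask])
    t_fail = np.log(threshold / a) / b       # solve y(t_fail) = threshold
    rul = t_fail - t[-1]
    band = (0.9 * t_fail - t[-1], 1.1 * t_fail - t[-1])
    return rul, band
```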

Alarm-Rule-Based Fault Diagnosis
The SHAP (Shapley additive explanations) algorithm was used to diagnose faults based on the key features and alarm rules. The goal was to determine which features were critical for state classification by applying the SHAP algorithm to the previously trained random forest classifier. This analysis allowed the identification of the significant features used for state classification. SHAP is an algorithm used to analyze the contribution of features to the predictions made by machine learning models. It helps to determine the importance of each feature in the model's predictions. The SHAP algorithm considers all features necessary to explain the model predictions and calculates the impact of each feature on the prediction outcome [52]. Calculating the Shapley value for all feature combinations is computationally expensive. Therefore, the SHAP algorithm uses approximations that can be calculated relatively quickly for tree-based models, such as random forests or XGBoost [29]. The SHAP algorithm provides various interpretation results, such as feature importance, feature contribution, and feature effect, which help to interpret the model and explain the prediction results reliably [52]. The SHAP algorithm thus improves the interpretability of machine learning models and plays a vital role in model development and in helping users to understand the model prediction results [52].



Discussion
This study addresses the critical issue of preventing front-end failures in forklifts by predicting the center of gravity. The study begins by emphasizing the fault diagnosis of front-end failures in forklifts and the significance of accurately predicting the center of gravity. In this pursuit, the study acquired acceleration signals while the forklift lifted, tilted, and moved heavy objects on uneven ground and in various operating environments. These acceleration signals were captured from the outer beams on both the left and right sides of the forklift's front-end. To ensure effective processing and feature determination, time-domain statistical features were extracted and established as variables by applying a window at one-second intervals. To mitigate potential overfitting, the dataset was augmented with AWGN and LSTM autoencoders. Following these data enhancements, classifiers were used to accurately categorize the center of gravity of objects transported by forklifts during their driving and working phases. This classification task was accomplished using the random forest and lightGBM models, which predicted the center of gravity with accuracies of 0.9563 and 0.9566, respectively. During the prediction of the center of gravity, an exponentially weighted moving average was applied to smooth out noise in the features. As a result of applying this moving average, the dataset's outliers were reduced without necessitating complex noise filtering methods. Moreover, as the window size for the exponentially weighted moving average was increased, the accuracy of the machine learning models also improved. This suggests that a larger window captured data trends more accurately, enhancing predictive performance. By using data augmentation, overfitting was minimized while maintaining similar or slightly lower accuracy scores. In the lightGBM model, applying a moving average to the classification probabilities tended to improve the scores. In contrast, lightGBM tended to overfit with increased tree depth, resulting in biased classification probabilities. This made it difficult to improve performance when combining augmented datasets with contextual diagnosis. However, the random forest model showed less overfitting than lightGBM and exhibited gradual changes in classification probabilities over time. These trends were observed in the dataset augmented using the AWGN and LSTM autoencoders. Applying the augmented dataset and contextual diagnosis, along with the moving average on the classification probabilities, resulted in an average score improvement of 3.31%. Notably, the random forest score was lower than the lightGBM score without the moving average.
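The AWGN augmentation mentioned above can be sketched as follows; the 20 dB signal-to-noise ratio and the number of noisy copies are assumed settings, as the exact values are not restated here:

```python
import numpy as np

def augment_awgn(features, snr_db=20.0, copies=1, seed=0):
    """Additive white Gaussian noise augmentation of a feature matrix.
    Noise power is scaled per feature so each noisy copy sits at the
    requested signal-to-noise ratio relative to that feature."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(features ** 2, axis=0, keepdims=True)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    augmented = [features]
    for _ in range(copies):
        noise = rng.normal(0.0, np.sqrt(noise_power), size=features.shape)
        augmented.append(features + noise)
    return np.vstack(augmented)
```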
After predicting the center of gravity, the paper presented a procedure for forecasting the RUL during abnormal operational scenarios and diagnosing failures through the application of alarm rules. To forecast the RUL under repetitive unbalanced load conditions, statistical features were extracted from acceleration data using a 20 s window. This data collection took place in an anechoic chamber, with simultaneous recording of sound signals via a microphone. The CART algorithm classified and labeled the statistical features derived from the sound signals. Logistic regression and random forest models were then used for failure stage classification, achieving F1 scores of 0.9790 and 0.9220, respectively. Notably, logistic regression achieved the highest score among the classifiers. However, when examining the classifier's probability changes, the probabilities were often skewed toward 0 or 1. This skewed distribution made it difficult to accurately track the forklift's state changes. Consequently, the forklift's degradation status was monitored through the classification probabilities generated by the random forest. The results showed a gradual change in the condition of the forklift over time. Based on these results, the probabilities generated by the random forest for each stage were combined and averaged to create a degradation curve. The failure point was predicted using the failure threshold determined through degradation curve analysis.
In conclusion, predicting the center of gravity of objects carried by the forklift yields insights into operations that impact forklift durability. This approach enhances equipment longevity and operational safety. Additionally, providing RUL data facilitates the development of operational plans and efficient maintenance by identifying critical features for setting failure thresholds.

Figure 1 .
Figure 1. Outline of procedure used to predict the failure of forklift front-end.

Figure 2 .
Figure 2. Classification procedure of imbalanced operations based on the center of gravity.


Figure 3 .
Figure 3. Electric-powered counterbalance forklift of ITA Class I type.

Figure 4 .
Figure 4. The location of the DAQ and the apparatus of a forklift carrying an object.


Figure 7 .
Figure 7. Results of noise mixed augmentation using AWGN.


• The learnable parameters (weights and biases).
• The activation function, typically a sigmoid or tanh function.
• MSE: the mean squared error loss function.
• x_i: the ith element of the input data.
• x̂_i: the ith element of the model's prediction (reconstructed data).
• n: the number of elements in the input data.

Figure 9 .
Figure 9. Result of feature augmentation using the LSTM autoencoder.


Figure 11 .
Figure 11. Schematic procedure of training and prediction.


Figure 13 .
Figure 13. Contextual diagnosis graph of random forest (left: original; right: after moving average).


Figure 14 .
Figure 14. Classification of failure (health) stages in imbalanced abnormal operations.


Figure 15 .
Figure 15. Images of the repeated abnormal lifting experiment in an anechoic chamber with microphone and vibration sensors.


Figure 16 .
Figure 16. Waveform and short-time Fourier transform (STFT) of measured sound data.

Figure 17 .
Figure 17. Visual examples of CART algorithm results.


Figure 21 .
Figure 21. Probability of the random forest classifier.

Figure 19 .
Figure 19. Prediction of RUL and fault diagnosis.


• a, b, c: the parameters of the exponential function.
• e_i: the residuals associated with the exponential function's prediction.
• E: the objective function computing the sum of squared residuals, which is minimized to determine the values of a, b, and c.

Figure 23 .
Figure 23. Degradation curve of forklift using the average probability of a random forest classifier.

The features that contribute significantly to health diagnostics were identified using the SHAP algorithm. As shown on the left in Figure 25, the 'abs.meanch In LH z' feature contributes the most to the health diagnosis. A progressive state change, similar to the degradation model graph, could be observed in the dispersion of the time series of this feature (Figure 25, right image). The 2-sigma and 3-sigma values of the 'abs.meanch In LH z' feature were extracted from the distribution of the normal data range in Stage 1. These values were then used as the thresholds for fault diagnosis. Next, an alarm rule was set to diagnose a failure when the exponentially weighted moving average of the feature (alpha = 0.17) exceeds a threshold. As shown in Figure 25 (right), the 3-sigma threshold can diagnose failure just before failure occurs, and the 2-sigma threshold can predict failure at the point of precursor symptoms.
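The alarm rule described above (sigma thresholds from the Stage-1 distribution, EWMA with alpha = 0.17) can be sketched as follows; in practice the feature series would be the health-index feature identified by SHAP, whereas the arrays below are synthetic:

```python
import numpy as np

def alarm_thresholds(stage1_values):
    """2-sigma and 3-sigma thresholds from the healthy (Stage 1)
    distribution of the health-index feature."""
    mu, sigma = stage1_values.mean(), stage1_values.std()
    return mu + 2 * sigma, mu + 3 * sigma

def ewma(x, alpha=0.17):
    # alpha = 0.17 follows the alarm rule described above
    out = np.empty(len(x))
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
    return out

def first_alarm(feature_series, threshold, alpha=0.17):
    """Index of the first window whose smoothed feature value exceeds
    the threshold, or None if no alarm fires."""
    smoothed = ewma(feature_series, alpha)
    above = np.nonzero(smoothed > threshold)[0]
    return int(above[0]) if above.size else None
```

Because the 2-sigma threshold is the lower of the two, it always fires at or before the 3-sigma alarm, matching the precursor-symptom behavior described above.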

Figure 25 .
Figure 25. Results of SHAP and fault diagnosis based on the alarm rules.


Table 1 .
Description of features (crest factor, shape factor, impulse factor, and margin factor).

Table 2 .
Table of features before applying the exponentially weighted moving average.

Table 3 .
Table of features after applying the exponentially weighted moving average.


Table 4 .
Configuration of feature vectors in the dataset.

Table 5 .
Dataset size based on feature augmentation.

Table 6 .
Hyperparameter tuning results of the random forest and lightGBM.

Table 7 .
Result of contextual diagnosis.

Table 8 .
Result of case studies.


Table 9 .
Stage-labeling results of the sound data features using the CART (classification and regression tree) algorithm.

Table 10 .
Labeling and structure of a dataset.


Table 11 .
Validation result of stage classification. For multiclass classification, logistic regression employs the softmax function alongside the cross-entropy loss. In this context, the linear discriminant function h_k(x) is defined as follows:
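Since the discriminant and loss expressions themselves are not reproduced above, the following minimal sketch shows the standard formulation assumed here: per-class linear scores h_k(x) = w_k·x + b_k mapped to probabilities with the softmax function and scored with the cross-entropy loss:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the class axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_proba(X, W, b):
    """Multiclass logistic regression: linear scores h_k(x) = w_k.x + b_k
    converted to class probabilities with the softmax function."""
    return softmax(X @ W + b)

def cross_entropy(P, y):
    # mean negative log-likelihood of the true class labels
    return -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))
```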
