Article

Online Monitoring and Fault Diagnosis for High-Dimensional Stream with Application in Electron Probe X-Ray Microanalysis

1 School of Mathematics and Statistics, Huaiyin Normal University, Huai’an 223300, China
2 Department of Mathematics, Yanbian University, Yanji 133002, China
3 School of Statistics and Data Science, LPMC, LEBPS and KLMDASR, Nankai University, Tianjin 300071, China
* Author to whom correspondence should be addressed.
Entropy 2025, 27(3), 297; https://doi.org/10.3390/e27030297
Submission received: 1 February 2025 / Revised: 7 March 2025 / Accepted: 11 March 2025 / Published: 13 March 2025

Abstract:
This study introduces an innovative two-stage framework for monitoring and diagnosing high-dimensional data streams with sparse changes. The first stage utilizes an exponentially weighted moving average (EWMA) statistic for online monitoring, identifying change points through extreme value theory and multiple hypothesis testing. The second stage involves a fault diagnosis mechanism that accurately pinpoints abnormal components upon detecting anomalies. Through extensive numerical simulations and electron probe X-ray microanalysis applications, the method demonstrates exceptional performance. It rapidly detects anomalies, often within one or two sampling intervals post-change, achieves near 100% detection power, and maintains type-I error rates around the nominal 5%. The fault diagnosis mechanism shows a 99.1% accuracy in identifying components in 200-dimensional anomaly streams, surpassing principal component analysis (PCA)-based methods by 28.0% in precision and controlling the false discovery rate within 3%. Case analyses confirm the method’s effectiveness in monitoring and identifying abnormal data, aligning with previous studies. These findings represent significant progress in managing high-dimensional sparse-change data streams over existing methods.

1. Introduction

In recent years, the rapid advancement of modern sensing and data acquisition technologies has enabled the real-time collection of high-dimensional data streams across diverse industrial applications. Such data streams often exhibit shifts in statistical properties, transitioning from an in-control (IC) state to an out-of-control (OC) state due to anomalies. For instance, in electron probe X-ray microanalysis, real-time monitoring of compositional changes in materials requires the detection of subtle shifts in high-dimensional spectral data, which may occur either abruptly or gradually. Addressing such scenarios requires robust online monitoring and fault diagnosis methods tailored for high-dimensional and sparsely changing data streams.
Before proceeding, for an intuitive explanation, we present a simple example to illustrate IC and OC observations in a multidimensional data stream. Figure 1 illustrates a $p$-dimensional ($p = 10$) time series data stream recorded over $t = 300$ time points. The IC observations, denoted as $X_i$, are independently and identically distributed (i.i.d.) multivariate normal random variables with mean vector $\mu_0 = (0, 0, \ldots, 0)^\top \in \mathbb{R}^p$ and covariance matrix $\Sigma_p = I_p$. After a change point $\tau = 150$, $p_s = 2$ of the components experience an anomaly, resulting in a shift in the data stream. The figure presents two distinct scenarios of mean-shift anomalies in a multidimensional data stream: the left subfigure depicts an abrupt mean shift, where the change occurs suddenly after the time point $\tau$, while the right subfigure illustrates a gradual mean shift, where the shift develops over roughly a 50-time-point interval and ultimately stabilizes when $t > 200$.
In the examples mentioned, the change points in the data stream are easily identifiable due to the low dimensionality, allowing easy visual detection. However, in real-world applications, monitoring complex and high-dimensional data streams is more challenging. Anomalies may only appear in a small subset of variables at any given time, and the intricate interactions and high dimensionality of these variables complicate online monitoring and fault detection. This is especially true when pinpointing the exact time and specific elements that show abnormal variations. For instance, in semiconductor manufacturing, subtle sensor anomalies may indicate critical equipment issues, while in energy infrastructure, gradual pressure drifts could signal potential failures. Timely detection and diagnosis of such anomalies are crucial for operational safety and efficiency. Yet, the high dimensionality, sparsity of changes, and complex variable interdependencies present significant challenges to existing monitoring and diagnostic frameworks.
To overcome these challenges, a series of methods has been developed for the monitoring and diagnosis of high-dimensional data streams. Here, we roughly divide them into the following three categories.
(1) Statistical and Control Chart-Based Methods. Statistical and control chart-based methods are widely employed for monitoring high-dimensional data streams, with prominent techniques including Hotelling’s $T^2$, the Multivariate Exponentially Weighted Moving Average (MEWMA), the Multivariate Cumulative Sum (MCUSUM), and principal component analysis (PCA)-based monitoring. Zou et al. (2015) [1] introduced a robust control chart utilizing local CUSUM statistics, demonstrating effective detection capabilities across both sparse and dense scenarios, albeit with certain limitations in parameter assumptions and implementation complexity. Ebrahimi et al. (2021) [2] developed an adaptive PCA-based monitoring method incorporating compressed sensing principles and the adaptive lasso for change source identification, though its computational demands may be substantial for large-scale datasets. Li (2019) [3] proposed a flexible two-stage monitoring procedure with user-defined IC average run length (ARL) and type-I error rates, which requires meticulous calibration of control limits for both stages. For comprehensive reviews of online monitoring and diagnostic methods based on statistical process control (SPC), refer to studies such as [4,5,6,7,8,9,10,11,12].
(2) Information-Theoretic and Entropy-Based Methods. Entropy-based measures, including Shannon entropy, Rényi entropy, and approximate entropy, are widely used to assess the complexity and randomness of data streams in monitoring and diagnostics. Recent research has made significant progress in both theoretical and practical aspects of this field. Mutambik (2024) [13] develops E-Stream, an entropy-based clustering algorithm for real-time high-dimensional Internet of Things (IoT) data streams. By incorporating an entropy-based feature ranking method within a sliding window framework, the algorithm reduces dimensionality, improving clustering accuracy and computational efficiency. However, its dependence on manual parameter tuning restricts its adaptability to diverse datasets. Wan et al. (2024) [14] propose an entropy-based method for monitoring pressure pipelines using acoustic signals. Their approach combines a Denoising Autoencoder (DAE) and a Generative Adversarial Network (GAN) with entropy-based loss functions for noise reduction. While effective, the adversarial training process requires substantial computational resources, limiting its feasibility for real-time applications in resource-constrained environments. For a deeper understanding of recent advancements in entropy-based theories, methods, and applications for data stream monitoring and diagnostics, see [15,16,17,18,19]. These studies highlight the latest developments in this area.
(3) Machine Learning and Data-Driven Methods. In recent years, machine learning and data-driven methods have become increasingly important for monitoring and diagnostics in various fields. Online learning techniques, such as Online Support Vector Machines (OSVMs), Online Random Forests, and Stochastic Gradient Descent (SGD), have been successfully applied to real-time monitoring tasks. Additionally, deep learning models like Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and autoencoders have shown strong performance in capturing temporal patterns and detecting anomalies in high-dimensional data streams. Recent developments in this area include OLFA (Online Learning Framework for sensor Fault diagnosis Analysis) by Yan et al. (2023) [20], which provides a solution for real-time fault diagnosis in autonomous vehicles by addressing the non-stationary nature of sensor faults. Chen et al. (2024) [21] present a real-time fault diagnosis method for multisource heterogeneous information fusion based on two-level transfer learning (TTDNN), validated on gearbox and bearing datasets, among others. Another important contribution is from Li et al. (2025) [22], who propose a novel framework combining Deep Reinforcement Learning (DRL) with SPC for online monitoring of high-dimensional data streams under limited resources. Although their approach shows good scalability through deep neural networks, the computational demands of the Double Dueling Q-network training process may be a limitation for real-time applications with strict computational constraints. For comprehensive reviews of machine learning-based online monitoring and diagnostic methods, see studies such as [21,23,24,25,26,27].
Despite these advancements, three key limitations remain. First, many SPC methods face computational challenges in ultra-high dimensions due to reliance on covariance matrix inversions or extensive hypothesis testing. Second, entropy-based techniques, though effective for noise reduction, show limited sensitivity to sparse changes where only a few variables deviate, while machine learning approaches often demand substantial labeled data that may be unavailable. Third, current diagnostic procedures typically emphasize either speed or accuracy, struggling to achieve both real-time alerts and precise root-cause identification. This paper bridges these gaps by proposing a two-stage monitoring and diagnostic framework designed for high-dimensional data streams with sparse mean shifts. The first issue is online monitoring, which aims to check whether the data stream has changed over time and to accurately estimate the change point $\tau$ after a change occurs. The second issue is fault diagnosis, which aims to identify the components that have undergone abnormal changes; this helps eliminate the root causes of the change.
Because the EWMA statistic is sensitive to small changes and incorporates information from previous samples, we use it for online monitoring [28]. In the first stage, we construct a monitoring statistic based on the EWMA statistic. Under certain conditions, we prove the asymptotic distribution of the monitoring statistic and then obtain the threshold for online monitoring. At any time point, if the monitoring statistic exceeds the threshold, a real-time alert is issued, indicating that the data stream is out of control. We then move to the second stage, identifying the components that have undergone abnormal changes over time, which belongs to the scope of fault diagnosis. Specifically, when the monitoring procedure proposed in the first stage signals that the data stream is abnormal, we allow the procedure to continue running for a while to obtain more sample data, and we then conduct multiple hypothesis tests on the data before and after the alarm to determine which components of the data stream have experienced anomalies. The two-stage monitoring and diagnosis scheme developed in this study can not only estimate the change point in a timely and accurate manner but also correctly identify the OC components of the data stream.
The contributions of this paper are as follows: (1) Methodological Innovation: This study introduces an EWMA-based monitoring statistic that leverages max-norm aggregation to enhance sensitivity to sparse changes. After its asymptotic distribution is derived, a dynamic thresholding mechanism is established for real-time anomaly detection. (2) Integrated Diagnosis: Upon detecting an OC state, the framework employs a delayed multiple hypothesis testing procedure to isolate anomalous components. This approach minimizes false discoveries while accommodating temporal dependencies in post-alarm data. (3) Industrial Applicability: Our method is validated through electron probe X-ray microanalysis (EPXMA), a critical technique in materials science for real-time composition monitoring. The method’s efficiency in handling high-dimensional sparse shifts makes it equally viable for IoT-enabled predictive maintenance and industrial process control.
This paper is structured as follows: Section 2 formulates the monitoring and diagnosis problem, detailing the EWMA statistic, fault isolation strategy, and performance metrics. Section 3 evaluates the method via simulations, benchmarking against state-of-the-art techniques. Section 4 demonstrates its practicality through an EPXMA case study, highlighting its superiority in detecting micron-scale material defects. Conclusions and future directions are presented in Section 5.

2. Methodology

2.1. Problem Formulation

Consider a high-dimensional data stream denoted by $X_1, X_2, \ldots, X_t, \ldots$, where each observation $X_t = (x_{1t}, x_{2t}, \ldots, x_{pt})^\top \in \mathbb{R}^p$ is a $p$-dimensional vector at time point $t$. In the IC state, the observations are generated from $N(\mu_0, \Sigma_p)$, with $\mu_0 = (\mu_{01}, \ldots, \mu_{0p})^\top \in \mathbb{R}^p$ and $\Sigma_p = (\sigma_{jk})_{p \times p}$, $j, k = 1, \ldots, p$. However, following an unknown change point $\tau$, the process experiences a mean shift from $\mu_0$ to $\mu_a$, transitioning to an OC state. In practical applications, process changes are typically driven by only a small subset of components within the data stream. Consequently, the shift vector $\mu_a - \mu_0$ exhibits sparsity, with most elements being zero and only a few non-zero components. It is important to note that the transition from $\mu_0$ to $\mu_a$ can occur either abruptly or gradually, representing two distinct scenarios of mean-shift anomalies: abrupt mean shifts and gradual mean shifts.
This study aims to address two critical challenges in high-dimensional data stream monitoring. The first is to propose an effective online monitoring method that can accurately estimate the location of the change point and issue a timely alarm once the data stream changes abnormally. The second is to accurately identify the components exhibiting abnormal changes in the high-dimensional data stream while controlling the error rate.

2.2. The EWMA Statistic

As is well known, the EWMA statistic is sensitive to small changes, and it incorporates information from previous samples [29]. Thus, an EWMA statistic is chosen in this study to monitor a high-dimensional data stream. The EWMA sequence is defined by
$$Y_t = \gamma (X_t - \mu_0) + (1 - \gamma) Y_{t-1}, \quad t = 1, 2, \ldots,$$
where $\gamma \in [0, 1]$ is a weight parameter. As described in Feng et al. (2020) [30], a smaller value of $\gamma$ leads to quicker detection of smaller shifts; detailed comparison results are provided in the numerical simulation section.
Let $Y_t = (y_{1t}, y_{2t}, \ldots, y_{pt})^\top$; the $j$-th component of $Y_t$ is
$$y_{jt} = \gamma (x_{jt} - \mu_{0j}) + (1 - \gamma) y_{j(t-1)}, \quad j = 1, \ldots, p, \; t = 1, 2, \ldots,$$
where $Y_0 = (0, 0, \ldots, 0)^\top$ is prescribed, i.e., $y_{j0} = 0$ for $j = 1, \ldots, p$. It is obvious that when the data stream is IC, we have
$$\mathrm{E}(Y_t) = (0, 0, \ldots, 0)^\top, \quad \mathrm{Cov}(Y_t) = \frac{\gamma}{2 - \gamma} \left(1 - (1 - \gamma)^{2t}\right) \Sigma.$$
As $t \to \infty$, we have $\left(1 - (1 - \gamma)^{2t}\right) \to 1$, so the asymptotic distribution of $Y_t$ is
$$Y_t \to N_p\!\left(0, \frac{\gamma}{2 - \gamma} \Sigma\right), \quad t \to \infty.$$
Correspondingly, the $j$-th component of $Y_t$ obeys the normal distribution
$$y_{jt} \to N\!\left(0, \frac{\gamma}{2 - \gamma} \sigma_{jj}\right), \quad t \to \infty,$$
where $\sigma_{jj}$ is the $j$-th diagonal element of $\Sigma$. However, once the mean has undergone abnormal changes, $Y_t$ no longer follows the multivariate normal distribution mentioned above.
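To make the recursion concrete, here is a minimal Python sketch (our own illustrative code, not part of the original method) that computes the EWMA sequence of Equation (1) and numerically checks the limiting variance $\frac{\gamma}{2-\gamma}\sigma_{jj}$; the function name `ewma_stream` and all parameter choices are ours.

```python
import numpy as np

def ewma_stream(X, mu0, gamma=0.2):
    """EWMA recursion of Equation (1): Y_t = gamma*(X_t - mu0) + (1-gamma)*Y_{t-1}.

    X is a (T, p) array of observations, mu0 the (p,) IC mean vector;
    returns the (T, p) array of EWMA values with Y_0 = 0.
    """
    T, p = X.shape
    Y = np.zeros((T, p))
    y_prev = np.zeros(p)                      # Y_0 = (0, ..., 0)
    for t in range(T):
        y_prev = gamma * (X[t] - mu0) + (1.0 - gamma) * y_prev
        Y[t] = y_prev
    return Y

# Numerical check of the limiting variance gamma/(2 - gamma) * sigma_jj:
rng = np.random.default_rng(0)
gamma = 0.2
X = rng.standard_normal((20000, 3))           # IC stream from N(0, I_3)
Y = ewma_stream(X, np.zeros(3), gamma)
print(Y[5000:].var(axis=0), gamma / (2.0 - gamma))   # both near 0.111
```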

2.3. Online Monitoring Procedure

Without loss of generality, we assume that the mean vectors of $Y_1, Y_2, Y_3, \ldots, Y_t, \ldots$ mentioned above are $\mu_1, \mu_2, \ldots, \mu_t, \ldots$, respectively. Thus, the online monitoring of high-dimensional data streams can be considered a change point detection problem, where the null and alternative hypotheses are
$$H_{0t}: \mu_1 = \mu_2 = \cdots = \mu_t = 0; \quad H_{1t}: \exists\, \tau > 0, \; \mu_1 = \mu_2 = \cdots = \mu_\tau = 0, \text{ but } \mu_{\tau+1}, \ldots, \mu_t \text{ are not all equal to } 0.$$
It is obvious that the monitored data stream is IC up to time point t if H 0 t is accepted, but a rejection of H 0 t indicates that some components of the stream have changed abnormally after the change point τ .
Therefore, we construct the test statistic in the following max-norm form:
$$M_t = \max_{1 \le j \le p} \frac{y_{jt}^2}{\frac{\gamma}{2 - \gamma} \sigma_{jj}}, \quad t = 1, 2, \ldots.$$
Intuitively, the larger the value of $M_t$, the more likely the observation has undergone an abnormal change at time point $t$. If $M_t$ exceeds a threshold, it implies that an anomaly has occurred.
It should be noted that $\mu_0$ and $\Sigma$ are unknown, so reasonable estimators are needed. Let $X_1, X_2, \ldots, X_m$ be the initial IC data stream; the sample mean and covariance of these $m$ observations are defined as $\hat{\mu}_0 = \frac{1}{m} \sum_{t=1}^{m} X_t$ and $\hat{\Sigma} = \frac{1}{m-1} \sum_{t=1}^{m} (X_t - \hat{\mu}_0)(X_t - \hat{\mu}_0)^\top$, respectively.
Consequently, the estimated max-norm monitoring statistic is defined as
$$\widetilde{M}_t = \max_{1 \le j \le p} \frac{y_{jt}^2}{\frac{\gamma}{2 - \gamma} \hat{\sigma}_{jj}}, \quad t = m+1, m+2, \ldots,$$
where $\hat{\sigma}_{jj}$ is the $j$-th main diagonal element of $\hat{\Sigma}$.
For simplicity, let $z_{jt} = y_{jt} / \sqrt{\frac{\gamma}{2 - \gamma} \hat{\sigma}_{jj}}$, $j = 1, \ldots, p$; it is obvious that $z_{jt} \to N(0, 1)$ as $t \to \infty$. The monitoring statistic can be rewritten as
$$\widetilde{M}_t = \max_{1 \le j \le p} z_{jt}^2.$$
Let $Z_t = (z_{1t}, z_{2t}, \ldots, z_{pt})^\top$; then $Z_t$ is a high-dimensional normal random vector with zero mean vector and covariance matrix $\Omega = (\omega_{jk})_{p \times p}$ for $1 \le j, k \le p$, where the diagonal elements satisfy $\omega_{jj} = 1$ for $1 \le j \le p$. In order to obtain the threshold for online monitoring, we need to derive the asymptotic distribution of the statistic $\widetilde{M}_t$ under the null hypothesis.
To this end, this requires the following two assumptions:
A.1. $\max_{1 \le j < k \le p} |\omega_{jk}| \le r < 1$ for some constant $0 < r < 1$;
A.2. $\max_{1 \le j < k \le p} \omega_{jk}^2 \le C_0$ for some constant $C_0$.
Assumptions (A.1) and (A.2) are mild and common assumptions that are widely used in the high-dimensional means test under dependence. Under the above two assumptions, Proposition 1 studies the asymptotic distribution of M ˜ t when the null hypothesis holds.
Proposition 1. 
Under the assumptions (A.1) and (A.2), when the null hypothesis $H_{0t}$ holds, for any $z \in \mathbb{R}$, $\widetilde{M}_t$ follows the asymptotic distribution
$$P\left(\widetilde{M}_t - 2 \log(p) + \log(\log(p)) \le z\right) \to \exp\left\{-\frac{1}{\sqrt{\pi}} \exp\left\{-\frac{z}{2}\right\}\right\}, \quad \text{as } t, p \to \infty.$$
Proposition 1 shows that $\widetilde{M}_t - 2 \log(p) + \log(\log(p))$ follows the type I extreme value distribution with cumulative distribution function $\exp\{-\frac{1}{\sqrt{\pi}} \exp\{-\frac{z}{2}\}\}$. The proof of this proposition is similar to that of Lemma 6 in Cai et al. (2014) [31], so it is omitted here.
Thus, the data stream shows an abnormal change at time point $t$ if
$$\widetilde{M}_t > L_\alpha,$$
where $L_\alpha = 2 \log p - \log(\log p) + q_\alpha$ is the threshold at a given significance level $\alpha$, and $q_\alpha = -\log(\pi) - 2 \log\left(\log\left((1 - \alpha)^{-1}\right)\right)$.
It should be noted that not all observations with $\widetilde{M}_t > L_\alpha$ are considered change points, as such points may also be outliers. Therefore, when $\widetilde{M}_t > L_\alpha$ occurs, we continue monitoring for a while to obtain additional observations and check whether the subsequent values of $\widetilde{M}_t$ also exceed $L_\alpha$. The change point $\tau$ can then be estimated naturally by
$$\hat{\tau} = \inf\{t : t > t_{\mathrm{ic}}, \; \widetilde{M}_{t+i} > L_\alpha, \; i = 0, 1, \ldots, n\}.$$
In this case, we claim that the data stream changes abnormally after time point τ , and the process enters into an OC state.
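The monitoring rule can be sketched in a few lines of Python. The following illustrative code (names such as `threshold_L`, `monitor`, and `n_confirm` are our own; `n_confirm` plays the role of $n$ in Equation (10)) computes $\widetilde{M}_t$, the EVT threshold $L_\alpha$, and the confirmed change point estimate $\hat{\tau}$.

```python
import numpy as np

def threshold_L(p, alpha=0.05):
    """EVT threshold L_alpha = 2*log(p) - log(log(p)) + q_alpha from Proposition 1."""
    q_alpha = -np.log(np.pi) - 2.0 * np.log(np.log(1.0 / (1.0 - alpha)))
    return 2.0 * np.log(p) - np.log(np.log(p)) + q_alpha

def monitor(Y, sigma_hat, gamma, alpha=0.05, n_confirm=5):
    """Return (tau_hat, M): the first t with n_confirm + 1 consecutive
    exceedances of L_alpha (ruling out isolated outliers), or None."""
    p = Y.shape[1]
    L = threshold_L(p, alpha)
    M = (Y ** 2 / ((gamma / (2.0 - gamma)) * sigma_hat)).max(axis=1)
    above = M > L
    for t in range(len(M) - n_confirm):
        if above[t : t + n_confirm + 1].all():
            return t, M
    return None, M
```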

2.4. Fault Diagnosis Procedure

In the first stage, we keep monitoring the $p$-dimensional data stream using the monitoring statistic $\widetilde{M}_t$. After the OC alarm is issued, we enter the second stage of statistical fault diagnosis to identify the components that have undergone abnormal changes. To this end, we continue to monitor the data stream for a while and collect additional OC observations. The diagnostic problem can be regarded as the following multiple hypothesis testing problem, with null and alternative hypotheses
$$H_{0jt}: \mu_{1j} - \mu_{0j} = 0 \quad \text{vs.} \quad H_{1jt}: \mu_{1j} - \mu_{0j} \ne 0, \quad \text{for } j = 1, \ldots, p, \; t > \hat{\tau}.$$
It is obvious that if $H_{0jt}$ is accepted, the $j$-th component of the stream is IC; otherwise, a rejection of $H_{0jt}$ indicates that the $j$-th component of the stream is OC when $t > \hat{\tau}$.
To address this problem, a diagnostic statistic $\widetilde{W}_{jt}$ is proposed in the following form:
$$\widetilde{W}_{jt} = \frac{\bar{y}_{jt}^2}{\frac{\gamma}{n(2 - \gamma)} \hat{\sigma}_{jj}}, \quad j = 1, \ldots, p, \; t > \hat{\tau},$$
where $\bar{y}_{jt} = \frac{1}{n} \sum_{t=\hat{\tau}+1}^{\hat{\tau}+n} y_{jt}$ is the mean value of the observed samples, and the sample variance $\hat{\sigma}_{jj}$ is defined as in Equation (6). If the value of $\widetilde{W}_{jt}$ is large, it indicates that abnormal changes may have occurred in the data stream. Therefore, the null hypothesis in Equation (11) is rejected if $\widetilde{W}_{jt}$ is large enough to exceed a threshold.
Many testing strategies exist for applied multiple testing problems, such as traditional critical-value tests and false discovery rate (FDR) controlling procedures [32,33,34]. It is obvious that $\mathrm{E}(\bar{y}_{jt}) = 0$, $\mathrm{Var}(\bar{y}_{jt}) = \frac{\gamma}{n(2 - \gamma)} \hat{\sigma}_{jj}$, and $\bar{y}_{jt} / \sqrt{\frac{\gamma}{n(2 - \gamma)} \hat{\sigma}_{jj}} \to N(0, 1)$ as $n \to \infty$ under the null hypothesis $H_{0jt}$. Thus, the diagnostic statistic $\widetilde{W}_{jt}$ follows a chi-squared distribution:
$$\widetilde{W}_{jt} \to \chi^2(1), \quad \text{as } n \to \infty.$$
In the diagnosis process, we should ensure that the global per-comparison error rate (PCER) is controlled within a prespecified level $\alpha$. Without loss of generality, we assume the first $p_0$ components of the high-dimensional observations are IC, and the remaining $p - p_0$ components are OC when $t = \tau + 1, \tau + 2, \ldots, \tau + n$. The PCER can be defined as the expected number of type-I errors divided by the number of hypotheses, that is,
$$\mathrm{PCER}_t = \frac{1}{p} \sum_{j=1}^{p_0} P_{H_{1t}}\left(\widetilde{W}_{jt} > c_h \mid t > \hat{\tau}\right) \le \alpha.$$
Therefore, we need to determine $c_h$ such that Equation (14) is satisfied. The choice of $c_h$ can be determined through numerical simulation. More specifically, we let the global per-comparison error rate equal $\alpha$, that is,
$$\frac{1}{p} \sum_{j=1}^{p} P_{H_{1t}}\left(\widetilde{W}_{jt} > c_h \mid t > \hat{\tau}\right) = \alpha.$$
It is obvious that when Equation (15) is satisfied, the PCER in Equation (14) is naturally controlled within $\alpha$.
For a given initial threshold value of $c_h$ (such as $c_h = \chi^2_{1-\alpha}(1)$), the proportion of $\widetilde{W}_{jt} > c_h$ for $t > \hat{\tau}$ is denoted as $\hat{p}$. We repeat this $B$ times ($B = 2000$), and the average value of $\hat{p}$ over the $B$ replications is used to approximate $\frac{1}{p} \sum_{j=1}^{p} P_{H_{1t}}(\widetilde{W}_{jt} > c_h \mid t > \hat{\tau})$. We then adjust $c_h$ numerically until Equation (15) is satisfied, yielding an approximate value denoted as $\hat{c}_h$. Therefore, if $\widetilde{W}_{jt}$ is not larger than $\hat{c}_h$, the global per-comparison error rate is controlled at level $\alpha$; otherwise, if $\widetilde{W}_{jt}$ is larger than $\hat{c}_h$, the $j$-th component of the data stream is flagged as abnormal.
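As one way to realize this calibration, the sketch below resamples blocks of size $n$ from the recorded IC set $H$ and takes the empirical $(1-\alpha)$ quantile of the pooled diagnostic statistics, which solves Equation (15) directly rather than iterating on $c_h$ from the $\chi^2_{1-\alpha}(1)$ starting value. This is a simplification of the procedure under our own assumptions, and all names are illustrative.

```python
import numpy as np

def calibrate_ch(H, sigma_hat, gamma, n, alpha=0.05, B=2000, seed=None):
    """Approximate c_h via resampling: draw B index sets of size n from the IC
    EWMA records H (shape (T_ic, p)), form the diagnostic statistics W_jt,
    and return their empirical (1 - alpha) quantile as c_hat.

    Note: i.i.d. row resampling ignores the EWMA autocorrelation, which is
    an approximation we accept for this sketch."""
    rng = np.random.default_rng(seed)
    T_ic, p = H.shape
    W = np.empty((B, p))
    for b in range(B):
        idx = rng.integers(0, T_ic, size=n)   # resample a block of size n
        ybar = H[idx].mean(axis=0)
        W[b] = ybar ** 2 / (gamma / (n * (2.0 - gamma)) * sigma_hat)
    return np.quantile(W.ravel(), 1.0 - alpha)
```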

2.5. Algorithm Steps

The main algorithm is summarized as the following steps.
Algorithm 1: The main algorithm steps of monitoring and diagnosis
Online Monitoring Stage
1. Given in-control data streams $X_{\mathrm{ic}}$;
2. Calculate the sample mean $\bar{X}_{\mathrm{ic}}$ of the IC observations;
3. Observe the $p$-dimensional data stream $X_t$ and calculate the EWMA statistic $\widetilde{Y}_t = \gamma (X_t - \bar{X}_{\mathrm{ic}}) + (1 - \gamma) \widetilde{Y}_{t-1}$, $t = 1, 2, \ldots$;
4. Construct the global monitoring statistic $\widetilde{M}_t = \max_{1 \le j \le p} \widetilde{y}_{jt}^2 / \left(\frac{\gamma}{2 - \gamma} \hat{\sigma}_{jj}\right)$;
5. If $\widetilde{M}_t > L_\alpha$, raise the OC alarm; the estimated change point is $\hat{\tau} = \inf\{t : t > t_{\mathrm{ic}}, \; \widetilde{M}_{t+i} > L_\alpha, \; i = 0, 1, \ldots, n\}$.
Fault Diagnosis Stage
1. Collect $n$ observations after the OC alarm signal occurs;
2. Construct the diagnostic statistic $\widetilde{W}_{jt} = \bar{y}_{jt}^2 / \left(\frac{\gamma}{n(2 - \gamma)} \hat{\sigma}_{jj}\right)$, $j = 1, \ldots, p$, $t = 1, 2, \ldots, \hat{\tau} + n$;
3. Record the set of observations with $\widetilde{M}_t \le L_\alpha$ for $t = 1, 2, \ldots, \hat{\tau} + n$ as $H$, and let $\hat{p} = \frac{1}{p} \sum_{j=1}^{p} P_{H_{0t}}(\widetilde{W}_{jt} > c_h \mid \widetilde{M}_t > L_\alpha)$;
4. Randomly sample $B$ times from the data set $H$; the average of $\hat{p}$ over the $B$ replications is used to approximate $\hat{c}_h$;
5. Obtain $\hat{J}_t = \{j : \widetilde{W}_{jt} > \hat{c}_h, \; \widetilde{M}_t > L_\alpha\}$; the diagnosed variables are those corresponding to the indices in $\hat{J}_t$.
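For illustration, a hypothetical end-to-end run wiring the sketches above together (`ewma_stream`, `threshold_L`, `monitor`, and `calibrate_ch` as defined earlier; all data and parameter choices are ours, not from the paper) might look as follows.

```python
import numpy as np

rng = np.random.default_rng(1)
p, t_len, tau, gamma, alpha, n = 200, 500, 200, 0.2, 0.05, 10

X_ic = rng.standard_normal((150, p))          # historical IC sample
mu0_hat = X_ic.mean(axis=0)
sigma_hat = X_ic.var(axis=0, ddof=1)          # diagonal of Sigma_hat

X = rng.standard_normal((t_len, p))
X[tau:, :10] += 2.0                           # abrupt shift in 10 components

Y = ewma_stream(X, mu0_hat, gamma)
tau_hat, M = monitor(Y, sigma_hat, gamma, alpha, n_confirm=n)

if tau_hat is not None:
    L = threshold_L(p, alpha)
    H = Y[M <= L]                             # recorded IC set H
    c_hat = calibrate_ch(H, sigma_hat, gamma, n, alpha)
    ybar = Y[tau_hat + 1 : tau_hat + 1 + n].mean(axis=0)
    W = ybar ** 2 / (gamma / (n * (2.0 - gamma)) * sigma_hat)
    J_hat = np.where(W > c_hat)[0]            # diagnosed OC components
    print(tau_hat, J_hat)
```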

2.6. Performance Evaluation Measures

In the online monitoring phase, in order to measure the performance of the proposed monitoring method, we choose the following three metrics. The first is the accuracy of the change point estimation; that is, when the estimated value $\hat{\tau}$ is close to the real change point $\tau$, the change point estimation is accurate. The second is the type-I error rate of online monitoring, which is the proportion of normal observations mistakenly identified as outliers. The last is the power of online monitoring, that is, the proportion of abnormal observations that are accurately identified. A good monitoring method keeps the type-I error rate well controlled while achieving a power value as close to 100% as possible.
In the fault diagnosis stage, we define two relevant metrics to verify the effectiveness of the method. One is the false positive rate (FPR), a measure of the proportion of non-faulty variables that are incorrectly identified as faulty. The other is the true positive rate (TPR), defined as the percentage of variables correctly detected as faulty among all faulty variables.
Without loss of generality, after a change point $\tau$, the IC components of the stream are denoted as $J_{IC} = \{j : j = 1, 2, \ldots, p_0\}$, and the OC component set is $J_{OC} = \{j : j = p_0 + 1, p_0 + 2, \ldots, p\}$. At any time point $t$, the estimated OC components are $\hat{J}_t = \{j : \widetilde{W}_{jt} > \hat{c}_h, \; \widetilde{M}_t > L_\alpha\}$. Thus, the TPR and FPR values at time point $t$ are defined as
$$\mathrm{TPR}_t = \frac{|\hat{J}_t \cap J_{OC}|}{|J_{OC}|} \cdot 100\%; \quad \mathrm{FPR}_t = \frac{\max\{0, |\hat{J}_t| - |J_{OC}|\}}{|J_{IC}|} \cdot 100\%,$$
where $|A|$ represents the number of elements in set $A$. Correspondingly, up to time point $t$, the average TPR and FPR values can be defined as follows:
$$\mathrm{TPR} = \frac{1}{t - \hat{\tau}} \sum_{i=\hat{\tau}+1}^{t} \mathrm{TPR}_i; \quad \mathrm{FPR} = \frac{1}{t - \hat{\tau}} \sum_{i=\hat{\tau}+1}^{t} \mathrm{FPR}_i.$$
In practice, a higher average TPR indicates better diagnostic performance, while a lower average FPR indicates better performance.
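A small helper, following the definitions in Equation (16) as reconstructed above (names are our own), computes both rates for a single time point.

```python
def tpr_fpr(J_hat, J_oc, p):
    """TPR_t and FPR_t (in percent) for diagnosed set J_hat against truth J_oc."""
    J_hat, J_oc = set(J_hat), set(J_oc)
    n_ic = p - len(J_oc)                      # |J_IC|
    tpr = 100.0 * len(J_hat & J_oc) / len(J_oc)
    fpr = 100.0 * max(0, len(J_hat) - len(J_oc)) / n_ic
    return tpr, fpr

print(tpr_fpr([3, 4, 5, 9], J_oc=[3, 4, 5], p=10))  # (100.0, ~14.3)
```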

3. Simulation Studies

In this section, we conduct a comprehensive simulation study to evaluate the performance of our proposed monitoring and diagnosis framework, comparing it with several existing methodologies. For the experimental setup, we generate IC data streams from a multivariate normal distribution N ( μ 0 , Σ p ) . To assess the effectiveness of our approach, we consider two distinct OC scenarios characterized by mean shifts:
(i) Mean abrupt shift: the OC stream follows $N(\kappa \mu_1, \Sigma_p)$, where $\kappa$ is a parameter used to measure the degree of drift, and $\mu_1$ is a $p$-dimensional mean vector.
(ii) Mean gradual shift: the OC stream follows $N(\kappa \mu_t, \Sigma_p)$, where the mean vector $\mu_t$ evolves progressively over time according to the relationship $\mu_{t+1} = \mu_t + \delta_t$, with $\delta_t$ representing a small incremental change vector over time.
The two mean shift scenarios described above may impact only a subset of dimensions within the data. In the case of an abrupt mean shift, it is assumed that the initial mean vector $\mu_0$ is equal to $\mathbf{0}$, while the post-shift mean vector $\mu_1$ is a sparse $p$-dimensional vector. Specifically, most of its components remain zero, with only a few components assuming a value of 1. In the context of a gradual mean shift, the change in the mean vector is similarly sparse. That is, the incremental change $\delta_t$ affects only a few components, while the majority of its elements remain zero. The duration of the mean shift in each affected dimension is denoted by $d$, and the magnitude of the change at each time point is given by $\kappa / d$, reflecting a linear trend of increase or decrease. Over time, after a period of continuous gradual drift, the mean vector stabilizes, and the data stream enters a post-drift phase. A generator for both scenarios is sketched below.
In both scenarios, the abnormal change occurs at a specific time point $\tau$. Prior to $\tau$ (i.e., when $t < \tau$), the process remains IC. However, for $t > \tau$, the mean values of $p_s$ components undergo a change, while the remaining $p - p_s$ components remain unchanged. This sparsity in the mean shift is a critical characteristic of high-dimensional data streams, where changes are often localized to a small subset of dimensions.
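The following illustrative Python sketch generates sparse mean trajectories under the stated linear-ramp assumption; `mean_path` and its arguments are our own names.

```python
import numpy as np

def mean_path(t_len, tau, p, ps, kappa, d=None):
    """Sparse mean trajectory: the first ps components shift by kappa after tau.
    d=None gives an abrupt shift (scenario (i)); otherwise the shift ramps up
    linearly by kappa/d per time point over d points (scenario (ii))."""
    mu = np.zeros((t_len, p))
    for t in range(tau + 1, t_len):
        step = 1.0 if d is None else min(1.0, (t - tau) / d)
        mu[t, :ps] = kappa * step
    return mu

rng = np.random.default_rng(2)
mu = mean_path(t_len=300, tau=150, p=10, ps=2, kappa=2.0, d=50)
X = mu + rng.standard_normal((300, 10))       # gradual-shift stream, Sigma = I
```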
The covariance matrix Σ p , which characterizes the correlation among p-dimensional random variables, plays a crucial role in the simulation study. In the experimental design, we maintain the assumption that Σ p remains invariant before and after the change point. Specifically, we investigate three distinct covariance matrix structures:
(a) Independent structure: $\Sigma_p = I_{p \times p}$, where $I_{p \times p}$ represents the $p \times p$ identity matrix, indicating no correlation between variables;
(b) Long-range dependence structure: $\Sigma_p = (\rho^{|k-l|})_{p \times p}$ for $k, l = 1, \ldots, p$, with the correlation coefficient $\rho$ fixed at 0.5. This structure exhibits slowly decaying correlations between variables;
(c) Short-range correlation structure: $\Sigma_p$ is a block-diagonal matrix whose blocks are $\Sigma_b = (\rho^{|k-l|})_{b \times b}$, $k, l = 1, \ldots, b$, with block size $b = 10$ and $\rho = 0.5$.
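The three covariance structures can be constructed as follows; this is a minimal sketch with names of our choosing, and correlated IC observations are then drawn with `rng.multivariate_normal`.

```python
import numpy as np
from scipy.linalg import block_diag

def cov_independent(p):
    return np.eye(p)                                      # case (a)

def cov_ar1(p, rho=0.5):
    k = np.arange(p)
    return rho ** np.abs(k[:, None] - k[None, :])         # case (b)

def cov_block(p, b=10, rho=0.5):
    return block_diag(*([cov_ar1(b, rho)] * (p // b)))    # case (c)

rng = np.random.default_rng(3)
X_ic = rng.multivariate_normal(np.zeros(200), cov_block(200), size=150)
```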
The method presented in this study is fundamentally grounded in extreme value theory (EVT), henceforth referred to as the EVT method. To comprehensively evaluate its efficacy, we conducted comparative simulations with three alternative approaches, assessing their respective monitoring and/or diagnostic capabilities. The first comparative method employs an adaptive principal component (APC) selection mechanism utilizing hard-thresholding techniques, as proposed by Ebrahimi et al. [2]. The second method, developed by Ahmadi-Javid and Ebadi [5], incorporates a two-stage detection system combining a single Hotelling’s $T^2$ control chart with a Shewhart control chart, subsequently denoted as the AM method. The third approach, proposed by Aman et al. [27], presents a fault detection framework utilizing Slow Feature Analysis (SFA), specifically designed for time series models and SPC. In this study, we perform a thorough performance comparison with the Dynamic Slow Feature Analysis (DSFA) method, establishing a robust evaluation framework for our proposed methodology.
In the simulation framework, these methods are systematically applied to monitor continuous data streams and detect anomalies. The comparative analysis is performed through extensive computational experiments, with all performance metrics calculated from 1000 independent simulation runs to ensure statistical reliability and robustness.
Table 1 presents the simulated type-I error rates ($\tilde{\alpha}$) and power values ($\tilde{\beta}$) obtained from the proposed online monitoring procedure under various model configurations. The simulation was conducted with parameters set at $p = 200$, $n_{\mathrm{ic}} = 150$, $t = 500$, $\tau = 200$, $\alpha = 0.05$, and the duration of the mean shift $d = 30$ in Scenario (ii), while examining three levels of abnormality degree $\kappa = 1.5, 2.0, 2.5$, respectively. The results demonstrate that when $t > \tau$ and the proportion of abnormal data stream components ($p_s/p$) ranges from 10% to 25%, the empirical type-I error rates are effectively maintained around the nominal level of 5% in most scenarios. Furthermore, the power values exhibit a positive correlation with the abnormality degree ($\kappa$), indicating that higher degrees of abnormality facilitate easier detection of anomalous data. Notably, in Scenario (i), the power values consistently exceed 99% across all models and proportions when $\kappa \ge 2.0$, demonstrating the method’s exceptional detection capability for abrupt changes. In contrast, Scenario (ii) shows relatively lower power values ranging from 86.6% to 94.0%, reflecting the increased challenge in detecting gradual changes compared with abrupt shifts. The standard deviations, presented in parentheses, indicate the stability of the monitoring results across different experimental conditions. These findings collectively validate the effectiveness of the proposed method in maintaining controlled type-I error rates while achieving high power across various abnormal data scenarios.
Figure 2 shows the change trend of the proposed monitoring statistics over time in different model scenarios. The proportion of abnormal components is 5%. The three subfigures on the left depict the mean abrupt shift model, while the three subplots on the right represent the mean gradual shift model. In the high-dimensional data stream being monitored, the true change point is $\tau = 400$. From the results presented in Figure 2, it is evident that in all three model scenarios, the values of the monitoring statistics exhibit a significant upward trend after the change point occurs. Notably, the monitoring statistic values in the left subplots show a more pronounced upward trend post-change point compared with the right subplots. This is attributed to the abrupt mean shift in the left subplots, which leads to a more substantial increase in the monitoring statistic values. It is important to note that before the change point occurs, the values of the monitoring statistics may occasionally exceed the threshold at certain time points. This phenomenon is likely caused by outliers in the data. Overall, Figure 2 demonstrates that the proposed monitoring statistic effectively captures the changing trends in high-dimensional data streams, providing a reliable method for detecting shifts in the data.
Table 2 presents a comprehensive comparison of three change point detection methods (EVT, AM, and APC) under different scenarios and parameter settings. The evaluation is based on three key metrics: Bias (the absolute deviation between estimated and true change points), Sd (the standard deviation of the estimators), and $P_j$ (the probability that the absolute difference between the true and estimated change points is within a specified threshold $j$). The results demonstrate a clear trend across all methods: as the signal strength parameter $\kappa$ increases from 1.0 to 2.0, the estimation accuracy improves significantly. This improvement is evidenced by the decreasing values of Bias and Sd, along with the increasing values of $P_j$. Specifically, when $\kappa$ reaches 2.0, all methods achieve their best performance, with $P_j$ values approaching or exceeding 90% in most cases.
In Scenario (i) with $\tau = 200$, the EVT method shows superior performance, particularly at higher $\kappa$ values. When $\kappa = 2.0$, EVT achieves a remarkably low Bias of 1.1 and Sd of 1.1, with $P_3$ reaching 99.5%. The APC method also demonstrates competitive performance, showing consistent improvement across different $\kappa$ values. The AM method, while showing improvement with increasing $\kappa$, generally underperforms compared with the other two methods in this scenario. For Scenario (i) with $\tau = 400$, similar patterns emerge, though the absolute values of the metrics are slightly different. The EVT method maintains its leading position, achieving a Bias of 1.2 and Sd of 0.8 at $\kappa = 2.0$, with $P_3$ remaining at 99.5%. The APC method shows particularly strong improvement in this scenario, with $P_3$ increasing from 57.1% at $\kappa = 1.0$ to 92.7% at $\kappa = 2.0$. In Scenario (ii), which represents a more challenging detection environment, all methods show higher Bias and Sd values compared with Scenario (i). However, the relative performance ranking remains consistent, with EVT maintaining its advantage, followed by APC and then AM. Notably, even in this more difficult scenario, the EVT method achieves $P_{15}$ values above 90% when $\kappa$ reaches 2.0.
These results collectively demonstrate that while all methods benefit from stronger signals (higher κ values), the EVT method consistently outperforms the others across different scenarios and parameter settings. The findings suggest that the choice of detection method should consider both the expected signal strength and the required precision of change point estimation.
The size of the smoothing parameter $\gamma$ plays a crucial role in determining the performance of the EWMA-based monitoring procedure, particularly affecting the type-I error rate, detection power, and change point estimation accuracy. Table 3 presents a comprehensive simulation study evaluating these performance metrics under different $\gamma$ values (0.2, 0.4, and 0.6) across various abnormality indicators ($\kappa$ = 1.0, 1.5, 2.0) and sparsity levels ($p_s/p$ = 0.05, 0.10, 0.15) of the mean abrupt shift model. The simulation results reveal several important patterns. First, the type-I error rate remains stable around the nominal level of 5% across all $\gamma$ values, demonstrating the robustness of the proposed method in maintaining false alarm control. However, the power $\tilde{\beta}$ shows a strong dependence on $\gamma$, with higher values of $\gamma$ leading to substantially reduced detection rates. For instance, when $\kappa = 1.0$ and $p_s/p = 0.05$, the detection power decreases from 80.4% at $\gamma$ = 0.2 to only 17.8% at $\gamma$ = 0.6.
The change point estimation accuracy, measured by $\hat{\tau}$, also exhibits sensitivity to the selection of $\gamma$. Smaller $\gamma$ values consistently provide more precise change point detection, particularly for smaller mean shifts ($\kappa$ = 1.0). For example, at $p_s/p = 0.05$ and $\kappa = 1.0$, the estimation error increases from 9.5 (209.5 − 200) at $\gamma$ = 0.2 to 90.1 at $\gamma$ = 0.6. This pattern is particularly pronounced for smaller $\kappa$ values, confirming the EWMA statistic’s sensitivity to small mean drifts. Interestingly, as $\kappa$ increases, the influence of $\gamma$ on both detection power and change point estimation diminishes. At $\kappa = 2.0$, even with $\gamma$ = 0.6, the method achieves detection power above 85% across all sparsity levels, and the change point estimation error remains within 5.5 time units. This suggests that while the selection of $\gamma$ is critical for detecting small changes, its impact becomes less significant when dealing with larger mean shifts. The results also indicate that higher sparsity levels ($p_s/p$) generally improve detection performance, particularly for smaller $\gamma$ values. This pattern is most evident in the $\kappa = 1.0$ scenario, where increasing $p_s/p$ from 0.05 to 0.15 improves detection power from 80.4% to 95.9% at $\gamma$ = 0.2.
These findings collectively suggest that while smaller γ values (around 0.2) are preferable for detecting small changes and achieving accurate change point estimation, larger γ values may be more suitable when robustness to noise is prioritized over sensitivity to small shifts. The choice of γ should therefore be guided by the specific monitoring objectives and the expected magnitude of process changes.
Figure 3 illustrates a comparative analysis of the performance metrics, including power values and type-I error rates, across three monitoring methods under varying $\kappa$ values, evaluated against three distinct covariance matrix structures in the mean abrupt shift scenario. The left panel of the figure demonstrates the power values obtained by each method. When $\kappa = 0.4$ or $\kappa = 0.8$, the AM method exhibits a slight advantage over our proposed EVT method. However, as $\kappa$ increases to 1.2 or beyond, the EVT method significantly outperforms both the AM and APC methods in terms of power values. The right panel of Figure 3 focuses on the type-I error rates. It is evident that the EVT method effectively controls the type-I error rate around the nominal level of 5%. In contrast, the AM method fails to maintain control over the type-I error rate. The APC method also shows suboptimal performance in controlling type-I errors. Overall, Figure 3 highlights the robustness and superiority of the EVT method, especially for larger values of $\kappa$, both in terms of power and type-I error control. The results underscore the limitations of the AM and APC methods in maintaining statistical control under varying conditions.
Table 4 lists the fault diagnosis results for high-dimensional data streams using different diagnostic methods. We calculate two main indicators here: the correct recognition rate and the incorrect recognition rate. At the same time, we also consider the change point estimates obtained during the online monitoring phase. The results show that as $\kappa$ increases, the change point estimate becomes closer to the true value. Regardless of whether the proportion $p_s/p$ equals 5%, 10%, or 15%, the EVT method is significantly better than the other two methods. Furthermore, from the diagnostic results, the TPR values of the EVT method are close to 100%. It should be noted that the AM and APC methods also have good diagnostic results. As pointed out by Vilenchik et al. (2019) [35], PCA-based approaches, such as APC, face a problem of difficult interpretation, especially for high-dimensional data. Considering this, as well as the results of point estimation, our method is, overall, superior to the other two methods.
Recently, Aman et al. [27] proposed a novel methodology for fault detection based on Slow Feature Analysis (SFA), specifically tailored for time series models and SPC. Their comprehensive analysis demonstrates that Kernel SFA (KSFA) and Dynamic SFA (DSFA) significantly outperform traditional methods by offering enhanced sensitivity and fault detection capabilities.
To further evaluate the effectiveness of fault detection methods, we compare the proposed EVT online monitoring method with the DSFA method. The results of this comparison are presented in Figure 4 and Figure 5. It is important to note that, as observed in prior analyses, the DSFA method struggles to effectively monitor high-dimensional and ultra-high-dimensional sparsely changing time series data streams, primarily due to the challenges posed by the curse of dimensionality. To facilitate a more meaningful comparison, we select a data stream with a slightly lower dimensionality. Specifically, we set the parameters as follows: $p = 50$, $t = 300$, $n_{\mathrm{ic}} = 150$, $\tau = 200$, with 30 components exhibiting abnormal changes. We examine two scenarios of mean change: (i) mean abrupt shift and (ii) mean gradual shift. In the gradual shift scenario, the duration of the change is set to $d = 30$ time units.
Figure 4 illustrates the performance of both the EVT and DSFA methods under these two scenarios. The left subfigure depicts the mean abrupt shift case, while the right subfigure represents the mean gradual shift case. In both subfigures, the gray lines denote IC data, and the red lines indicate OC data. From Figure 4, it is evident that both methods exhibit significant changes in monitoring statistics after the change point, indicating their ability to detect faults. In the first scenario (mean abrupt shift), the EVT method promptly reflects the abnormal changes in the data stream, with the monitoring statistics clearly exceeding the threshold at the change point. Moreover, in the absence of changes, most of the monitoring statistics remain below the threshold, demonstrating the method’s robustness. In the second scenario (mean gradual shift), both methods exhibit a delay in detection due to the gradual nature of the change. However, the DSFA method incorrectly identifies a substantial amount of normal data as abnormal before the change point occurs, highlighting its limitations in handling gradual shifts. In contrast, the EVT method demonstrates superior performance by more accurately distinguishing between normal and abnormal data, even in the presence of gradual changes.
Overall, the results in Figure 4 underscore the advantages of the EVT method over the DSFA method, particularly in terms of timely and accurate fault detection in both abrupt and gradual change scenarios. This comparison reinforces the effectiveness of the EVT approach for online monitoring of time series data streams.
Figure 5 illustrates the change trends in monitoring power and type-I error rates for the EVT and DSFA methods as the degree of abnormality $\kappa$ increases from 0.4 to 4.0. The figure examines two types of mean changes: abrupt and gradual shifts. The top two subplots (a) and (b) depict the performance of the two methods in terms of monitoring power and type-I error under abrupt mean changes, while the bottom two subplots (c) and (d) show the corresponding trends under gradual mean changes.
From Figure 5, it is evident that the monitoring power of both methods increases with the rise in κ , as shown in the left subplots. However, the EVT method consistently outperforms the DSFA method, demonstrating superior monitoring efficiency. Specifically, the power values monitored by the EVT method are significantly higher than those of the DSFA method, regardless of whether the mean change is abrupt or gradual. This indicates that the EVT method is more effective in detecting anomalies as κ increases. The right subplots focus on the type-I error rates. The EVT method effectively controls the error rate around the 5% threshold, maintaining reliability in false positive detection. In contrast, the DSFA method struggles to control the type-I error, with many monitoring statistics exceeding the threshold before the change point occurs. This aligns with the results observed in Figure 4, further highlighting the robustness of the EVT method in managing error rates. In summary, Figure 5 demonstrates that while both methods improve in monitoring power with increasing κ , the EVT method is significantly more effective and reliable, particularly in controlling type-I errors, making it a preferable choice for detecting mean changes in data streams.

4. Real Data Analysis

To validate the effectiveness of the proposed methodology, we apply our monitoring and diagnostic framework to a real-world industrial dataset involving archaeological glass vessel analysis. This dataset simulates a typical high-dimensional process monitoring scenario in material manufacturing industries. The data originates from Electron Probe X-ray Microanalysis (EPXMA) measurements of sixteenth- to seventeenth-century glass artifacts [36]. Previous studies have extensively analyzed this dataset for anomaly detection, including Hubert et al. (2005) [37], who applied RobPCA and identified row-wise outliers, and Hubert et al. (2019) [38], who employed MacroPCA to flag anomalous samples.
In industrial practice, EPXMA serves as a critical tool for quality control in glass production, semiconductor manufacturing, and metallurgy, where real-time monitoring of material composition ensures product consistency and enables early anomaly detection. For instance, modern glass manufacturing requires precise detection of elemental composition deviations caused by raw material impurities or furnace temperature fluctuations, which could otherwise lead to defective batches. Our case study treats the glass artifact data as a high-dimensional production process stream, containing 180 sequential time points (analogous to production cycles) and 750 spectral intensity variables. These EPXMA intensities reflect elemental composition characteristics comparable to real-time spectroscopic monitoring in industrial production lines. Each time point corresponds to a manufactured batch, with variables representing critical quality indicators such as SiO2, CaO, and trace element concentrations.
During preliminary analysis, we excluded 13 variables with missing values or measurement artifacts (e.g., sensor dropout events), retaining 737 process variables. This preprocessing aligns with industrial protocols for dynamically excluding faulty sensors from monitoring systems. Shapiro-Wilk normality tests reveal non-normal distributions for most components (p-values < 0.01). Following previous studies, we resampled the first 30 IC observations 100 times to generate a 3000-sample IC reference set. All IC observations served as historical data, with the 180 subsequent observations used for testing. To approximate normality assumptions, we applied an inverse normal transformation $\Phi^{-1}(\hat{F}_j(x_{jt}))$ for $j = 1, \ldots, 737$, $t = 1, \ldots, 180$, where $\hat{F}_j$ denotes the empirical cumulative distribution function of the $j$-th IC component. Notably, this marginal transformation does not enforce joint normality.
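The componentwise transformation can be sketched as follows (illustrative code of our own; the empirical CDF is rescaled by $m + 1$ to keep $\Phi^{-1}$ finite, a standard device we assume here rather than one stated in the paper).

```python
import numpy as np
from scipy.stats import norm

def inverse_normal_transform(X_ic, X_test):
    """Map each test component through Phi^{-1}(F_hat_j(x)), where F_hat_j is
    the empirical CDF of the j-th IC component."""
    m = X_ic.shape[0]
    Z = np.empty_like(X_test, dtype=float)
    for j in range(X_test.shape[1]):
        ic_sorted = np.sort(X_ic[:, j])
        # fraction of IC values <= each test value, rescaled into (0, 1)
        F = np.searchsorted(ic_sorted, X_test[:, j], side="right") / (m + 1.0)
        Z[:, j] = norm.ppf(np.clip(F, 1.0 / (m + 1.0), m / (m + 1.0)))
    return Z
```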
We implement our online monitoring framework with a control limit α = 0.05 and weight parameter γ = 0.4 . As shown in Figure 6, the system detects an OC state at t = 57 , indicating a systemic process deviation. This finding corroborates Hubert et al. (2019) [38], who identified similar anomalies using offline Robust PCA methods. In industrial practice, such detection would trigger immediate production checks for potential causes like raw material contamination or equipment malfunctions.
In the second stage, to identify the root causes of the data stream anomalies, we perform a detailed analysis of observations within a defined window ($\omega = 5$), covering five consecutive time points (58–62) after the change point. Using the fault diagnosis procedure of Algorithm 1, we identify significant intensity shifts in 36 variables, suggesting potential anomalies. While the specific chemical elements corresponding to the spectral data remain unidentified, we postulate that these shifts may indicate variations in certain chemical components, particularly those related to potassium (K) and lead (Pb) oxides. In an industrial setting, this precise diagnostic capability facilitates targeted quality adjustments for specific deviations, such as changes in K/Pb ratios due to contaminated raw materials. This diagnostic method marks a substantial improvement over conventional approaches that simply identify ‘abnormal batches’ without offering actionable insights for operational adjustments.

5. Conclusions and Discussion

This study develops a statistically rigorous framework for online monitoring and fault diagnosis in high-dimensional data streams with sparse anomalies. The proposed two-stage methodology integrates an EWMA-based monitoring scheme with the diagnostic Algorithm 1, demonstrating quantifiable improvements over conventional approaches. Numerical simulations validate that the monitoring stage achieves rapid anomaly detection (typically within one or two sampling intervals post-change) while maintaining type-I error rates at approximately 5%, aligning with conventional control limits. The diagnostic stage exhibits exceptional precision, attaining 99.1% identification accuracy for 200-dimensional anomaly streams, controlling the false discovery rate below 3%, and delivering a 28.0% precision improvement over conventional PCA-based approaches. These quantifiable advancements address critical limitations in existing methods, particularly in balancing sensitivity to sparse changes with stringent error control. The method demonstrates effectiveness in monitoring high-dimensional data streams characterized by abnormal mean changes.
However, its capability to detect anomalies is limited in scenarios involving covariance changes, indicating certain constraints in capturing diverse types of abnormal patterns. In addition, the proposed approach imposes specific requirements on data quality, particularly regarding data distribution, to satisfy its theoretical assumptions. Moreover, several algorithmic parameters require careful calibration based on actual data characteristics in practical applications. Specifically, the smoothing parameter $\gamma$ and the diagnostic window width $\omega$ need appropriate adjustment according to the data quality. Future research should focus on extending the framework to address these constraints. Adaptive covariance estimation techniques and robust statistical formulations could enhance robustness to non-Gaussian noise and covariance shifts. Computational optimizations during initialization may improve scalability for ultra-high-dimensional systems. Furthermore, investigating adaptive parameter selection strategies and relaxing independence assumptions could broaden applicability to correlated anomaly patterns. Despite these limitations, the proposed method establishes a principled foundation for high-dimensional stream analysis, particularly in applications demanding sparse fault detection with controlled FDR.

Author Contributions

Conceptualization, T.W. and Z.L.; methodology, T.W., F.Z., and Z.L.; software, T.W.; validation, T.W. and Y.G.; formal analysis, T.W.; investigation, Y.G.; resources, Y.G.; writing—original draft preparation, T.W. and Z.L.; visualization, Y.G.; supervision, F.Z. and Z.L.; funding acquisition, T.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Science and Technology Major Project (2022ZD0114801), the National Natural Science Foundation of China (11926347), the Qinglan Project of Jiangsu Province of China (2022), the Huai’an City Science and Technology Project of China (HAB202357), the Natural Science Research of Jiangsu Higher Education Institutions of China (21KJD110003, 23KJB630003), and the Education Department of Jilin Province of China (JJKH20220534KJ). The APC was funded by the Qinglan Project of Jiangsu Province of China (2022) and the Huai’an City Science and Technology Project of China (HAB202357).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this paper will be made available by the authors upon request.

Acknowledgments

The authors are grateful to the editor and two anonymous referees for their valuable comments that helped to improve the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
SPC: statistical process control
EWMA: exponentially weighted moving average
CUSUM: cumulative sum
IC: in-control
OC: out-of-control
ARL: average run length
FDR: false discovery rate
PCER: per-comparison error rate
FPR: false positive rate
TPR: true positive rate
EVT: extreme value theory
DSFA: dynamic slow feature analysis
PCA: principal component analysis
APC: adaptive principal component

References

1. Zou, C.; Wang, Z.; Zi, X.; Jiang, W. An efficient online monitoring method for high-dimensional data streams. Technometrics 2015, 57, 374–387.
2. Ebrahimi, S.; Ranjan, C.; Paynabar, K. Monitoring and root-cause diagnostics of high-dimensional data streams. J. Qual. Technol. 2020, 54, 20–43.
3. Li, J. A two-stage online monitoring procedure for high-dimensional data streams. J. Qual. Technol. 2019, 51, 392–406.
4. Li, W.; Xiang, D.; Tsung, F.; Pu, X. A diagnostic procedure for high-dimensional data streams via missed discovery rate control. Technometrics 2020, 62, 84–100.
5. Ahmadi-Javid, A.; Ebadi, M. A two-step method for monitoring normally distributed multi-stream processes in high dimensions. Qual. Eng. 2020, 33, 143–155.
6. Liu, K.; Mei, Y.; Shi, J. An adaptive sampling strategy for online high-dimensional process monitoring. Technometrics 2015, 57, 305–319.
7. Huang, T.; Kandasamy, N.; Sethu, H.; Stamm, M. An efficient strategy for online performance monitoring of datacenters via adaptive sampling. IEEE Trans. Cloud Comput. 2019, 7, 155–169.
8. Xiang, D.; Li, W.; Tsung, F.; Pu, X.; Kang, Y. Fault classification for high-dimensional data streams: A directional diagnostic framework based on multiple hypothesis testing. Nav. Res. Logist. 2021, 68, 973–987.
9. Li, J. Efficient global monitoring statistics for high-dimensional data. Qual. Reliab. Eng. Int. 2020, 36, 18–32.
10. Xian, X.; Zhang, C.; Bonk, S.; Liu, K. Online monitoring of big data streams: A rank-based sampling algorithm by data augmentation. J. Qual. Technol. 2021, 53, 135–153.
11. Colosimo, B.; Jones-Farmer, L.; Megahed, F.; Paynabar, K.; Ranjan, C.; Woodall, W. Statistical process monitoring from Industry 2.0 to Industry 4.0: Insights into research and practice. Technometrics 2024, 66, 507–530.
12. Xiang, D.; Qiu, P.; Wang, D.; Li, W. Reliable post-signal fault diagnosis for correlated high-dimensional data streams. Technometrics 2022, 64, 323–334.
13. Mutambik, I. An entropy-based clustering algorithm for real-time high-dimensional IoT data streams. Sensors 2024, 24, 7412.
14. Wan, Y.; Lin, S.; Jin, C.; Gao, Y.; Yang, Y. Improved entropy-based condition monitoring for pressure pipeline through acoustic denoising. Entropy 2025, 27, 10.
15. Hotait, H.; Chiementin, X.; Rasolofondraibe, L. Intelligent online monitoring of rolling bearing: Diagnosis and prognosis. Entropy 2021, 23, 791.
16. Wu, H.; Yuan, R.; Lv, Y.; Stein, D.; Zhu, W. Multi-weighted symbolic sequence entropy: A novel approach to fault diagnosis and degradation monitoring of rotary machinery. Meas. Sci. Technol. 2024, 35, 106119.
17. Wang, Z.; Sun, Y. Role of entropy in fault diagnosis of mechanical equipment: A review. Eng. Res. Express 2023, 5, 032004.
18. Hu, X.; Zhao, Y.; Yeganeh, A.; Shongwe, S. Two memory-based monitoring schemes for the ratio of two normal variables in short production runs. Comput. Ind. Eng. 2024, 198, 110690.
19. Liu, Y.; Ren, H.; Li, Z. A unified diagnostic framework via symmetrized data aggregation. IISE Trans. 2024, 56, 573–584.
20. Yan, X.; Sarkar, M.; Lartey, B.; Gebru, B.; Homaifar, A.; Karimoddini, A.; Tunstel, E. An online learning framework for sensor fault diagnosis analysis in autonomous cars. IEEE Trans. Intell. Transp. Syst. 2023, 24, 14467–14479.
21. Chen, D.; Zhang, Z.; Zhou, F.; Wang, C. A real-time fault diagnosis method for multi-source heterogeneous information fusion based on two-level transfer learning. Entropy 2024, 26, 1007.
22. Li, H.; Zheng, Z.; Liu, K. Online monitoring of high-dimensional data streams with deep Q-network. IEEE Trans. Autom. Sci. Eng. 2025, 1.
23. Li, D.; Bai, M.; Xian, X. Data-driven pathwise sampling approaches for online anomaly detection. Technometrics 2024, 66, 600–613.
24. Cheruku, S.; Balaji, S.; Adolfo, D.; Krishnamurthy, V. Data-driven digital twins for real-time machine monitoring: A case study on a rotating machine. J. Comput. Inf. Sci. Eng. 2025, 25, 1–15.
25. MacGregor, J.; Cinar, A. Monitoring, fault diagnosis, fault-tolerant control and optimization: Data driven methods. Comput. Chem. Eng. 2012, 47, 111–120.
26. Zhao, P.; Zheng, Q.; Ding, Z.; Zhang, Y.; Wang, H.; Yang, Y. A high-dimensional and small-sample submersible fault detection method based on feature selection and data augmentation. Sensors 2022, 22, 204.
27. Aman, A.; Chen, Y.; Yiqi, L. Assessment of slow feature analysis and its variants for fault diagnosis in process industries. Technologies 2024, 12, 237.
28. Benjamini, Y.; Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 2001, 29, 1165–1188.
29. Lowry, C.A.; Woodall, W.H.; Champ, C.W.; Rigdon, S.E. A multivariate exponentially weighted moving average control chart. Technometrics 1992, 34, 46–53.
30. Feng, L.; Ren, H.; Zou, C. A setwise EWMA scheme for monitoring high-dimensional datastreams. Random Matrices Theory Appl. 2020, 9, 2050004.
31. Cai, T.T.; Liu, W.; Xia, Y. Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2014, 76, 349–372.
32. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 1995, 57, 289–300.
33. Benjamini, Y.; Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 2001, 29, 1165–1188.
34. Du, L.; Guo, X.; Sun, W.; Zou, C. False discovery rate control under general dependence by symmetrized data aggregation. J. Am. Stat. Assoc. 2021, 118, 1–34.
35. Vilenchik, D.; Yichye, B.; Abutbul, M. To interpret or not to interpret PCA? This is our question. Proc. Int. AAAI Conf. Web Soc. Media 2019, 13, 655–658.
36. Lemberge, P.; De Raedt, I.; Janssens, K.H.; Wei, F.; Van Espen, P.J. Quantitative analysis of 16–17th century archaeological glass vessels using PLS regression of EPXMA and μ-XRF data. J. Chemom. 2000, 14, 751–763.
37. Hubert, M.; Rousseeuw, P.; Branden, K. ROBPCA: A new approach to robust principal component analysis. Technometrics 2005, 47, 64–79.
38. Hubert, M.; Rousseeuw, P.; Van den Bossche, W. MacroPCA: An all-in-one PCA method allowing for missing values as well as cellwise and rowwise outliers. Technometrics 2019, 61, 459–473.
Figure 1. The data stream with IC data (gray) and OC data (red), when p = 10, p_s = 2, τ = 150, t = 300, κ = 2.0. The left subfigure is the mean abrupt shift case, and the right subfigure is the mean gradual shift case.
Figure 2. Trajectory of the monitoring statistic when p = 200, p_s = 10, τ = 400, t = 500, κ = 2.5. The red dashed line represents the threshold; the vertical gray dotted line marks the change point location.
Figure 3. Power and size values (%) of the various procedures as κ increases, when p = 200, p_s = 20, τ = 200, t = 300, α = 0.05.
Figure 4. Trajectory of the monitoring statistic when p = 50, p_s = 30, τ = 200, n_ic = 150, t = 300, κ = 2.5. The red dashed line represents the threshold; the vertical gray dotted line marks the change point location.
Figure 5. Power and size values (%) of the various procedures as κ increases, when p = 200, p_s = 20, τ = 200, t = 300, α = 0.05.
Figure 6. Online monitoring results for the glass production process stream. Solid points: IC batches; hollow circles: OC batches triggering alarms. The OC state from t = 57 onward emulates a production line deviation requiring intervention.
Table 1. Average percentage of the empirical type-I error values α̃ (%) and the power values β̃ (%) by the proposed monitoring procedure under different model settings when p = 200, n_ic = 150, t = 500, τ = 200, α = 0.05, and κ = 1.5, 2.0, 2.5, respectively.

                           κ = 1.5                κ = 2.0                κ = 2.5
Scenario  Model  p_s/p     α̃         β̃           α̃         β̃           α̃         β̃
(i)       (a)    0.05      5.3(3.4)  99.2(0.5)   5.4(3.4)  99.7(0.2)    5.5(3.5)  99.9(0.2)
                 0.10      5.2(3.4)  99.7(0.2)   5.4(3.4)  99.8(0.7)    5.5(3.4)  99.9(0.1)
                 0.20      5.6(3.6)  99.8(0.2)   5.6(3.1)  99.9(0.1)    5.7(3.6)  100.0(0.1)
          (b)    0.05      5.5(3.2)  96.6(1.4)   5.4(3.6)  99.7(0.2)    5.2(3.2)  99.9(0.2)
                 0.10      5.4(3.4)  99.5(0.3)   5.3(3.7)  99.8(0.2)    4.9(3.4)  99.9(0.1)
                 0.20      5.4(3.6)  99.7(0.2)   5.3(3.5)  99.8(0.2)    5.3(3.3)  100.0(0.1)
          (c)    0.05      4.9(3.1)  96.7(1.3)   5.3(3.3)  99.7(0.2)    5.3(3.2)  99.9(0.2)
                 0.10      5.1(3.5)  99.6(0.3)   5.3(3.6)  99.8(0.2)    5.4(3.5)  99.9(0.1)
                 0.20      5.1(3.2)  99.7(0.2)   5.4(3.5)  99.9(0.2)    5.4(3.4)  100.0(0.1)
(ii)      (a)    0.05      5.6(4.3)  87.7(1.5)   5.5(4.2)  90.4(1.2)    5.5(3.9)  92.9(1.0)
                 0.10      5.4(3.9)  89.7(1.2)   5.5(4.1)  91.6(1.0)    5.3(3.8)  93.9(0.9)
                 0.20      5.4(4.0)  90.9(1.1)   5.7(4.1)  92.9(1.0)    5.2(3.9)  94.0(0.9)
          (b)    0.05      5.3(3.9)  86.6(1.8)   5.2(3.9)  89.6(1.4)    5.3(3.6)  92.5(1.2)
                 0.10      5.5(3.9)  88.7(1.3)   5.3(4.0)  91.1(1.2)    5.1(4.2)  92.9(1.1)
                 0.20      5.4(3.9)  90.3(1.2)   5.6(4.0)  92.4(1.1)    5.3(4.1)  93.6(0.9)
          (c)    0.05      5.3(4.1)  86.7(1.7)   5.3(3.8)  89.7(1.6)    5.2(3.6)  91.6(1.2)
                 0.10      5.4(4.1)  88.7(1.3)   5.3(4.0)  91.8(1.3)    5.2(3.9)  92.7(1.1)
                 0.20      5.2(3.8)  90.3(1.2)   5.5(4.2)  92.3(1.1)    5.4(4.0)  93.6(0.9)
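To make the summaries in Table 1 concrete, a minimal sketch follows showing how α̃ and β̃ can be estimated from Monte Carlo alarm records. The array shapes, the per-time definition of the false alarm rate, and the function name are illustrative assumptions, not the paper's code.

import numpy as np

def empirical_size_and_power(alarms: np.ndarray, tau: int):
    """Summarize replicated monitoring runs.

    alarms: boolean array of shape (n_rep, t); entry [r, i] is True if
    the chart signals at time i in replication r.
    tau: true change point; signals at times < tau are false alarms.
    Returns (alpha_tilde, beta_tilde) in percent: alpha_tilde is the
    average per-time false alarm rate over the IC segment, beta_tilde
    the share of replications signaling after the change.
    """
    alpha_tilde = 100 * alarms[:, :tau].mean()
    beta_tilde = 100 * alarms[:, tau:].any(axis=1).mean()
    return alpha_tilde, beta_tilde

# Toy illustration: a 5% spurious alarm rate before tau and frequent
# true alarms afterwards, mimicking the regime reported in Table 1.
rng = np.random.default_rng(0)
alarms = np.concatenate(
    [rng.random((500, 200)) < 0.05, rng.random((500, 300)) < 0.6], axis=1
)
print(empirical_size_and_power(alarms, tau=200))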
Table 2. Comparison of the three methods via change point estimation with p = 200, p_s = 40, t = 500, α = 0.05, and the real change point τ = 200 and 400, respectively. Bias: the absolute deviation of the estimators; Sd: the standard deviation of the estimators; P_j = 100Pr(|τ − τ̂| ≤ j)%, with j = 3 in Scenario (i) and j = 15 in Scenario (ii).

                         τ = 200                 τ = 400
Scenario  Method  κ      Bias   Sd     P_j       Bias   Sd     P_j
(i)       EVT    1.0      6.3   3.8    24.0       7.1    5.0   23.6
                 1.5      2.1   1.2    89.0       2.2    1.2   88.5
                 2.0      1.1   1.1    99.5       1.2    0.8   99.5
          A-M    1.0      9.4   4.5    13.5       6.8    5.4   35.2
                 1.5      6.1   2.8    17.5       4.5    3.6   37.3
                 2.0      4.1   2.1    34.0       3.2    2.5   44.0
          APC    1.0      4.6   3.1    44.9       3.6    2.7   57.1
                 1.5      3.6   1.7    55.0       2.5    1.3   79.0
                 2.0      3.1   1.2    70.2       2.1    0.9   92.7
(ii)      EVT    1.0     22.1   3.5     3.9      21.7   12.1    3.5
                 1.5     15.8   2.7    43.7      15.3   10.4   42.1
                 2.0     12.3   3.08   91.0      12.4    8.1   90.3
          A-M    1.0     18.8   7.1    17.2      14.2   10.1   36.2
                 1.5     14.9   6.1    34.2      12.3    7.8   47.1
                 2.0     13.3   4.9    64.9       9.9    7.1   74.1
          APC    1.0     16.0   3.3    49.5      14.7    4.6   57.8
                 1.5     15.2   2.9    59.0      13.8    3.7   65.5
                 2.0     14.7   2.3    76.1      13.2    2.5   80.2
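The accuracy measures in Table 2 are straightforward to reproduce once the replicated estimates τ̂ are in hand. Below is a minimal sketch under the caption's definition of P_j; taking Bias as the mean of |τ̂ − τ| over replications is our reading of "absolute deviation", and all names are illustrative.

import numpy as np

def change_point_summaries(tau_hat: np.ndarray, tau: int, j: int):
    """Bias (mean |tau_hat - tau|), Sd (standard deviation of the
    estimates), and P_j = 100 * Pr(|tau - tau_hat| <= j) in percent."""
    dev = np.abs(tau_hat - tau)
    return dev.mean(), tau_hat.std(ddof=1), 100 * (dev <= j).mean()

# Toy run: estimators that overshoot the true change point slightly,
# as delay-type detection methods typically do.
rng = np.random.default_rng(1)
tau_hat = 200 + rng.poisson(1.2, size=1000)
print(change_point_summaries(tau_hat, tau=200, j=3))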
Table 3. Average percentage of type-I error values (α̃, %), power values (β̃, %), and change point estimators (τ̂) by the proposed online monitoring procedure under different choices of γ when p = 200, t = 300, τ = 200, and α = 0.05.

                  κ = 1.0                  κ = 1.5                  κ = 2.0
p_s/p   γ       α̃      β̃      τ̂         α̃      β̃      τ̂         α̃      β̃      τ̂
0.05    0.2    4.8    80.4   209.5      4.9    97.1   202.8      4.9    98.4   201.4
        0.4    4.9    34.7   265.1      5.0    83.4   207.3      4.9    98.3   201.5
        0.6    5.0    17.8   290.1      5.0    50.3   248.1      4.9    85.9   205.2
0.10    0.2    5.0    92.9   206.0      5.0    97.9   201.6      4.9    98.8   200.8
        0.4    5.0    53.9   232.9      4.9    95.7   202.5      5.0    99.2   200.8
        0.6    4.9    28.1   292.5      5.0    73.4   213.1      4.9    97.4   201.5
0.15    0.2    5.0    95.9   204.1      5.0    98.3   201.5      4.9    99.0   200.4
        0.4    4.9    67.1   215.6      5.0    98.2   201.6      4.9    99.2   200.6
        0.6    5.0    37.3   281.7      5.0    84.7   205.5      4.9    99.2   200.5
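Table 3 varies the smoothing parameter γ of the EWMA statistic. For orientation, the sketch below implements only the textbook coordinate-wise EWMA recursion w_t = γx_t + (1 − γ)w_{t−1} with w_0 = 0; it is a generic illustration, not the paper's full monitoring scheme, and the simulated shift is an assumption chosen to echo the table's setting.

import numpy as np

def ewma_path(x: np.ndarray, gamma: float) -> np.ndarray:
    """Coordinate-wise EWMA w_t = gamma*x_t + (1-gamma)*w_{t-1},
    applied down a (t x p) stream with w_0 = 0."""
    w = np.zeros_like(x, dtype=float)
    prev = np.zeros(x.shape[1])
    for i, row in enumerate(x):
        prev = gamma * row + (1.0 - gamma) * prev
        w[i] = prev
    return w

# Smaller gamma gives the chart a longer memory, letting small sustained
# shifts accumulate; this is consistent with Table 3, where gamma = 0.2
# yields higher power and estimates closer to tau than gamma = 0.6.
rng = np.random.default_rng(2)
x = rng.normal(size=(300, 200))
x[200:, :10] += 1.0                  # sparse mean shift after tau = 200
w = ewma_path(x, gamma=0.2)
print(np.abs(w[-1, :10]).mean(), np.abs(w[-1, 10:]).mean())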
Table 4. Diagnosis accuracy of the proposed procedure under different choices of κ and p_s, when p = 200, t = 300, τ = 200, and α = 0.05. TPR and FPR are reported in %.

               EVT                       A-M                       APC
p_s    κ      τ̂       TPR     FPR      τ̂       TPR     FPR      τ̂       TPR     FPR
5%    1.0    212.3    99.1    0.8     213.6   100.0    0.9     210.5    71.1    4.4
      1.5    203.3   100.0    0.8     208.6   100.0    0.9     208.1    82.1    5.4
      2.5    201.6   100.0    0.8     206.7   100.0    0.9     206.9    85.8    7.4
10%   1.0    206.5    99.3    1.3     209.5   100.0    0.9     209.9    84.8   10.3
      1.5    202.1   100.0    1.4     206.1   100.0    1.0     206.7    92.8   12.9
      2.5    201.2   100.0    1.3     204.6   100.0    0.8     206.2    94.4   15.0
15%   1.0    204.5    99.5    1.6     209.8   100.0    0.6     208.1    91.8   12.6
      1.5    201.4    99.7    1.7     205.0   100.0    0.9     206.2    95.7   14.2
      2.5    200.9   100.0    1.7     204.4   100.0    0.4     205.5    96.1   13.4
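The TPR and FPR columns of Table 4 follow the usual confusion-matrix definitions over the p components. A minimal sketch, with illustrative names and a made-up flagged set, is:

import numpy as np

def diagnosis_rates(flagged: np.ndarray, truth: np.ndarray):
    """TPR: percentage of truly shifted components that are flagged;
    FPR: percentage of unshifted components that are flagged."""
    tpr = 100.0 * (flagged & truth).sum() / truth.sum()
    fpr = 100.0 * (flagged & ~truth).sum() / (~truth).sum()
    return tpr, fpr

truth = np.zeros(200, dtype=bool)
truth[:10] = True                  # p = 200 with p_s = 10 faulty streams
flagged = truth.copy()
flagged[10:13] = True              # three spurious components flagged
print(diagnosis_rates(flagged, truth))   # -> (100.0, ~1.6)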
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
