Frequent Alarm Pattern Mining of Industrial Alarm Flood Sequences by an Improved PrefixSpan Algorithm

Alarm systems are essential to the process safety and efficiency of complex industrial facilities. However, with the increasing size of plants and the growing complexity of industrial processes, alarm flooding is becoming a serious problem and poses challenges to alarm systems. Extracting alarm patterns from an alarm flood database can assist with alarm root cause analysis, decision support, and the configuration of an alarm suppression model. However, due to the large size of alarm databases and the problem of order ambiguity in alarm sequences, existing algorithms suffer from excessive computational overhead, incomplete alarm patterns, and redundant outputs. To solve these problems, we propose an alarm pattern extraction method based on an improved PrefixSpan algorithm. Firstly, a priority-based pre-matching strategy is proposed to cluster similar sequences in advance. Secondly, PrefixSpan is improved by considering timestamps to tolerate short-term order ambiguity in alarm flood sequences. Thirdly, an alarm pattern compression method is proposed to further distill pattern information and output representative alarm patterns. Finally, the effectiveness and applicability of the proposed method are evaluated using an alarm flood database from a real diesel hydrogenation unit.


Introduction
Alarm systems are essential to the safety and efficiency of industrial operations. Alarms are audible or visual signals that alert operators to equipment failures, process deviations, and other abnormalities, thus helping to prevent equipment damage or even production accidents. With the wide use of industrial control systems (ICSs), on the one hand, the cost of designing and configuring alarms has been reduced; on the other hand, the high degree of correlation and complexity between devices means that a single point of failure can lead to failures in related areas or even in the whole plant, known as cascaded faults. At the same time, unreasonable alarm thresholds and the poor performance of alarm management systems in ICSs pose challenges to the efficient operation of alarm systems [1,2], among which alarm flooding is the most common and serious problem during the operation of industrial installations. According to the EEMUA and ISA 18.2 standards, operators should not receive more than 6 alarms per hour, and an alarm flood is defined as more than 10 alarms per operator per 10 min [3,4]. During alarm floods, operators are unable to identify critical information among numerous alarms in a timely manner and therefore fail to take effective action against critical abnormalities, which affects product quality, increases production costs, and poses a significant risk to process and personnel safety. For instance, 275 different alarms occurred in the 10.7 min prior to the 2005 hydrocarbon plant explosion at a Texas refinery in the United States [5,6]. The operators failed to detect in time an abnormal hydrocarbon level in the fractionation tower of the isomerization unit, and an explosion occurred after the gas-phase component was discharged from the vent stack. Numerous industrial standards and accident analyses have shown that a scientific and reasonable alarm system is important to ensure the safety of industrial processes, to enhance production efficiency, and to guarantee the safety of employees.
Alarm floods have become a common phenomenon in the process industry and pose challenges for alarm systems. To date, extensive studies have been carried out to optimize alarm systems so as to reduce the number of alarm floods or alleviate the effects of those caused by chattering alarms. Tulsya et al. [7] designed alarm delay strategies for the desired and worst-case conditions by minimizing the missed alarm rate and false alarm rate, ensuring robustness for non-smooth industrial processes. Wang et al. [8] proposed a method for designing deadbands that suppress chattering alarms and mitigate disturbances to the alarm system. Cheng et al. [9] designed an optimal alarm filter that achieves the best alarm accuracy for given normal and abnormal statistical distributions.
In the process industry, switching between operating states and the propagation of cascaded faults usually generate related alarms. As part of alarm rationalization, alarm flood analysis has attracted extensive research attention and has become a major branch of research on handling alarm floods. Several modified sequence alignment algorithms have been proposed for the pairwise matching of alarm flood sequences. In Ref. [10], the similarity index between paired sequences was calculated using an improved Smith-Waterman algorithm (SWA), and similar alarm sequences were clustered based on the similarity scores. Lai et al. [11] proposed an improved basic local alignment search tool (BLAST) that incorporates alarm priority information and timestamps. Simulation experiments showed that the improved BLAST has a lower computational overhead than the modified SWA in [10]. In Ref. [12], a weighted sequential similarity approach was proposed to extract alarm sequence templates for given faults. Based on the extracted fault templates, an improved Needleman-Wunsch algorithm was proposed to isolate alarms caused by identified alarm patterns [13].
Alarm flood analysis extracts alarm sequences from alarm logs and identifies alarm patterns based on similarity indexes or the frequency of alarm occurrences. The extracted alarm patterns can be used in root cause analysis [14], for improving alarm display and alarm response [15], or in advanced methods mentioned in the EEMUA guidelines, such as fault prediction and online alarm suppression. However, extracting patterns from an alarm database with thousands of alarms is time-consuming and requires fairly accurate process knowledge [16]. Fortunately, events or abnormalities that occur frequently commonly leave a trace in the A&E log. If such repeated series of alarms can be detected in historical data, they can help to extract alarm patterns.
Although alarm flood analyses with sequence alignment algorithms were implemented in Refs. [12,13], these methods focus more on clustering similar alarm flood sequences than on detecting frequent alarm patterns. The Apriori algorithm and its variants are the major methods for extracting frequent patterns [17]. However, these algorithms need to generate a large number of candidates and scan the database repeatedly, which results in unaffordable computational overhead when applied to industrial alarm databases. To reduce the computational cost, Zhou et al. [18] proposed a modified CloFast algorithm to extract compact alarm patterns in industrial alarm floods.
PrefixSpan (Prefix-Projected Pattern Growth) is an efficient algorithm that generates progressively smaller projected databases and therefore computes faster [19]. Niyazamand et al. [20] proposed a modified PrefixSpan algorithm (M_PrefixSpan) to extract frequent patterns in alarm flood sequences. However, M_PrefixSpan assumes that related alarms occur in a fixed order, and this premise is difficult to satisfy in real industrial processes: related alarms often occur almost simultaneously and in an uncertain order. When several alarm flood sequences exhibit order ambiguity, M_PrefixSpan extracts alarms as prefixes strictly in the order in which they appear. As a result, the frequency of some alarms may not reach the minimum support threshold, so critical alarms may be neglected, which affects the usability of the extracted patterns. Wang et al. [21] reduced the computational cost of the PrefixSpan algorithm by applying an incremental mining strategy. However, this approach still does not solve the problem of order ambiguity in alarm sequences; moreover, existing algorithms output a large number of redundant alarm patterns, which makes it difficult for users to find representative ones.
Motivated by the problems described above, we propose a compressed alarm pattern mining method based on the PrefixSpan algorithm (CAPM_PrefixSpan) to further facilitate the root cause analysis of alarm floods.
The main contributions of this paper are as follows:
1. We propose a pre-matching mechanism based on the similarity scores of pairwise alarm sequences, which effectively reduces the computational cost when dealing with large volumes of alarm data.
2. We modify the construction of the projection database in the PrefixSpan algorithm, which helps the algorithm avoid incomplete patterns caused by sequence order ambiguity when mining frequent alarm patterns.
3. We propose a compression method that merges similar extracted alarm patterns, clustering and compressing frequent alarm patterns into compact alarm sequences and thereby avoiding cumbersome outputs.
The rest of this paper is organized as follows. Section 2 introduces the preliminaries of alarm systems and the PrefixSpan algorithm. Section 3 presents the proposed CAPM_PrefixSpan algorithm. The effectiveness of CAPM_PrefixSpan is verified based on an industrial case in Section 4. Finally, the conclusion is given in Section 5.

Preliminaries and Problem Description
This section presents the problem of extracting alarm flood patterns from historical alarm data (A&E log) and describes the relevant definitions and algorithms.

Alarms and Alarm Floods
Alarms are generated when process variables exceed their predetermined thresholds, and they are stored as structured text in the Alarm and Event Log (A&E log). As shown in Table 1, an alarm contains many attributes, typically including a tag name, an alarm identifier, time information, and an alarm priority [22]. The tag name is the label corresponding to the alarm, and the alarm identifier denotes the alarm type; e.g., "PVHI" (process variable high) and "PVLO" (process variable low) indicate that an analog variable exceeds its high or low alarm limit, respectively. The time label records when the alarm occurs; the alarm priority indicates the importance of the alarm and is usually determined based on factors such as the consequences of ignoring the alarm and the maximum time allowed to respond to it.
Therefore, in this paper, we represent an alarm as a tuple of three attributes:

$$x_i^j = \left(e_{x_i^j},\; t_{x_i^j},\; p_{x_i^j}\right) \tag{1}$$

where $e_{x_i^j}$ is the alarm label, which is a combination of the tag name and the identifier, and $t_{x_i^j}$ and $p_{x_i^j}$ are the time label and the priority of the alarm, respectively. These three attributes uniquely define an arbitrary alarm $x_i^j$ in any alarm sequence $X_j$ in the A&E log. Based on the ANSI/ISA 18.2 definition of an alarm flood (more than ten alarms per operator per ten minutes), an alarm flood sequence can be expressed as Equation (2) shows:

$$X_j = \left\{x_1^j, x_2^j, \ldots, x_{|X_j|}^j\right\}, \quad X_j \in D \tag{2}$$

where $|X_j|$ is the length of the alarm flood sequence and satisfies $|X_j| \ge 10$ according to the definition. $D$ is the alarm flood database, which is the collection of all the alarm flood sequences extracted from the A&E log.
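A minimal Python sketch of the alarm representation in Equations (1) and (2) follows. The names `Alarm` and `build_flood_database` are hypothetical, the integer priority encoding is an assumption, and the sketch applies only the length condition $|X_j| \ge 10$ stated above; segmenting the A&E log into ten-minute flood windows is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alarm:
    """One alarm x = (e_x, t_x, p_x), as in Equation (1)."""
    label: str      # e_x: tag name + identifier, e.g. "PI251.PVLO"
    time: float     # t_x: timestamp in seconds
    priority: int   # p_x: priority level (hypothetical encoding, 1 = lowest)


AlarmSequence = List[Alarm]  # X_j: one alarm flood sequence
MIN_FLOOD_LENGTH = 10        # |X_j| >= 10, as stated in the text


def build_flood_database(candidates: List[AlarmSequence]) -> List[AlarmSequence]:
    """Return D: time-ordered candidate sequences that satisfy the length condition."""
    database: List[AlarmSequence] = []
    for seq in candidates:
        ordered = sorted(seq, key=lambda alarm: alarm.time)
        if len(ordered) >= MIN_FLOOD_LENGTH:
            database.append(ordered)
    return database
```

A frozen dataclass keeps each alarm immutable and hashable, which is convenient when alarms are later grouped or compared across sequences.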

Chattering Alarms
Chattering alarms are large numbers of repeated messages from a single alarm, caused by one process variable fluctuating around its alarm threshold over a short period of time. Due to the prevalence of noise and unreasonable alarm designs, chattering alarms are very common in the process industry and account for more than 80% of the total number of alarms [23]. Chattering alarms do not convey interdependent pattern information between alarms and interfere with the extraction of frequent alarm patterns. Therefore, it is important to remove chattering alarms in the data pre-processing phase.
Merging repeated occurrences of the same alarm into a single event eliminates the influence of chattering alarms. We adopt a predefined time window $T_w$ to eliminate chattering alarms: when an alarm is generated, any subsequent alarm with the same alarm label (i.e., the same tag name and identifier) within $T_w$ is ignored. This ensures that any two retained identical alarms are separated by at least $T_w$ seconds.
Figure 1a shows 135 alarms for diesel flow in the atmospheric and vacuum distillation unit of the refinery, and Figure 1b shows the alarms remaining after processing with a time window of $T_w = 120$ s. Only four alarms are preserved for further alarm pattern analysis.
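The time-window filter described above can be sketched as follows. This is an illustration under one assumption: the window is measured from the last retained occurrence of the same label, which is one way to guarantee the stated minimum separation of $T_w$ between kept alarms. The function name and the `(label, timestamp)` input format are hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, float]  # (alarm label, timestamp in seconds)


def remove_chattering(alarms: List[Event], t_w: float) -> List[Event]:
    """Drop repeats of a label that occur less than t_w seconds after its last kept occurrence."""
    last_kept: Dict[str, float] = {}  # label -> time of the most recently retained alarm
    kept: List[Event] = []
    for label, t in sorted(alarms, key=lambda event: event[1]):
        previous = last_kept.get(label)
        if previous is None or t - previous >= t_w:
            kept.append((label, t))
            last_kept[label] = t
    return kept
```

With $T_w = 120$ s, repeats of one label at 0, 30, 60, 90, 125, 130, and 300 s are reduced to those at 0, 125, and 300 s, while alarms with other labels are unaffected.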
Processes 2023, 11, x FOR PEER REVIEW 5 of 18

Mining Frequent Alarm Patterns with PrefixSpan
PrefixSpan is a variant of the FreeSpan algorithm that recursively generates and mines progressively smaller projection databases until no item in a projection database reaches the support threshold. In Ref. [20], a modified PrefixSpan (M_PrefixSpan) was proposed for mining frequent alarm patterns. The relevant definitions of M_PrefixSpan are as follows:
Item: Each alarm label in the alarm flood sequence database, for example, the alarm label "PI251.PVLO" in Table 1.
Item frequency (denoted as ξ(·)): The total number of alarm sequences in $D$ that contain the item at least once.
Support threshold (denoted as $Sup_{min}$): The minimum frequency of an item to be considered a candidate as a frequent item.
Prefix and suffix: Consider three alarm sequences α, β, and γ, where α consists of β followed by γ. Then β is called a prefix of α, and the remaining sequence γ is called the suffix of α with regard to prefix β.
Projection database: The collection of suffixes for a given prefix in the alarm sequence database.
Step 2: Create a projection database for each prefix.
Step 3: Determine the frequencies of all items in the suffixes associated with the prefix. The frequencies of the suffix items with regard to the prefix "FI702.PVHI" are shown in Table 3.
Step 5: Repeat Steps 2 to 4 until the support values of all items in the projection database are lower than the threshold $Sup_{min}$.
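For reference, the basic prefix-projection loop behind these steps can be sketched in Python for sequences whose elements are single alarm labels. This is a generic textbook-style PrefixSpan, not M_PrefixSpan or CAPM_PrefixSpan; the function name and the list-of-labels input format are assumptions.

```python
from typing import Dict, List, Tuple


def prefixspan(db: List[List[str]], sup_min: int) -> Dict[Tuple[str, ...], int]:
    """Return every sequential pattern whose support (sequences containing it) >= sup_min."""
    patterns: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], projected: List[List[str]]) -> None:
        # Support of an item = number of suffixes in which it appears at least once.
        support: Dict[str, int] = {}
        for suffix in projected:
            for item in set(suffix):
                support[item] = support.get(item, 0) + 1
        for item, count in support.items():
            if count < sup_min:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # New projection database: what follows the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_projected)

    mine((), db)
    return patterns
```

Support here counts each sequence at most once, matching the definition of item frequency ξ(·) above; for example, in `[["A","B","C"], ["A","C"], ["B","A","C"]]` with $Sup_{min} = 2$, the pattern A followed by C has support 3 while A followed by B has support 1 and is discarded.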

Problem Description
In summary, the main aim of this paper is to extract frequent alarm patterns from the A&E logs of industrial alarm systems. As shown in Figure 2, after chattering alarms are removed, alarm patterns are extracted from the alarm database $D$ in the following three steps:
1. The priority-based pre-matching strategy is used to cluster similar alarm flood sequences so as to reduce the computational overhead.
2. Closed frequent alarm patterns are discovered to extract typical alarm patterns.
3. The alarm patterns are compressed to reduce the impact of cumbersome frequent alarm patterns.
Figure 2. Framework of the proposed method for mining alarm patterns.

Proposed Methods
In this section, we detail the three steps of CAPM_PrefixSpan for mining frequent alarm patterns, including the priority-based pre-matching strategy, the discovery of closed frequent alarm patterns, and the alarm pattern compression method.

The Priority-Based Pre-Matching Strategy
The pre-matching strategy uses the similarity index of alarm sequences, or the attributes of their alarms, to cluster similar alarm sequences and thus exclude irrelevant ones. As one of the important attributes of alarms, the alarm priority indicates the importance of the alarm. As shown in Table 4, three or four alarm priorities are usually established in industrial alarm systems. According to ISA 18.2, for alarm systems with three priority levels, the recommended percentages of alarms from "Low" to "Emergency" priority are 80%, 15%, and 5%, respectively. High-priority alarms account for a smaller percentage but indicate severe abnormal conditions; in contrast, lower priorities are typically used to configure most of the less severe alarms. Thus, it is reasonable to cluster alarm flood sequences based on the co-occurrence of higher-priority alarms. To cluster similar alarm flood sequences in a given database, a binary matrix $S$ is established, with each element calculated by Equation (3):

$$S_{i,j} = \begin{cases} 1, & s(X_i, X_j) \ge \mu_{th} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

where $S_{i,j}$ represents the element in the $i$-th row and $j$-th column of matrix $S$, $s(X_i, X_j)$ is the similarity index between the $i$-th and $j$-th alarm sequences, and $\mu_{th}$ is the threshold for matching similar sequences.
The similarity index is calculated by Equation (4):

$$s(X_i, X_j) = \sum_{n=1}^{m} \psi\left(C_{i,j}^{n}\right) \tag{4}$$

where $C_{i,j} = \{e_x \mid x = (e_x, t_x, p_x) \in X_i \cap X_j\}$ is the collection of all identical alarm labels between the $i$-th and $j$-th alarm sequences, $m$ is the size of set $C_{i,j}$, and $C_{i,j}^{n}$ denotes the $n$-th alarm label in $C_{i,j}$. The function $\psi(\cdot)$ assigns a score based on the priority level, as given by Equation (5), where $p_x$ is the priority level of alarm $x$, $p_{max}$ is the maximum priority level, and $\beta$ is a positive constant. Therefore, $\psi(e_x)$ increases as the alarm priority increases. For instance, in an alarm system with three priority levels, "Low", "Medium", and "High", the scores $\psi(e_x)$ are 2, 3.5, and 5, respectively.
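A sketch of Equations (3) and (4) in Python follows. Because Equation (5) is not reproduced here, the score table `PSI` is a hypothetical stand-in that simply reuses the example values (Low = 2, Medium = 3.5, High = 5); the `(label, priority)` input format and the function names are also assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for psi(.) in Equation (5), reusing the example scores.
PSI: Dict[str, float] = {"Low": 2.0, "Medium": 3.5, "High": 5.0}

LabeledSeq = List[Tuple[str, str]]  # (alarm label, priority name)


def similarity(seq_i: LabeledSeq, seq_j: LabeledSeq) -> float:
    """s(X_i, X_j) in Equation (4): sum of psi over the labels shared by both sequences."""
    priority_of = dict(seq_i)  # label -> priority (assumed fixed for a given label)
    labels_j = {label for label, _ in seq_j}
    shared = [label for label in priority_of if label in labels_j]  # C_{i,j}
    return sum(PSI[priority_of[label]] for label in shared)


def binary_matrix(db: List[LabeledSeq], mu_th: float) -> List[List[int]]:
    """S_{i,j} = 1 if s(X_i, X_j) >= mu_th, else 0 (Equation (3))."""
    return [[1 if similarity(x_i, x_j) >= mu_th else 0 for x_j in db] for x_i in db]
```

Under this score table, two sequences that share two High-priority labels give $s = 5 + 5 = 10$, which just reaches a threshold of $\mu_{th} = 10$, whereas sharing a single Low-priority label gives only 2.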
Once the binary matrix $S$ is obtained, alarm flood sequences with $S_{i,j} = 1$ are clustered to segment the alarm flood database $D$ into $\mathcal{R} = \{R_1, R_2, \ldots, R_{|R|}\}$.
Within each group $R_k$, any two sequences $X_a^k$ and $X_b^k$ share at least one identical alarm label. As a result, the algorithm avoids distraction from irrelevant alarm sequences. In the next step, frequent alarm patterns are discovered recursively for each group $R_1, R_2, \ldots, R_{|R|}$ by incorporating temporal information.
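The text does not spell out how the matched pairs in $S$ are turned into groups. One possible reading, sketched below as an assumption, takes the connected components of $S$ with a union-find structure and discards sequences that match no other sequence (with $Sup_{min} \ge 2$ such a sequence cannot yield a frequent pattern on its own). Connected components only guarantee that each member matches at least one other member, so a stricter rule would be needed to enforce the pairwise property stated above. The function name is hypothetical.

```python
from typing import Dict, List


def cluster_sequences(S: List[List[int]]) -> List[List[int]]:
    """Group sequence indices into collections R_1..R_|R| as connected components of S."""
    n = len(S)
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps the trees shallow
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if S[i][j] == 1 or S[j][i] == 1:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Assumption: unmatched (singleton) sequences are treated as irrelevant and dropped.
    return [g for g in groups.values() if len(g) > 1]
```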

Discovering Closed Frequent Alarm Patterns
Definition 1: Alarm pattern P is a frequent alarm pattern if its frequency ξ(P) ≥ $Sup_{min}$. Note that if P is frequent, all subsequences of P are also frequent.
Definition 2: Alarm pattern P is a closed alarm pattern iff: (1) Alarm pattern P is frequent.
The PrefixSpan algorithm builds the projection database by counting the frequency of each suffix corresponding to the prefix.In order to reduce interference from the order ambiguity problem in alarm flood sequences, we propose an improved projection database construction method by introducing the temporal information of the alarms.
Without loss of generality, consider an arbitrary prefix $v$ with frequency $m$, and assume that $n$ of the suffixes with regard to $v$ contain the alarm tag $e_{x_i}$ ($m \ge n$). Let $\mathbf{t}$ denote the time information of alarm $x_i$ corresponding to prefix $v$, where $\mathbf{t}$ is an $n \times 1$ column vector, and let $\mathbf{t}^{(v)}$ denote the final times of prefix $v$ in the $m$ alarm sequences (an $m \times 1$ column vector). The time distance matrix $\Delta T_{v,x_i}$ can then be calculated by the following equation:

$$\Delta T_{v,x_i} = \mathbf{t}\, I_{1\times m} - I_{1\times n}^{\top} \left(\mathbf{t}^{(v)}\right)^{\top} \tag{6}$$

where $I_{1\times m}$ and $I_{1\times n}$ are all-ones row vectors. The time span $T_s = [t_b, t_f]$ is used to truncate $\Delta T_{v,x_i}$: $t_b$ is a negative constant that reduces the impact of sequence order ambiguity, and $t_f$ is a positive time window (usually set to 100 s) used to determine the causal relationship between alarms. Truncating $\Delta T_{v,x_i}$ by $T_s$ according to Equation (7) yields $\nabla T_{v,x_i} = [\nabla t_{ij}]_{n\times m}$, where $\Delta t_{ij}$ is an element of the time distance matrix. $\Delta t_{ij} \in (0, t_f]$ indicates that alarm $x_i$ occurs within $t_f$ seconds after prefix $v$, and $\Delta t_{ij} \in [t_b, 0]$ indicates that alarm $x_i$ occurs within $|t_b|$ seconds before prefix $v$. The frequency of the alarm tag is then calculated by Equation (8), where $\zeta(e_{x_i} \rightarrow v)$ represents the total frequency of alarm tag $e_{x_i}$ following prefix $v$. The alarm tag $e_{x_i}$ and its frequency $\zeta(e_{x_i} \rightarrow v)$ are recorded in the projection database $B$ with regard to $v$. If $\zeta(e_{x_i} \rightarrow v) \ge Sup_{min}$, alarm tag $e_{x_i}$ is combined with the current prefix $v$ to construct a new prefix $v'$, and the average time interval $\tau(e_{x_i} \rightarrow v)$ between $e_{x_i}$ and $v$ is also recorded. The prefix extension is performed recursively until no more frequent alarm items can be found in the projection database. The major steps for building the projection database are summarized in Algorithm 1. Finally, a filter removes all subsets of the frequent alarm patterns, so that the final output of CAPM_PrefixSpan consists of closed frequent alarm patterns. The main steps for discovering closed frequent alarm patterns are shown in Algorithm 2.
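The time-tolerant counting idea can be sketched in Python as below, together with the subpattern filter mentioned at the end of the paragraph. This is a simplified reading, not the exact Equations (7) and (8): each sequence is compared only with the end time of the prefix in that same sequence, an alarm counts once per sequence if $t_b \le \Delta t \le t_f$, and labels already in the prefix are skipped. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Event = Tuple[str, float]  # (alarm label, timestamp in seconds)


def tolerant_support(prefix: Tuple[str, ...], prefix_end_times: List[float],
                     sequences: List[List[Event]], t_b: float = -10.0,
                     t_f: float = 100.0) -> Dict[str, int]:
    """For each label, count the sequences where it occurs within [t_b, t_f] of the prefix end.

    prefix_end_times[k] is the time at which the prefix ends in sequences[k].
    """
    support: Dict[str, int] = {}
    for t_v, seq in zip(prefix_end_times, sequences):
        hits = {label for label, t in seq
                if label not in prefix and t_b <= t - t_v <= t_f}
        for label in hits:  # each sequence contributes at most 1 per label
            support[label] = support.get(label, 0) + 1
    return support


def is_subsequence(short: Sequence[str], long: Sequence[str]) -> bool:
    """True if `short` occurs in `long` in the same order (gaps allowed)."""
    remaining = iter(long)
    return all(item in remaining for item in short)


def remove_subpatterns(patterns: List[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Keep only patterns that are not a subsequence of another, longer pattern."""
    return [p for p in patterns
            if not any(q != p and is_subsequence(p, q) for q in patterns)]
```

With prefix ("A",), an alarm B recorded 5 s after A in one sequence and 5 s before A in another is counted in both sequences, whereas a strictly order-based count would find it in only one.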

Algorithm 1 Major codes for building the projection database
3: $X$ = the set of all alarm tags in the suffixes with regard to prefix $v$
4: for each alarm tag $e_{x_i}$ in $X$

Frequent Alarm Pattern Compression Method
Even after similar alarm flood sequences are clustered by the pre-matching strategy proposed in Section 3.1, the closed frequent alarm patterns extracted from each $R_k$ are still numerous and redundant. Therefore, the extracted frequent alarm patterns should be further compressed into representative alarm patterns.
For the set of closed alarm patterns $P^k$ extracted from $R_k$, a binary matrix $Z^k$ is created by calculating the pairwise similarity index between patterns $P_a^k$ and $P_b^k$:

$$Z_{a,b}^k = \begin{cases} 1, & \phi\left(P_a^k, P_b^k\right) \ge \gamma \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

where $\gamma$ is the threshold for pattern compression and $Z_{a,b}^k$ denotes the element in the $a$-th row and $b$-th column of matrix $Z^k$. The function $\phi(P_a^k, P_b^k)$ calculates the similarity index between $P_a^k$ and $P_b^k$. Since the closed alarm patterns extracted from the same collection $R_k$ usually share identical alarm tags, $\phi(P_a^k, P_b^k)$ is calculated from the best local alignment between segments of the two alarm patterns. The Smith-Waterman algorithm is therefore used to calculate $\phi(P_a^k, P_b^k)$, as Equation (10) shows:

$$\phi\left(P_a^k, P_b^k\right) = \max_{i,j,m,n} \varphi\left(P^a_{i:m}, P^b_{j:n}\right) \tag{10}$$

where $\varphi(P^a_{i:m}, P^b_{j:n})$ is the similarity index of the segment pair $(P^a_{i:m}, P^b_{j:n})$. To find the best local alignment between $P_a^k$ and $P_b^k$, the SW algorithm recursively calculates an index matrix $H$:

$$H_{p+1,q+1} = \max\left\{0,\; H_{p,q} + \rho\left(x^a_p, x^b_q\right),\; H_{p,q+1} - \delta,\; H_{p+1,q} - \delta\right\} \tag{11}$$

where $H_{p+1,q+1}$ is an element of the matrix and $\delta$ is the gap penalty. For any $p$ and $q$, $H_{1,q} = 0$ and $H_{p,1} = 0$, since one or both of the segments of $P_a^k$ and $P_b^k$ are empty. $\rho(x^a_p, x^b_q)$ is the similarity score function given in Equation (12). Based on Equation (11), the matrix $H$ and the similarity index $\phi(P_a^k, P_b^k)$ can be computed. Repeating this for all pairs of alarm patterns in $P^k$ yields the matrix $Z^k$. Clustering the alarm patterns with $Z_{a,b}^k = 1$ identifies collections of similar closed alarm patterns for further compression. These collections are expressed in Equation (13), where $C_m^k$ is a clustered collection of alarm patterns and $k = 1, 2, \ldots, |R|$ is the index of the extracted alarm pattern set in $\mathcal{P}$. Finally, the compressed alarm pattern $Y_m^k$ is distilled from each $C_m^k$ according to Equation (14), in which the operator $\oplus$ combines the elements of the clustered frequent alarm patterns according to their average timestamps. The pseudocode of the frequent alarm pattern compression method is shown in Algorithm 3.
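The pairwise similarity $\phi$, the binary matrix $Z^k$, and the ⊕ merge described above can be sketched as follows. The match/mismatch scores are hypothetical stand-ins for the similarity score function ρ of Equation (12), which is not reproduced here, and `merge_patterns` is only one reading of ⊕ (each label kept once at the mean of its time offsets, with patterns stored as `(label, offset)` pairs). All names and default values are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def sw_similarity(pa: Sequence[str], pb: Sequence[str], delta: float = 1.0,
                  match: float = 1.0, mismatch: float = -1.0) -> float:
    """Best local-alignment score between two alarm patterns (Smith-Waterman)."""
    # Row 0 and column 0 stay 0: they stand for empty segments (H_{1,q} = H_{p,1} = 0).
    H = [[0.0] * (len(pb) + 1) for _ in range(len(pa) + 1)]
    best = 0.0
    for p in range(1, len(pa) + 1):
        for q in range(1, len(pb) + 1):
            rho = match if pa[p - 1] == pb[q - 1] else mismatch  # stand-in for rho(.)
            H[p][q] = max(0.0,
                          H[p - 1][q - 1] + rho,  # align the two alarms
                          H[p - 1][q] - delta,    # skip an alarm of pa
                          H[p][q - 1] - delta)    # skip an alarm of pb
            best = max(best, H[p][q])
    return best


def compression_matrix(patterns: List[Sequence[str]], gamma: float) -> List[List[int]]:
    """Z_{a,b} = 1 if sw_similarity(P_a, P_b) >= gamma, else 0."""
    return [[1 if sw_similarity(pa, pb) >= gamma else 0 for pb in patterns]
            for pa in patterns]


TimedPattern = List[Tuple[str, float]]  # (alarm label, average time offset in seconds)


def merge_patterns(cluster: List[TimedPattern]) -> TimedPattern:
    """One reading of the merge operator: keep each label once, at the mean of its offsets."""
    offsets: Dict[str, List[float]] = {}
    for pattern in cluster:
        for label, t in pattern:
            offsets.setdefault(label, []).append(t)
    merged = [(label, sum(ts) / len(ts)) for label, ts in offsets.items()]
    return sorted(merged, key=lambda item: item[1])
```

For example, the patterns A, B, C and A, B, D share the aligned segment A, B, so their score is 2 under these stand-in scores, and merging them keeps A, B, C, and D once each, ordered by average offset.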

Algorithm 3
Major codes for compressing alarm patterns in P

Implementation Procedure
The major steps for mining frequent alarm patterns with CAPM_PrefixSpan are summarized in Algorithm 4, where $D$ is the database of alarm flood sequences, $[T_w, \mu_{th}, \gamma, T_s, Sup_{min}]$ are the predefined parameters, and $Y$ denotes the set of compressed closed alarm patterns. The detailed steps are as follows:
Step 1. Remove chattering alarms using the time window $T_w$.
Step 2. Calculate the similarity scores of all alarm flood sequences in $D$ based on Equation (4) and cluster similar alarm sequences according to Equation (3).
Step 3. Discover the closed frequent alarm patterns of each collection in $\mathcal{R} = \{R_1, R_2, \ldots, R_{|R|}\}$ according to Algorithm 2.
Step 4. Compress the extracted alarm patterns for each collection in $\mathcal{P} = \{P^1, P^2, \ldots, P^{|R|}\}$ according to Algorithm 3.
The CAPM_PrefixSpan algorithm involves several important parameters: the time window $T_w$, the pre-matching threshold $\mu_{th}$, the time span $T_s$, the compression threshold $\gamma$, and the minimum support threshold $Sup_{min}$. To make the algorithm easier for practitioners to implement, the following guidelines can be considered when selecting these parameters.

1. In the data pre-processing step, $T_w$ specifies the minimum time interval between two identical alarm tags. A default of $T_w = 100$ s is widely used in practice to filter chattering alarms [20].
2. In the pre-matching stage, $\mu_{th}$ specifies the minimum similarity score for the pre-matching strategy. For an alarm system with three priority levels, $\mu_{th} = 5 \times 2 = 10$ is set, because ISA 18.2 considers an alarm flood to be over when the alarm rate falls below five alarms in 10 min; in addition, ISA 18.2 suggests that 80% of the alarms should be designated "Low" priority, and a Low-priority alarm has a score of 2 according to Equation (5).
3. In the closed frequent alarm pattern discovery stage, $Sup_{min}$ specifies the minimum occurrence frequency for an alarm to be considered frequent in the analyzed alarm floods. By default, $Sup_{min} = 2$ is set to capture all repeated alarms. $T_s$ specifies the tolerance of order ambiguity in the alarm flood sequences. By default, $T_s = [-10, 100]$ s is set to tolerate short-term order ambiguity while discovering causally related alarms.
4. In the alarm pattern compression stage, $\gamma$ specifies the threshold for merging similar alarm patterns. The value of $\gamma$ can be determined according to user requirements.
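The default values in these guidelines can be collected into a small configuration object, sketched below in Python. The class name, field names, and the `validate` checks are hypothetical; $\gamma$ is left unset because the guidelines leave it to the user.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CapmParams:
    """Parameter defaults from the guidelines above (names are illustrative)."""
    t_w: float = 100.0                         # chattering window T_w, seconds
    mu_th: float = 5 * 2.0                     # 5 alarms x psi("Low") = 2, i.e. 10
    sup_min: int = 2                           # minimum support Sup_min
    t_s: Tuple[float, float] = (-10.0, 100.0)  # time span T_s = [t_b, t_f], seconds
    gamma: Optional[float] = None              # compression threshold, set by the user

    def validate(self) -> None:
        """Raise ValueError if a setting contradicts the constraints stated in the text."""
        t_b, t_f = self.t_s
        if not (t_b < 0 < t_f):
            raise ValueError("T_s must satisfy t_b < 0 < t_f")
        if self.sup_min < 1 or self.t_w <= 0:
            raise ValueError("Sup_min and T_w must be positive")
        if self.gamma is None:
            raise ValueError("gamma must be chosen by the user")
```

Keeping the parameters in one frozen object makes a run reproducible: the exact settings used for a given set of mined patterns can be logged alongside the results.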

Industrial Case Study
In this section, we evaluate the performance of the proposed CAPM_PrefixSpan algorithm in terms of computational cost and alarm pattern mining on a real industrial alarm sequence database.

Data Acquisition and Comparative Algorithms
The experimental data were collected from the A&E log of a typical diesel hydrogenation unit at a refinery in Xinjiang, China. This facility has a total of 2203 configured alarms for monitoring 503 process variables. The alarm data were extracted from April 2020 to August 2020, and chattering alarms were removed by setting $T_w = 100$ s. Finally, 161 alarm flood sequences were extracted according to the definition in ISA 18.2. Details of the extracted alarm flood sequences are shown in Table 5. To evaluate the effectiveness of the CAPM_PrefixSpan algorithm, it was compared with M_PrefixSpan [20] and causality PrefixSpan (C_PrefixSpan) [21] on the same alarm flood database. In Ref. [21], the C_PrefixSpan algorithm uses the time span $t_f$ to capture causally related alarms associated with prefixes. The parameter settings are shown in Table 6.

Comparison of the Overall Results
Firstly, a total of 29 collections of clustered alarm flood sequences were obtained using the pre-matching strategy proposed in Section 3.1. Next, frequent alarm pattern mining and alarm pattern compression were performed for each collection of clustered alarm sequences. As a result, we obtained a total of 121 closed alarm patterns and 33 compressed alarm patterns, which are shown in Figure 3.
Dividing the alarm flood database into 29 alarm collections with the proposed pre-matching strategy effectively reduces the search space of the algorithm. For example, clustered alarm sequence collection #1 in Figure 3 contains a total of 32 different alarm tags, which means that the maximum number of alarm tags to be considered is reduced to 32. Without the pre-matching process, all 2203 alarm tags in the DCS of the diesel hydrogenation unit would be examined, resulting in a large number of redundant alarm patterns and potentially unaffordable computational overhead.
To further illustrate the importance of the pre-matching method, Figure 4 presents the running times of the different algorithms for different database sizes. Overall, the larger the database, the longer it takes to extract the closed alarm patterns. This is because, as the database contains more alarm tags and alarm sequences, more alarm tags satisfy $Sup_{min}$ in each iteration, so the size of the projection database also increases dramatically. By clustering alarm flood sequences based on alarm priority and co-occurrence, the number of alarms to be examined is reduced significantly, which greatly reduces the computation time of the algorithm. This also allows CAPM_PrefixSpan to extract alarm patterns with a low support threshold.

For further demonstration, we compared the alarm patterns extracted by the three algorithms from the same collection of alarm sequences.
This alarm sequence collection comes from the feedstock feed system of the diesel hydrogenation plant. This system feeds the feedstock from the atmospheric depressurization unit to the buffer tank, pumps it out through the booster pump, and removes fine particles with the filter; the feedstock is then heated and fed to the buffer tank to provide a stable feed to the downstream unit. The feed system is therefore an important part of the diesel hydrogenation plant.
After the pre-matching process, alarm sequence collection #1 was obtained, containing a total of four alarm flood sequences. The alarm patterns extracted by M_PrefixSpan, C_PrefixSpan, and CAPM_PrefixSpan from alarm collection #1 are shown in Tables 8–10, respectively. "PDI203" is the filter's differential pressure indicator; "PI106" is the buffer tank pressure indicator; "FI102" and "FI101" are the tank diesel flow indicators; "FI401" is the filter outlet flow indicator; and "FI402" is the atmospheric diesel flow indicator. The process values in collection #1 are highly correlated with each other, but the alarm sequences do not follow a fixed order, so the same alarm pattern can appear in multiple forms. For example, patterns #1 and #2 in Table 10 are different forms of the same alarm pattern. M_PrefixSpan counts the frequency of each form separately, so the occurrences of one pattern are split across its forms, and the frequency of an individual form may fall below the minimum support threshold $Sup_{\min}$. As shown in Table 8, patterns #1 to #3 are subsets of pattern #1 in Table 10, which indicates that M_PrefixSpan fails to extract the complete alarm patterns. Meanwhile, C_PrefixSpan also fails to extract pattern #1 in Table 8 completely, owing to its limited time window and the influence of order ambiguity in the alarm flood sequences.
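The support-splitting effect can be reproduced on a toy database. In the hypothetical example below (abstract tags A, B, C, not data from collection #1), each ordering of A and B appears in only two of four sequences, so neither form reaches a threshold of 3, although all four sequences contain both alarms. `unordered_support` ignores order entirely and serves only as a contrast case; CAPM_PrefixSpan instead tolerates reordering within the time span $T_s$.

```python
from typing import List, Sequence

Tags = Sequence[str]


def contains_in_order(seq: Tags, pattern: Tags) -> bool:
    """True if `pattern` occurs in `seq` in the given order (gaps allowed)."""
    it = iter(seq)
    return all(tag in it for tag in pattern)  # `in` consumes the iterator


def form_support(db: List[Tags], form: Tags) -> int:
    """Support of one ordered form: sequences that contain it in this order."""
    return sum(contains_in_order(seq, form) for seq in db)


def unordered_support(db: List[Tags], tags: Tags) -> int:
    """Support when order is ignored: sequences that contain every tag."""
    return sum(set(tags) <= set(seq) for seq in db)


# Hypothetical toy database: four floods in which A and B appear in either order.
db = [["A", "B", "C"], ["B", "A", "C"], ["A", "B"], ["B", "A"]]
sup_min = 3

print(form_support(db, ["A", "B"]))       # 2, below sup_min
print(form_support(db, ["B", "A"]))       # 2, below sup_min
print(unordered_support(db, ["A", "B"]))  # 4, above sup_min
```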
In contrast, the alarm patterns extracted by CAPM_PrefixSpan contain all the patterns in Tables 8 and 9. This indicates that tolerating short-term order ambiguity by setting the time span $T_s$ effectively improves alarm pattern mining.
As noted above, patterns #1 and #2 in Table 10 are different forms of the same pattern. Therefore, the closed alarm patterns need further processing to avoid outputting redundant patterns. Table 11 shows the compressed alarm patterns obtained by CAPM_PrefixSpan, which greatly reduce the pattern redundancy. Further, according to an evaluation by experts with process knowledge, the alarm "PDI203.PVHI" is triggered by a high differential pressure caused by backflushing of the oil circuit. After 6.3 s, "FI401.PVLO" is triggered. In addition, the high diesel flows indicated by "FI405.PVHI", "FI102.PVHI", and "FI101.PVHI" cause high pressure in the tank and trigger alarm "PI106.PVHI" within 12.4 s. This alarm pattern is therefore meaningful, because it exposes the fault propagation path in the plant.
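The compression step itself is specified by Algorithm 3, which is not reproduced in this section. As a hedged illustration only, the sketch below assumes that similarity between closed patterns is the Jaccard overlap of their tag sets and that $\gamma$ (an input of Algorithm 4) acts as the similarity threshold; both are assumptions, and `jaccard` and `compress` are hypothetical names, not the authors' definitions.

```python
from typing import List, Tuple

Pattern = Tuple[str, ...]


def jaccard(p: Pattern, q: Pattern) -> float:
    """Overlap of the tag sets of two non-empty patterns (assumed measure)."""
    a, b = set(p), set(q)
    return len(a & b) / len(a | b)


def compress(patterns: List[Pattern], gamma: float) -> List[Pattern]:
    """Greedy grouping by Jaccard similarity >= gamma, one merged pattern per group."""
    groups: List[List[Pattern]] = []
    # Longer patterns first, so each group is seeded by its most complete member.
    for p in sorted(patterns, key=len, reverse=True):
        for group in groups:
            if jaccard(p, group[0]) >= gamma:  # compare with the group's seed
                group.append(p)
                break
        else:  # no sufficiently similar group: start a new one
            groups.append([p])
    compressed: List[Pattern] = []
    for group in groups:
        merged: List[str] = []
        for p in group:  # ordered union, keeping the seed's order first
            for tag in p:
                if tag not in merged:
                    merged.append(tag)
        compressed.append(tuple(merged))
    return compressed


patterns = [("A", "B", "C"), ("B", "A", "C"), ("A", "B"), ("D", "E")]
print(compress(patterns, gamma=0.6))  # [('A', 'B', 'C'), ('D', 'E')]
```

Here the two orderings of A, B, C and the shorter subset collapse into one representative, mirroring how different forms of one pattern are merged before output.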

Conclusions
In the process industry, alarm sequences caused by the same propagation path can take different forms because of noise and random detection delays. To facilitate alarm pattern extraction and improve alarm systems, we propose an alarm pattern extraction method consisting of three main stages: the pre-matching strategy based on alarm priority, the improved PrefixSpan algorithm, and the alarm pattern compression method. To verify the effectiveness of the proposed method, an industrial case study was carried out with alarm data from a complex refinery facility. The experimental results show that CAPM_PrefixSpan improves the efficiency of alarm pattern recognition by introducing alarm timestamp information and tolerating short-term order ambiguity. In addition, the effectiveness of the compressed alarm patterns was verified by an expert evaluation.
However, alarm pattern mining based on historical data can only extract alarm patterns from abnormalities that have already occurred. Furthermore, because the proposed algorithm relies on the occurrences of alarm tags in a particular DCS, the extracted alarm patterns do not yet transfer to the same processes at different facilities. Future work will therefore focus on generalized alarm pattern mining methods.

Figure 2. Framework of the proposed method for mining alarm patterns.

Algorithm 4. Mining closed frequent alarm patterns in $D$
1: Input: $D$, $T_w$, $\mu_{th}$, $T_s$, $\gamma$, $Sup_{\min}$
2: Output: $Y$
3: Remove chattering alarms in $D$.
4: Divide $D$ into $\mathcal{R} = \{R_1, R_2, \ldots, R_{|\mathcal{R}|}\}$ by using the priority-based pre-matching strategy.
5: For each $R_k$ in $\mathcal{R}$
6:   Mine closed frequent alarm patterns $P$ according to Algorithm 2.
7: End for
8: Compress the alarm patterns $P$ into $Y$ according to Algorithm 3.
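The control flow of Algorithm 4 can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: the chattering rule in `remove_chattering` (drop a re-activation of the same tag less than `chatter_window` seconds after its last kept activation) is a hypothetical stand-in, since the section does not define chattering removal, and pre-matching, closed-pattern mining, and compression are passed in as callables (for example, with $T_w$, $\mu_{th}$, $T_s$, $Sup_{\min}$, and $\gamma$ bound via `functools.partial`) rather than reimplemented.

```python
from typing import Callable, Dict, List, Tuple

Alarm = Tuple[str, float]  # (alarm tag, timestamp in seconds)
Flood = List[Alarm]


def remove_chattering(flood: Flood, window: float) -> Flood:
    """Drop re-activations of a tag that occur less than `window` seconds
    after the last kept activation of the same tag (assumed rule)."""
    last_kept: Dict[str, float] = {}
    cleaned: Flood = []
    for tag, t in sorted(flood, key=lambda alarm: alarm[1]):
        if tag in last_kept and t - last_kept[tag] < window:
            continue
        last_kept[tag] = t
        cleaned.append((tag, t))
    return cleaned


def mine_alarm_patterns(
    db: List[Flood],
    chatter_window: float,
    pre_match: Callable[[List[Flood]], List[List[Flood]]],
    mine_closed: Callable[[List[Flood]], List[tuple]],
    compress: Callable[[List[tuple]], List[tuple]],
) -> List[tuple]:
    cleaned = [remove_chattering(f, chatter_window) for f in db]  # step 3
    clusters = pre_match(cleaned)                                 # step 4: D -> R_1..R_|R|
    closed: List[tuple] = []
    for cluster in clusters:                                      # steps 5-7
        closed.extend(mine_closed(cluster))
    return compress(closed)                                       # step 8: P -> Y


flood = [("A", 0.0), ("A", 1.0), ("B", 2.0), ("A", 3.0), ("A", 10.0)]
print(remove_chattering(flood, window=5.0))
# [('A', 0.0), ('B', 2.0), ('A', 10.0)]
```

Keeping the three stages as injected callables mirrors the algorithm's structure, where each stage is defined by a separate algorithm.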

Figure 3. Numbers of clustered alarm flood sequences, closed alarm patterns, and compressed alarm patterns from 29 alarm flood groups.

Figure 4. Runtime with different sizes of alarm flood sequence databases.

Table 1. An example of alarms in industrial plants.

Table 2. Alarm sequences for a steam generator in a diesel hydrogenation plant.

Table 4. Typical alarm priorities of industrial alarm systems.
…
7: Calculate the time distance matrix $\Delta T_i^{v,x}$ according to Equation (6).
8: $\nabla T_i^{v,x}$ = Truncate $\Delta T_i^{v,x}$ by the time span $T_s$ according to Equation (7).
9: $\mathrm{rank}(\nabla T_i^{v,x})$ = the frequency of alarm tag $e_i^x$ with respect to $v$ in $R_k$.
10: Add $\{e_i^x : \mathrm{rank}(\nabla T_i^{v,x})\}$ into $B$.

Major codes for discovering closed frequent alarm patterns in $\mathcal{R}$
1: Input: $\mathcal{R}$, $T_s$, $Sup_{\min}$
2: Output: $P$
3: For each $R_k$ in $\mathcal{R}$
4:   Scan all alarm items in $R_k$.
5:   Remove items with frequencies lower than $Sup_{\min}$.
6:   $v_{current}$ = the remaining alarm items in $R_k$.
…
21: If $\zeta(e_n^x \rightarrow v_m) \geq Sup_{\min}$
22:   Update $v_{next}$ by assembling $e_n^x$ with $v_m$ w.r.t. $v_m$.
…
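Lines 7 to 10 of the listing build the candidate set $B$ for a prefix $v$. Equations (6) and (7) are not reproduced in this section, so the sketch below rests on two labeled assumptions: each entry of the time distance matrix is the timestamp difference $t(e^x) - t(v)$ over all occurrence pairs, and truncation by $T_s$ keeps entries not smaller than $-T_s$ (tag $x$ follows $v$, or precedes it by at most $T_s$ seconds). The rank is taken as the number of sequences in $R_k$ with at least one surviving entry. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Alarm = Tuple[str, float]  # (alarm tag, timestamp in seconds)


def time_distances(seq: List[Alarm], v: str, x: str) -> List[List[float]]:
    """Assumed Eq. (6): t(x) - t(v) for every occurrence pair (rows: v, columns: x)."""
    tv = [t for tag, t in seq if tag == v]
    tx = [t for tag, t in seq if tag == x]
    return [[b - a for b in tx] for a in tv]


def truncate(matrix: List[List[float]], t_s: float) -> List[float]:
    """Assumed Eq. (7): keep distances >= -t_s, tolerating short reversals."""
    return [d for row in matrix for d in row if d >= -t_s]


def candidate_ranks(cluster: List[List[Alarm]], v: str, t_s: float) -> Dict[str, int]:
    """B: for every other tag x, the number of sequences in which x can extend v."""
    tags = {tag for seq in cluster for tag, _ in seq if tag != v}
    b: Dict[str, int] = {}
    for x in sorted(tags):
        rank = sum(1 for seq in cluster if truncate(time_distances(seq, v, x), t_s))
        if rank:
            b[x] = rank
    return b


cluster = [
    [("A", 0.0), ("B", 2.0)],    # B follows A
    [("B", 0.0), ("A", 1.5)],    # B precedes A by 1.5 s
    [("B", 0.0), ("A", 30.0)],   # B precedes A by 30 s
    [("A", 0.0), ("C", 5.0)],
]
print(candidate_ranks(cluster, "A", t_s=2.0))  # {'B': 2, 'C': 1}
print(candidate_ranks(cluster, "A", t_s=0.0))  # {'B': 1, 'C': 1}
```

With $T_s = 2$ s the 1.5 s reversal still counts toward the support of B after A, while strict ordering ($T_s = 0$) splits it off; the ranks would then be compared with $Sup_{\min}$ as in line 21.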

Table 5. Details of the extracted alarm flood sequence database.

Table 6. Detailed parameters of each algorithm.

Table 7. The number of extracted alarm patterns.

Table 11. Compressed alarm patterns obtained by CAPM_PrefixSpan.