Correlation Degree and Clustering Analysis-Based Alarm Threshold Optimization

Guixin Zhang; Zhenlei Wang

doi:10.3390/pr10020224

Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology (ECUST), Shanghai 200237, China

* Author to whom correspondence should be addressed.

Processes 2022, 10(2), 224; https://doi.org/10.3390/pr10020224

This article belongs to the Topic Processing, Analysis, Modelling and Mechanics of Materials and Structures

Abstract

In industrial practice, excessive alarms and high alarm rates mostly result from unreasonable settings of variable alarm thresholds, which have become a significant threat to operation stability and plant safety. This paper presents an approach based on correlation degree and clustering analysis to optimize variable alarm thresholds. First, the correlation degrees of the variables are obtained by analyzing the correlation relationships among them. Second, the variables are grouped according to the gray correlation coefficients and clustering analysis, and a weight for the false alarm rate (FAR) is assigned in each group. An objective function involving the FAR, the missed alarm rate (MAR), and the maximum acceptable FAR and MAR is then established with the variable weight. Finally, the objective function is minimized with an optimization algorithm to obtain the optimal alarm threshold. Case studies of the Tennessee Eastman (TE) industrial simulation process and an actual industrial ethylene production process show that, in comparison to the initial situation, the method can effectively reduce the FAR according to the correlation degrees among variables in the system and decrease the number of alarms, with reduction rates of 40.5% and 35.3%, respectively.

Keywords:

alarm threshold; correlation degree; clustering analysis; FAR; MAR

1. Introduction

As industrial production processes become more complex and refined, they depend more and more on real-time monitoring of the system. The alarm management system, an indispensable part of the safe operation of industrial production, has therefore received increasing attention. In industrial practice, excessive false alarms, a high false alarm rate (FAR), and a high missed alarm rate (MAR) frequently arise in processes [1], mainly because of unreasonable threshold settings for variables and ineffective management of alarm systems. According to studies from EEMUA, an operator can effectively handle about one alarm every 5 min to 10 min [2].

Regarding the methods of alarm optimization, academia has given many methods, each of which plays a certain role in its corresponding system to a greater or lesser extent. There are many kinds of alarm optimization methods, and classification methods are inconclusive. Generally speaking, they can be divided into univariate methods and multivariate methods, threshold optimization methods and algorithm optimization methods, off-line methods and dynamic methods, etc.

For determining the process variable threshold, academia has studied many optimization methods. For instance, in terms of FAR and MAR, an approach for estimating the threshold was proposed on the basis of an adaptive fuzzy-neural network and a genetic learning algorithm [3]. As the threshold can be determined by the deadband, a method was presented that estimates the threshold from an objective function of FAR and MAR together with the relation between the optimal threshold and the deadband [4]. Combining FAR and MAR with a correlation coefficient, an off-line method for optimizing thresholds was given to reduce alarms for multiple variables based on time delay [5]. To improve the robustness of classification performance with regard to separation threshold selection, different intelligent pattern classifiers were used to mine industrial batch dryer data to determine thresholds [6]. In addition, there are also many methods for setting thresholds in early warning and damage systems [7,8], alarm reduction [9], system monitoring [10,11], and performance optimization [6,12].

Over the past five years, some similar methods have been improved on the basis of intelligent algorithms. For instance, to remove chattering alarms, a univariate method was presented for reducing alarms with median filters [13]. Taking missed alarms and false alarms into account, an off-line univariate approach for determining the alarm threshold in debris flow forecasting was presented, with the lowest missed-alarm and false-alarm probabilities [14]. By optimizing positioning accuracy, the pulse-width multiplexing Φ-OTDR and a multisensor information fusion algorithm were utilized to reduce the nuisance alarm rate [15]. For target tracking in a chaotic environment, a multivariate approach was proposed to optimize the joint threshold and power allocation strategy as a two-variable nonconvex optimization problem for the cognitive radar network, containing the detection stage and the transmitting stage [16]. Based on test observations, a simple and robust off-line approach was proposed to determine the detection thresholds for detecting defluidization at an early stage [17]. Approaches concerning optimal alarm identification [18], design and evaluation analysis for alarm systems [19,20,21,22,23], management frameworks [24], alarm thresholds [25], and an overview of industrial alarm systems [26] have also appeared. In most of the above approaches, variables have not been clustered when optimizing thresholds, although clustering would be suitable for analyzing the interlinks among similar variables. Therefore, Zhang et al. presented an off-line multivariate method based on the ROC curve and sensitivity, considering the sensitivity relationship and clustering analysis among variables, to optimize the alarm threshold [27]. A multivariate alarm clustering method was proposed that takes advantage of the information contained in the alarm logs themselves, in which the clustering analysis of process alarms was achieved through word embedding [28]. Analyzing alarm data, Lucke et al. presented an on-line method that conducted a practical application for alarm flood classification based on a set of historical alarm floods [29]. For processes with high-dimensional variables, the number of alarms to be addressed increases significantly as the number of measurable variables increases. False alarms caused by redundant disturbances distract operators, so that alarms of greater significance to the system are missed as a consequence. Thus, clustering variables into groups is necessary for alarm optimization.

Most of the above alarm optimization methods are off-line methods, and they have produced tangible results of varying size in their respective systems, effectively optimizing the production process and reducing losses. To detect chattering alarms promptly and reduce their number, an on-line method was given to detect alarms in a timely manner [30]. For HVAC systems, Chakraborty et al. put forward a novel dynamic threshold method with a data-driven model using extreme gradient boosting (XGBoost), aimed mainly at early fault detection [31]. Static and dynamic performance analyses were used to update evidence in designing an industrial alarm system to reduce unnecessary alarms [32]. There are also corresponding approaches for optimization, such as alarm management strategy [33], alarming mechanism [34], and threshold setting [35], which have promoted the development of dynamic methods to a certain extent.

However, as the current industrial production processes change irregularly, the preceding production process and the following process cannot be consistent all the time, such as the changes caused by different conditions or an abnormal process. In view of the problem, a new alarm threshold optimization method is proposed, which uses the correlation degrees among the variables and clustering analysis. Herein, this paper mainly has four significant contributions. (1) It considers the gray correlation degree analysis. Variables with similar influence on the system can be found out through correlation analysis. (2) It could carry on the group sorting according to the intrinsic clustering analysis. (3) It reduces FAR and also has a significant inhibitory effect on MAR (significantly reduced invalid alarms). (4) It can be used as a reference for real-time online optimization. When connecting the current programs to the computer interface in on-line systems, it could meet the requirements of the fast-changing production processes through setting an update period and data, which would consider the alarm rates. In addition, this method could help operators reduce operation load, make more efficient repair measures in a timely manner, and reduce the losses.

2. Optimum Design Outline

2.1. Alarm Efficiency Index

At present, an alarm system is important for safety, which generally utilizes FAR and MAR as efficiency indices to measure the accuracy of detecting operation conditions [36]. Based on the operation conditions, industrial processes usually contain normal and abnormal situations, which generally use the FAR and MAR to represent the probability directly for a variable when its measured values go beyond the threshold in normal operations, and within the threshold in abnormal operations in an alarm system [37].

The FAR and MAR can be obtained as follows:

First, for a variable x, two groups of data are collected over a period of time: normal data, collected when the process runs normally and steadily, and abnormal data, collected when the process clearly deviates from the normal operating state because of an added disturbance or failure.

Next, for the variable x, the probability density functions f(x) and g(x) under the normal and abnormal situations, respectively, are obtained by fitting the corresponding data, as shown in Figure 1, where x_T denotes the alarm threshold. For a given parameter of the system, x_T marks the boundary of a well-running state under the current threshold. When the parameter exceeds or falls below the current threshold, the process may generate redundant false alarms or missed alarms. Here, a false alarm is raised when a normal process variable value (the blue line) falls below x_T, and a missed alarm occurs when an abnormal process variable value (the red dotted line) exceeds x_T.

Figure 1. Process probability density curves of x.

Finally, given x_T, the FAR and MAR can be obtained from Figure 1 and Equations (1) and (2) [5,36].

\mathrm{FAR} = \int_{-\infty}^{x_{T}} f(x) \, dx

(1)

\mathrm{MAR} = \int_{x_{T}}^{+\infty} g(x) \, dx

(2)

The following work in this paper can be carried out whenever the functions f(x) and g(x) can be fitted for a variable, irrespective of the type of distribution.
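As an illustration of Equations (1) and (2), the sketch below evaluates the FAR and MAR in closed form under the assumption that both fitted densities are Gaussian. The function names and the numerical parameters (means, standard deviations, threshold) are hypothetical and chosen only for demonstration; they are not taken from the paper's data.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def far_mar(x_t, normal_fit, abnormal_fit):
    """Return (FAR, MAR) for threshold x_t.

    normal_fit and abnormal_fit are (mean, std) pairs of the fitted
    densities f(x) and g(x); the Gaussian form is an assumption.
    """
    mu_n, sigma_n = normal_fit
    mu_a, sigma_a = abnormal_fit
    far = normal_cdf(x_t, mu_n, sigma_n)        # Eq. (1): normal data below x_T
    mar = 1.0 - normal_cdf(x_t, mu_a, sigma_a)  # Eq. (2): abnormal data above x_T
    return far, mar

# Hypothetical fits: normal data around 1.0, abnormal data around 0.0
far, mar = far_mar(0.5, (1.0, 0.2), (0.0, 0.2))
print(f"FAR = {far:.5f}, MAR = {mar:.5f}")
```

For other distribution families, the two integrals could be evaluated numerically instead; only the two cumulative probabilities are needed.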

2.2. Alarm Clustering Analysis

Based on the alarm clustering algorithm, variables can be clustered into groups.

1. Correlation degree analysis

A measure of the degree of correlation between two factors in a system that varies from time to time or from object to object is called the correlation degree [38]. In a system process, if the trend of change of the two factors is consistent, that is, the degree of synchronous change is high, then the degree of correlation is high. Conversely, it is lower. Thus, the gray correlation analysis method is a method to measure the correlation degree among factors according to the degree of similarity or difference of development trend among factors, that is, “gray correlation degree”.

Specific calculation steps for correlation analysis:

(1): Determine the reference sequence and comparison sequence. The data sequence that reflects the behavior characteristics of a system is called a reference sequence and the data sequence composed of factors that affect the behavior of a system is called a comparison sequence;
(2): Conduct dimensionless treatment for the reference sequence and comparison sequence.

Due to the different physical meanings of each factor in the system, the dimensionality of the data may not be the same, which is not convenient for comparison, or it is difficult to get the correct conclusion when comparing. Therefore, in the analysis of gray relational degree, dimensionless data processing should be generally required.

(3): Determine the gray correlation coefficient ξ_i(k) between the reference sequence and each comparison sequence.

The correlation degree is essentially the difference in geometry among curves, so the difference among curves can be used as a measure of the correlation degree. For a reference sequence X₀ = {x₀(1), x₀(2), …, x₀(n)} and several comparison sequences X₁, X₂, …, X_m, the correlation coefficient ξ_i(k) between the reference sequence and each comparison sequence at each point k is given by the following formula:

ξ_{i}(k) = \frac{\min_{j} \min_{l} | x_{0}(l) - x_{j}(l) | + P \max_{j} \max_{l} | x_{0}(l) - x_{j}(l) |}{| x_{0}(k) - x_{i}(k) | + P \max_{j} \max_{l} | x_{0}(l) - x_{j}(l) |}

(3)

where P is the distinguishing coefficient, the value of which generally lies between 0 and 1, with 0.5 as the common value; |x₀(k) − x_i(k)| is the absolute difference between the sequences X_i and X₀ at point k; l = 1, 2, …, n indexes the points and j = 1, 2, …, m indexes the comparison sequences; \min_{l} |x₀(l) − x_j(l)| is the first-level minimum difference, i.e., the minimum difference between the sequences X_j and X₀ over all points; \min_{j} \min_{l} |x₀(l) − x_j(l)| is the second-level minimum difference, i.e., the minimum over all sequences of the first-level minimum differences; \max_{l} |x₀(l) − x_j(l)| is the first-level maximum difference, i.e., the maximum difference between the sequences X_j and X₀ over all points; and \max_{j} \max_{l} |x₀(l) − x_j(l)| is the second-level maximum difference, i.e., the maximum over all sequences of the first-level maximum differences.

In order to avoid the resulting deviation caused by variable units and other factors, it is necessary to conduct standardized processing on variable data.

(4): Calculate the correlation degree

As the correlation coefficient denotes the value of correlation degree between the comparison sequence and the reference sequence at each time, it has more than one value, which could lead the information to be too scattered to facilitate the overall comparison. Therefore, it is necessary to concentrate the correlation coefficient of each moment into a value, that is, to find its average value, as the value expression of the correlation degree between the comparison sequence and the reference sequence.

Correlation degree r_i represents the gray correlation degree of comparison sequence X_i to reference sequence X₀, also called sequence correlation degree, average correlation degree, and line correlation degree, the formula of which is shown as follows:

r_{i} = \frac{1}{n} \sum_{k = 1}^{n} ξ_{i} (k)

(4)

The closer the value of r_i is to 1, the better the correlation is.
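Steps (1) to (4) can be sketched as follows. The code assumes min-max scaling as the dimensionless treatment, which the text does not specify, and the function names are illustrative only.

```python
def min_max_normalize(seq):
    """Dimensionless treatment (assumed here to be min-max scaling)."""
    lo, hi = min(seq), max(seq)
    return [(v - lo) / (hi - lo) for v in seq]

def gray_correlation_degrees(reference, comparisons, p=0.5):
    """Return the gray correlation degree r_i of each comparison sequence.

    reference   : list of n values (X_0)
    comparisons : list of m lists of n values (X_1 .. X_m)
    p           : distinguishing coefficient P in Eq. (3)
    """
    x0 = min_max_normalize(reference)
    xs = [min_max_normalize(c) for c in comparisons]
    # Absolute differences |x_0(l) - x_j(l)| for every sequence and point
    diffs = [[abs(a - b) for a, b in zip(x0, x)] for x in xs]
    d_min = min(min(row) for row in diffs)  # second-level minimum difference
    d_max = max(max(row) for row in diffs)  # second-level maximum difference
    if d_max == 0.0:
        # All sequences coincide with the reference after scaling
        return [1.0] * len(xs)
    degrees = []
    for row in diffs:
        xi = [(d_min + p * d_max) / (d + p * d_max) for d in row]  # Eq. (3)
        degrees.append(sum(xi) / len(xi))                           # Eq. (4)
    return degrees

# A sequence proportional to the reference and one with the opposite trend
r = gray_correlation_degrees([1, 2, 3, 4], [[2, 4, 6, 8], [4, 3, 2, 1]])
print(r)
```

In this small example the proportional sequence obtains a degree of 1, while the sequence with the opposite trend obtains a clearly lower value.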

2. Clustering analysis

Specific clustering steps:

(1): Calculate the gray correlation coefficients between every two variables, then, for each variable, sum its correlation coefficients to all other variables;
(2): Standardize the above sums by 0–1 normalization, utilizing w_d to denote the result;
(3): Based on the relationship between w_d (the value obtained by 0–1 normalization of the summation of the correlation coefficients of one variable to all other variables) and C_g (the global correlation degree level), and the relation between Pearson correlation coefficients and correlation levels [39], the variables are clustered into groups, as listed in Table 1. Then, the weight of a variable in one group can be calculated from the data of the variables in that group.

Table 1. Relationship between the values of w_d and C_g.
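The summation and normalization in the clustering steps can be sketched as below. Since the intervals of Table 1 are not reproduced here, the group boundaries passed to `assign_groups` are hypothetical placeholders supplied by the caller, and all function names are illustrative.

```python
def correlation_sums(degree_matrix):
    """Sum, for each variable, its correlation degrees to all other variables."""
    n = len(degree_matrix)
    return [sum(degree_matrix[i][j] for j in range(n) if j != i)
            for i in range(n)]

def normalize_01(values):
    """0-1 normalization of the sums, giving w_d."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def assign_groups(w_d, edges):
    """Map each w_d to a group index using ascending boundaries `edges`.

    The boundaries stand in for the intervals of Table 1 and are
    hypothetical here.
    """
    return [sum(w >= e for e in edges) for w in w_d]

# Hypothetical symmetric correlation-degree matrix for three variables
matrix = [[1.0, 0.8, 0.6],
          [0.8, 1.0, 0.7],
          [0.6, 0.7, 1.0]]
d_t = correlation_sums(matrix)
w_d = normalize_01(d_t)
groups = assign_groups(w_d, edges=[0.25, 0.75])  # placeholder boundaries
print(d_t, w_d, groups)
```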

3. Variable weight calculation

The variable weight of a variable in one group can be determined through the mean square error method with specific steps, as below:

(1): Data normalization

$Y_{i j} = \frac{x_{i j} - \min (x_{i j})}{\max (x_{i j}) - \min (x_{i j})}$

(5)

where, $x_{i j} (i = 1, 2, \dots, n; j = 1, 2, \dots, m)$ denotes the initial data of the jth variable in group i.
(2): Mean value

$\bar{Y_{i j}} = \frac{1}{m} \sum_{j = 1}^{m} Y_{i j}$

(6)
(3): Mean square error

$σ_{i j} = \sqrt{\sum_{j = 1}^{m} {(Y_{i j} - \bar{Y_{i j}})}^{2}}$

(7)
(4): Variable weight

$w_{i j} = \frac{σ_{i j}}{\sum_{j = 1}^{m} σ_{i j}}$

(8)
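A minimal sketch of Equations (5) to (8) is given below. It reads the indices as follows, which is an interpretation rather than something the text states explicitly: each variable in a group is normalized over its own observations, its dispersion is measured around its own mean, and the weights are the dispersions scaled to sum to one within the group. The function name is illustrative.

```python
import math

def variable_weights(group_data):
    """Weights of the variables in one group by the mean square error method.

    group_data[j] is the list of observations of the j-th variable
    in the group (assumed layout).
    """
    sigmas = []
    for series in group_data:
        lo, hi = min(series), max(series)
        y = [(v - lo) / (hi - lo) for v in series]                  # Eq. (5)
        mean = sum(y) / len(y)                                      # Eq. (6)
        sigmas.append(math.sqrt(sum((v - mean) ** 2 for v in y)))   # Eq. (7)
    total = sum(sigmas)
    return [s / total for s in sigmas]                              # Eq. (8)

# Two hypothetical variables with five observations each
weights = variable_weights([[0, 1, 2, 3, 4], [0, 0, 0, 0, 4]])
print(weights)
```

Because the weights are ratios of dispersions, a constant factor such as 1/m inside the square root would cancel when all variables have the same number of observations.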

Here, two efficiency indices are used in total, FAR and MAR. The correlation degree is reflected mainly in the FAR rather than the MAR, and the FAR has a significant effect on the system. Therefore, the weight w_ij is applied to the FAR. Meanwhile, the ratio MAR/R_MAR is used to guard against an overly large MAR, where R_MAR denotes the maximum acceptable MAR, the value of which is generally kept below the engineering required error (0.05), with 0.01 recommended by [2].

2.3. Threshold Optimization

The optimization objective function, shown as Equation (9), is established according to the alarm information under normal and abnormal situations, which is solved by the numerical optimization method from the point of view of minimizing.

F(x) = \mathrm{Min} \left( \frac{\mathrm{FAR}}{\frac{R_{\mathrm{FAR}}}{(1 + w_{ij})}} + \frac{\mathrm{MAR}}{R_{\mathrm{MAR}}} \right)

(9)

where R_FAR denotes the maximum acceptable FAR, the value of which is generally kept below the engineering required error (0.05), with 0.01 recommended by [2].
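The quantity minimized in Equation (9) can be written as a small helper, shown here as a sketch. The function name and default arguments are illustrative; the defaults of 0.01 follow the recommended value mentioned in the text.

```python
def objective(far, mar, w, r_far=0.01, r_mar=0.01):
    """Value inside Min(...) in Eq. (9) for given FAR, MAR and weight w."""
    return far / (r_far / (1.0 + w)) + mar / r_mar

# With both rates at their acceptable maximum, an unweighted variable scores 2;
# a weight of 0.5 raises the FAR term from 1 to 1.5.
print(objective(0.01, 0.01, 0.0), objective(0.01, 0.01, 0.5))
```

A larger weight therefore penalizes the FAR of a variable more strongly, which pushes the optimal threshold toward fewer false alarms for that variable.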

Figure 2 depicts the flow chart of the quadratic interpolation optimization algorithm. The basic idea is as follows: for F(x) = Min φ(x) (x ∈ R¹), φ(x) is approximated by a quadratic function y(x) fitted through several sample points. The extreme point μ of y(x) then serves as an estimate of x^*.

Figure 2. Flow diagram of quadratic interpolation optimization algorithm.

A threshold optimization algorithm is implemented as follows:

(1): Give the initial interval [x₁, x₃], three points (x₁, y₁), (x₂, y₂), (x₃, y₃), and convergence precision ε, where x₁ < x₂ < x₃ and ε > 0;
(2): Calculate c₁ = (y₃ − y₁)/(x₃ − x₁), c₂ = [(y₂ − y₁)/(x₂ − x₁) − c₁]/(x₂ − x₃), x_p = 0.5(x₁ + x₃ − c₁/c₂), and y_p = f(x_p);
(3): If |y₂ − y_p| ≥ ε, go to step (4); otherwise, go to step (9);
(4): If x_p > x₂, go to step (5); otherwise, go to step (7);
(5): If y₂ < y_p, let x₃ = x_p, y₃ = y_p and return to step (2); otherwise, go to step (6);
(6): Let x₁ = x₂, y₁ = y₂, x₂ = x_p, y₂ = y_p and return to step (2);
(7): If y₂ < y_p, let x₁ = x_p, y₁ = y_p and return to step (2); otherwise, go to step (8);
(8): Let x₃ = x₂, y₃ = y₂, x₂ = x_p, y₂ = y_p and return to step (2);
(9): If y₂ < y_p, let x^* = x₂, y^* = y₂ and go to step (11); otherwise, go to step (10);
(10): Let x^* = x_p, y^* = y_p;
(11): Output x^* and f^* = f(x^*).
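The steps above can be sketched as a short routine. This is a minimal illustration, not the authors' implementation: the update rules are written so that the lowest point always stays bracketed between x₁ and x₃, which is the usual convention for this scheme, and the iteration cap and the check on c₂ are added safeguards that do not appear in the listed steps.

```python
def quadratic_interpolation_min(f, x1, x2, x3, eps=1e-8, max_iter=100):
    """Minimize a one-dimensional function by quadratic interpolation.

    Requires x1 < x2 < x3 with f(x2) below f(x1) and f(x3).
    Returns (x_star, f_star).
    """
    y1, y2, y3 = f(x1), f(x2), f(x3)
    for _ in range(max_iter):
        c1 = (y3 - y1) / (x3 - x1)
        c2 = ((y2 - y1) / (x2 - x1) - c1) / (x2 - x3)
        if c2 <= 0.0:
            break  # safeguard: fitted parabola has no minimum
        xp = 0.5 * (x1 + x3 - c1 / c2)  # vertex of the fitted parabola
        yp = f(xp)
        if abs(y2 - yp) < eps:
            # Converged: return the better of the two candidate points
            return (x2, y2) if y2 < yp else (xp, yp)
        if xp > x2:
            if y2 < yp:
                x3, y3 = xp, yp                    # minimum lies in [x1, xp]
            else:
                x1, y1, x2, y2 = x2, y2, xp, yp    # minimum lies in [x2, x3]
        else:
            if y2 < yp:
                x1, y1 = xp, yp                    # minimum lies in [xp, x3]
            else:
                x3, y3, x2, y2 = x2, y2, xp, yp    # minimum lies in [x1, x2]
    return x2, y2

# Example on a parabola with its minimum at x = 2, f = 1
x_star, f_star = quadratic_interpolation_min(lambda x: (x - 2.0) ** 2 + 1.0,
                                             0.0, 1.0, 5.0)
print(x_star, f_star)
```

For an exactly quadratic function the first interpolation already lands on the minimizer, and the second iteration only confirms convergence.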

2.4. Optimization Process Description

Figure 3 gives the optimization algorithm, the specific explanations of which are shown as follows:

Figure 3. Flow chart of alarm threshold optimization process.

To begin with, the correlation degrees of the variables are obtained by analyzing the correlation relationships among them. Subsequently, the variables are grouped according to the gray correlation coefficients and clustering analysis, and the weight w_ij for the FAR is assigned in each group. An objective function of the FAR, MAR, R_FAR, R_MAR, and w_ij is then established. Finally, the objective function is minimized with the optimization algorithm to obtain the optimal alarm threshold.

3. Theory Study—Tennessee Eastman (TE) Simulation Process

3.1. Process Description

The TE process was put forward by J. J. Downs and E. F. Vogel. It is commonly utilized as a data source for comparing various methods, such as control optimization. Therefore, this work uses the TE simulation process as a case.

Figure 4 shows the flow diagram of the TE process, which contains five major operating units: reactor, condenser, compressor, separator, and stripper. The TE process includes 15 known failures and 5 unknown failures, together with 12 operational variables and 41 measured variables. Table 2 lists the 10 measured variables selected for analyzing the applicability of the method. To verify the accuracy of the results, the same sampling environment is used throughout (the faults are introduced after 8 simulation hours). The sampling interval is ΔT = 3 min, chosen by considering the time constants of the process in a closed loop [40,41]; at this interval the process can be considered to have reached a relatively steady running state, which reflects, to some extent, the running state over a long period of time. In total, 960 groups of data under the normal condition and 500 groups of data under an abnormal condition with failure 6 are collected.

Figure 4. Flow chart of TE process.

Table 2. Ten measured variables in TE process.

3.2. Cluster Variables and Calculate Weights

(1): Variable clustering

The selected ten variables can be regarded as 10 vectors with 960 dimensions, containing 960 observations of normal data. Table 3 lists the correlation degree of these ten variables.

Table 3. Correlation degree of ten variables.

Calculating the sum (d_T) of the correlation degree between one variable and all other variables is necessary for treating ten variables as a whole. Table 4 lists the sums and the normalized result (w_d).

Table 4. Sums of the correlation degree and normalized result of ten variables.

Based on the normalization result and the criteria given in Table 1, the original ten variables are clustered into four groups. Variables V1 and V2 constitute the first group, variable V3 constitutes the second group, variable V6 constitutes the third group, and the rest belong to the last group.

(2): Variable weight

Table 5 lists the weights for variables in four groups.

Table 5. Variable weights.

The impact of the correlation degree of the variables on the system is commonly reflected more directly in the FAR than in the MAR; therefore, the correlation weight is given to the FAR.

3.3. Optimization Solution

Steps of the optimization solution are listed as below, using the first variable V1 as an example.

First, the probability density function Equations (10) and (11) of V1 in the normal and abnormal cases are fitted with 960 observations of normal data and 500 observations of abnormal data, respectively. Figure 5 shows the corresponding probability density curves.

f (x) = 13.973 \cdot e^{- 613.36 \cdot (x - 0.25114)^{2}}

(10)

g (x) = 4.2301 \cdot e^{- 56.214 \cdot (x - 0.041936)^{2}}

(11)

Figure 5. Process probability density curves of V1.
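Equations (10) and (11) have the form a·exp(−b(x − μ)²), which is a normal density when a = 1/(σ√(2π)) and b = 1/(2σ²). The sketch below recovers σ from the exponent coefficient b and recomputes the amplitude as a consistency check; the function name is illustrative.

```python
import math

def gaussian_params(b, mu):
    """Recover (mu, sigma, amplitude) of a normal density from
    a * exp(-b * (x - mu)**2), assuming b = 1 / (2 * sigma**2)."""
    sigma = 1.0 / math.sqrt(2.0 * b)
    amplitude = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return mu, sigma, amplitude

mu_f, sigma_f, a_f = gaussian_params(613.36, 0.25114)    # Eq. (10)
mu_g, sigma_g, a_g = gaussian_params(56.214, 0.041936)   # Eq. (11)
print(f"normal:   mu={mu_f}, sigma={sigma_f:.5f}, amplitude={a_f:.3f}")
print(f"abnormal: mu={mu_g}, sigma={sigma_g:.5f}, amplitude={a_g:.4f}")
```

The recomputed amplitudes agree with the printed coefficients 13.973 and 4.2301 to the displayed precision, so both fits are normalized Gaussian densities.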

Second, the objective function is obtained as Equation (12) by substituting the weight w₁ = 0.521, Equations (1) and (2), and R_FAR = R_MAR = 0.01 into Equation (9).

\begin{aligned} F(x) &= \mathrm{Min} \left( \frac{\mathrm{FAR}}{\frac{R_{\mathrm{FAR}}}{(1 + w_{ij})}} + \frac{\mathrm{MAR}}{R_{\mathrm{MAR}}} \right) \\ &= \mathrm{Min} \left( \frac{\int_{-\infty}^{x_{T}} 13.973 \cdot e^{-613.36 \cdot (x - 0.25114)^{2}} \, dx}{\frac{R_{\mathrm{FAR}}}{(1 + w_{ij})}} + \frac{\int_{x_{T}}^{+\infty} 4.2301 \cdot e^{-56.214 \cdot (x - 0.041936)^{2}} \, dx}{R_{\mathrm{MAR}}} \right) \end{aligned}

(12)

Eventually, the objective function is optimized as Figure 6, with optimum x_T = 0.22, F(x_T) = 6.5098, FAR = 0.00654, MAR = 0.06710. The thresholds of other variables are optimized similarly.

Figure 6. Optimization result of V1.

Additionally, to verify the effectiveness of this method, it is compared with other methods, namely the deadband [42], alarm delay [36], and moving average filter (MAF) with the original reference value [27]. The summarized results are listed in Table 6.

Table 6. Optimized results for ten variables.

To further verify the effectiveness of the method, an abnormal data set of fault 8 is added. Table 6 lists the results under the two failures (failure 6: f6, failure 8: f8).

3.4. Results and Analysis

As shown in Figure 7, the proposed method effectively reduces the FAR relative to the case with initial thresholds, while the MAR is kept under control.

Figure 7. Histograms of FARs in five cases with two failures (f6,f8).

In Figure 8, the total numbers of variable alarms in the five cases with failure 6 (failure 8) are 3532, 3414, 2916, 2928, and 1733 (3532, 3417, 2968, 2992, and 1744), respectively. Compared with the other four cases, the alarm reduction rates achieved by the presented method are 50.9%, 49.2%, 40.5%, and 40.8% (50.6%, 48.9%, 41.2%, and 41.7%), respectively. This indicates that the method benefits alarm threshold optimization, bringing a lower FAR and fewer alarms.

Figure 8. Histograms of number of alarms in five cases with two failures (f6,f8).
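The reduction rates quoted above follow from the alarm counts as 1 − (alarms with the proposed method)/(alarms in the compared case). The short check below assumes that the last count in each list (1733 and 1744) belongs to the proposed method; the function name is illustrative.

```python
def reduction_rates(compared_counts, proposed_count):
    """Alarm reduction rate in percent relative to each compared case."""
    return [100.0 * (1.0 - proposed_count / c) for c in compared_counts]

rates_f6 = reduction_rates([3532, 3414, 2916, 2928], 1733)
rates_f8 = reduction_rates([3532, 3417, 2968, 2992], 1744)
print([f"{r:.2f}" for r in rates_f6])
print([f"{r:.2f}" for r in rates_f8])
```

The computed values agree with the reported percentages to within 0.1 percentage point; the small differences depend on whether the last digit is rounded or truncated.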

4. Real Industrial Verification—Industrial Ethylene Production Process

4.1. Process Description

To verify the effectiveness of the method in actual industry, an industrial ethylene production process was selected as an actual case. The process generally consists of four major stages: cracking, compression, quenching, and separation.

Figure 9 shows the flow chart of the ethylene production process; most of the research objects lie in its separation section. Because the process has a large number of variables, and in order to avoid the influence of blind selection on the results, variables were selected as objectively as possible through empirical knowledge and analysis: variables whose mutual correlation relationships are not easy to judge and which have a significant impact on the yield and quality of ethylene. Ten variables are thus selected to represent the process, as listed in Table 7. For this process, the time-lag effect under start-up or shut-down cases is great. Therefore, data under the steady state are selected as normal data, and data under the state with disturbances are chosen as abnormal data. In total, 1000 observations of the 10 variables are extracted with a sampling interval of 1 min, containing 500 observations of normal data and 500 observations of abnormal data in which the feed flow of the cracking furnace increases by 10%.

Figure 9. Flow chart of ethylene production process.

Table 7. Ten measured variables in ethylene production process.

4.2. Cluster Variables and Calculate Weights

(1): Variable clustering

The selected ten variables can be regarded as 10 vectors with 500 dimensions, containing 500 observations of normal data. Table 8 lists the correlation degree of these 10 variables.

Table 8. Correlation degree of 10 variables.

Calculating the sum (d_T) of the correlation degree between one variable and all other variables is necessary for treating ten variables as a whole. Table 9 lists the sums and the normalized result (w_d).

Table 9. Sums of the correlation degree and normalized result of ten variables.

Based on the normalization result and the criteria given in Table 1, the original ten variables are clustered into four groups. Variables V1, V4, and V7 constitute the first group, variables V2 and V8 constitute the second group, variable V10 constitutes the third group, and the rest belong to the last group.

(2): Variable weight

Table 10 lists the weights for variables in four groups.

Table 10. Variable weights.

4.3. Optimization Solution

Steps of the optimization solution are listed as below, using the first variable V1 as an example.

First, the probability density functions of V1 in the normal and abnormal cases, Equations (13) and (14), are each fitted with 500 observations of data. Figure 10 shows the corresponding probability density curves.

f (x) = 12.756 \cdot e^{- 213.36 \cdot (x - 0.71021)^{2}}

(13)

g (x) = 2.1321 \cdot e^{- 34.231 \cdot (x - 0.019436)^{2}}

(14)

Figure 10. Process probability density curves of V1.

Second, the objective function is obtained as Equation (15) by substituting the weight w₁ = 0.453, Equations (1) and (2), and R_FAR = R_MAR = 0.01 into Equation (9).

\begin{aligned} F(x) &= \mathrm{Min} \left( \frac{\mathrm{FAR}}{\frac{R_{\mathrm{FAR}}}{(1 + w_{ij})}} + \frac{\mathrm{MAR}}{R_{\mathrm{MAR}}} \right) \\ &= \mathrm{Min} \left( \frac{\int_{-\infty}^{x_{T}} 12.756 \cdot e^{-213.36 \cdot (x - 0.71021)^{2}} \, dx}{\frac{R_{\mathrm{FAR}}}{(1 + w_{ij})}} + \frac{\int_{x_{T}}^{+\infty} 2.1321 \cdot e^{-34.231 \cdot (x - 0.019436)^{2}} \, dx}{R_{\mathrm{MAR}}} \right) \end{aligned}

(15)

Eventually, the objective function is optimized as Figure 11, with optimum x_T = 0.73, F(x_T) = 6.5095, FAR = 0.00354, MAR = 0.0661. The thresholds of other variables are optimized similarly.

Figure 11. Optimization result of V1.

To further verify the effectiveness of the method, an abnormal data set in which the feed flow of the cracking furnace decreases by 10% is added. Table 11 lists the results under the two cases (feed flow of cracking furnace increases by 10%: c1; feed flow of cracking furnace decreases by 10%: c2).

Table 11. Optimized results for 10 variables.

4.4. Results and Analysis

As shown in Figure 12, the proposed method effectively reduces the FAR relative to the case with initial thresholds, while the MAR is kept under control.

Figure 12. Histograms of FARs in five cases with two cases (c1,c2).

In Figure 13, the total numbers of variable alarms in the five cases under c1 (c2) are 2552, 2434, 2036, 2248, and 1316 (2552, 2430, 2129, 2239, and 1346), respectively. Compared with the other four cases, the alarm reduction rates achieved by the method are 48.4%, 45.9%, 35.3%, and 41.4% (47.2%, 44.6%, 36.8%, and 39.9%), respectively. This indicates that the method benefits alarm threshold optimization, bringing a lower FAR and fewer alarms.

Figure 13. Histograms of number of alarms in five cases with two cases (c1,c2).

5. Conclusions

In this work, a method based on correlation degree and clustering analysis is presented to achieve threshold optimization. The gray correlation coefficients of the variables are first obtained by analyzing the correlation degrees among them. The variables are then grouped according to the correlation degree and clustering analysis, and the weight w_ij for the FAR is assigned in each group. Finally, an optimization algorithm is utilized to optimize the objective function of FAR, MAR, R_FAR, R_MAR, and w_ij to complete the threshold optimization.

The theoretical case study with the TE simulation process and the actual industrial verification with the industrial ethylene production process show that the presented approach can not only reduce the FAR, keep the MAR in check, and effectively decrease the total number of alarms, but also sort variables into groups according to the intrinsic clustering analysis, which could help operators reduce their operation load. Meanwhile, by helping operators identify the variables that have a larger and more rapid impact on the system, it leaves them more time to take efficient repair measures promptly, reduces losses, and extends the time available before an abnormal situation deteriorates.

Author Contributions

Methodology, Z.W. and G.Z.; Writing—original draft preparation, G.Z.; Writing—review and editing, Z.W. and G.Z.; Supervision, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

Support for carrying out this work was provided by the National Key R&D Program of China (2018YFB1701103), International (Regional) Cooperation and Exchange Project (61720106008), and National Science Fund for Distinguished Young Scholars (61725301).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement