Data-Driven Methods for the Detection of Causal Structures in Process Technology

Abstract: In modern industrial plants, process units are strongly cross-linked with each other, and disturbances occurring in one unit potentially become plant-wide. This can lead to a flood of alarms at the supervisory control and data acquisition system, hiding the original fault causing the disturbance. Hence, one major aim in fault diagnosis is to backtrack the propagation path of the disturbance and to localize the root cause of the fault. Since detecting correlation in the data is not sufficient to describe the direction of the propagation path, cause-effect dependencies among process variables need to be detected. Process variables that show a strong causal impact on other variables in the process come into consideration as being the root cause. In this paper, different data-driven methods are proposed, compared and combined that can detect causal relationships while relying solely on process data. The information on causal dependencies is used for localization of the root cause of a fault. All proposed methods consist of a statistical part, which determines whether the disturbance traveling from one process variable to a second is significant, and a quantitative part, which calculates the causal information the first process


Introduction
Modern industrial plants are complex systems that need to run over several weeks or months. During a production run, operating conditions can change, which can lead to abnormal behavior of the process. Since the control and measurement devices of modern plants are strongly cross-linked with each other, a failure in a major piece of equipment can lead to plant-wide disturbances and, for example, result in a flood of alarms, making it difficult to localize the root cause of the disturbance. As not all relations between the different process parameters are well known, data-driven methods can be of great help to localize, or at least to narrow down, the cause of a disturbance.
Backtracking the disturbance propagation path using data-driven methods means detecting temporal cause-effect relationships in a data set. In detail, this means that statistical relationships and time-shifts are used to reconstruct the propagation direction of the disturbance. Several methods have already been developed to test for temporal causal dependencies in data. One of the first approaches was made by Granger [1], who compares two autoregressive models. The first model contains only past values of the variable to be predicted; the second model is augmented with past values of another variable. If the augmentation improves the regression, it is assumed that this variable has a causal impact on the other. In [2], an algorithm for root cause localization based on the cross-correlation function is presented for causal analysis, especially in the presence of valve stiction. Schreiber [3] presents a concept, named transfer entropy, which detects causal dependencies by measuring the reduction of uncertainty when one variable predicts future values of the other. Further methods for the detection of causal dependencies are proposed in terms of dynamic Bayesian networks [4,5] or nearest neighbor approaches [6]. An overview of different methods, tested on artificial benchmark data sets and for biosignal analysis, is given in [7].
In this paper, different algorithms are proposed, which are based on the cross-correlation function, the transfer entropy, Granger causality and support vector machines. Next, the results of the different methods are combined, and a root cause priority list is generated. This priority list contains a ranking of the different process variables according to their likelihood of being the actual cause of the fault.
All proposed algorithms consist of a statistical test, which determines whether the disturbance traveling from one process variable to a second is significant, and a quantitative part, called the causal strength, which defines the influence the input variable has on the output variable. For all methods, the causal strength takes values between zero (= no causal dependency) and one (= causal dependency). Finally, this value is used for generating the root cause priority list. The paper summarizes the main results given in [8], containing the following main contributions: (1) a proposal of a new algorithm based on support vector machines by using a recursive variable selection and model reduction approach; (2) the development of a design approach to combine all methods into one causal matrix and transfer it into a root cause priority list; (3) the extension of an existing method based on the cross-correlation function by using permutation tests for the significance test; and (4) the development of a visualization method for the causal matrices. The paper is structured as follows: In Section 2, the foundations for the detection of causal dependencies in dynamic systems are introduced. Section 3 explains the algorithms. In Section 4, it is pointed out how the results of the different methods can be combined and how the root cause priority list is calculated. Additionally, a new way to visualize the results from the different methods is described. Finally, Section 5 tests the methods on data from a simulated chemical stirred-tank reactor and on an experimental laboratory plant.

Detecting Causal Dependencies in Process Measurements
Definition (causal system): A time-invariant system is causal if, for all input signals with $u_1(t) \equiv u_2(t)$ for $t \le t_1$ and any $t_1$, the output signals for $0 \le t \le t_1$ satisfy $y_1\{x_0, u_1(t)\} \equiv y_2\{x_0, u_2(t)\}$ (with $x_0$: the initial state). Systems that are not causal are called acausal. In causal systems, the input signals for $t > t_1$ do not have an impact on the behavior of the output signal until time $t_1$. Additionally, for causal systems, the impulse response for $t < 0$ is zero.
This definition of causality results in several system theoretic consequences. For all static systems $f: \mathbb{R} \to \mathbb{R}$, causality exists regardless of whether $u$ or $y$ is regarded as the input signal, since in both cases, the output signal does not depend on future values, but only on the current value of the input signal.
Additionally, all linear time-continuous systems with one input and one output signal are causal, since $Y(s) = G(s)U(s)$ and $U(s) = G^{-1}(s)Y(s)$ explain the data equally well ($U(s)$, $Y(s)$: Laplace-transformed input and output signals). In that case, it is only possible to test whether the transfer function $G(s)$ is realizable (order of the numerator polynomial ≤ order of the denominator polynomial). Still, this does not contain information about causality.
To backtrack the disturbance propagation path in a plant, a signal decay time or a signal dead time needs to be present in the measurements. Decay times between $u(t)$ and $y(t)$ characterize a smoothing effect, so that $y(t)$ seems to be delayed with respect to $u(t)$. A dead time $T_D$ between $u(t)$ and $y(t)$ exists if $y(t)$ does not depend on $u(\tau)$ for $\tau \in (t - T_D, t]$, but only on $u(\tau)$ for $\tau \le t - T_D$. In other words, the dead time is the interval that a change in the input signal needs to become visible at the output of the system. In industrial processes, dead times exist, e.g., in tubes, when a fluid concentration is measured by sensors at different positions.
Another point of view on causality is given by Pearl [9]. The central idea is that a cause $C$ increases the probability of the appearance of an effect $E$. This means that $C$ can only be the cause of $E$ if $P(E = 1 \mid C = 1) > P(E = 1 \mid C = 0)$ is fulfilled. However, an increase of the probability can only be brought about through an active intervention. Therefore, Pearl introduces the do-operator, which forces $C$ to a fixed value. In other words, $C$ only has a causal impact on $E$ if $P(E = 1 \mid do(C = 1)) > P(E = 1 \mid do(C = 0))$ is fulfilled. This approach is the only possibility to avoid the detection of false causal dependencies.
An example for the detection of a false causal dependency is a non-measured signal $u_z$, which has an impact on $u$, as well as a later one on $y$, without there being a direct dependency pointing from $u$ to $y$. In [10], it is shown how the approach given by Pearl can be used to detect causal structures in process data. Since the methods proposed in this paper rely only on observational data, meaning that no active intervention is performed, this approach will not be pursued here.

Cross-Correlation Function
The cross-correlation function (CCF) [11] quantifies the linear similarity of two equidistantly sampled time series $u[k]$, $y[k]$ that are time-shifted by a constant lag $\lambda$. The estimate $\hat{c}_{uy}[\lambda]$ is evaluated for $\lambda \in \{1 - K, 2 - K, \ldots, K - 2, K - 1\}$, and $\lambda_{\max}$ denotes the lag at which $|\hat{c}_{uy}[\lambda]|$ attains its maximum. If $\lambda_{\max} > 0$ holds, there could be a causal relation $u \to y$, and the hypothesis test is performed. To check whether a significant correlation between the signals is present, a t-test with the null hypothesis of no correlation is used with significance level $\alpha = 0.05$. If the null hypothesis cannot be rejected, it is assumed that no cause-effect relationship exists from $u \to y$.
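A common form of the sample cross-correlation, assumed here and possibly differing in normalization from Equation (1), is $\hat{c}_{uy}[\lambda] = \frac{1}{K \hat{\sigma}_u \hat{\sigma}_y} \sum_k (u[k] - \hat{\mu}_u)(y[k + \lambda] - \hat{\mu}_y)$. The following Python sketch evaluates this estimate and searches for the lag with the largest magnitude. The function names, the $1/K$ normalization and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import math
import random


def cross_correlation(u, y, lag):
    """Sample cross-correlation between u[k] and y[k + lag] (1/K normalization, assumed)."""
    K = len(u)
    mu_u, mu_y = sum(u) / K, sum(y) / K
    sd_u = math.sqrt(sum((v - mu_u) ** 2 for v in u) / K)
    sd_y = math.sqrt(sum((v - mu_y) ** 2 for v in y) / K)
    if lag >= 0:
        pairs = zip(u[:K - lag], y[lag:])   # y is shifted forward in time
    else:
        pairs = zip(u[-lag:], y[:K + lag])  # u is shifted forward in time
    return sum((a - mu_u) * (b - mu_y) for a, b in pairs) / (K * sd_u * sd_y)


def lag_of_max(u, y, max_lag):
    """Lag in [-max_lag, max_lag] with the largest absolute cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lam: abs(cross_correlation(u, y, lam)))


# Synthetic example (hypothetical data): y is u delayed by three samples.
rng = random.Random(0)
u = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] * 3 + u[:-3]
lam_max = lag_of_max(u, y, max_lag=10)
```

With this sign convention, a positive `lam_max` means that `y` lags behind `u`, which is the case in which a causal relation from u to y is considered.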
Significant causal direction: This test covers the possibility that the resulting CCF has a global maximum for $\lambda > 0$ and a slightly lower local maximum for $\lambda < 0$, meaning that the causal direction is not obvious. Therefore, this test checks whether the maximum for $\lambda > 0$ is significantly different from the maximum for $\lambda < 0$. To perform the test, a compound parameter $C_{CCF}$ with $-1 \le C_{CCF} \le 1$ is used. A value $C_{CCF} > 0$ indicates a causal dependency from $u \to y$.
As $C_{CCF}$ strongly depends on the characteristics of $u[k]$ and $y[k]$, an adaptive threshold is derived through a 3σ permutation test. Since a complete permutation of $u[k]$ destroys all causal information, the resulting value of $C_{CCF}$ should be close to zero. This idea is exploited by calculating random permutations of $u[k]$ and generating several values $C_{CCF}^{\pi}$. The threshold for each pair of variables is calculated from the mean and standard deviation of these values as
$$C_{CCF}^{\text{thresh}} := \mu_{C_{CCF}^{\pi}} + 3\,\sigma_{C_{CCF}^{\pi}}.$$
If both significance tests are passed, the found causal dependency is assumed to exist.
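The permutation idea can be sketched generically in Python as follows. Because the defining equation of the compound parameter is not reproduced here, the sketch accepts an arbitrary test statistic as a callable; `lag1_corr` is a hypothetical stand-in used only for demonstration, and the number of permutations is an arbitrary choice.

```python
import random
import statistics


def permutation_threshold(u, y, statistic, n_perm=100, seed=0):
    """3-sigma threshold from statistics computed on randomly permuted copies of u."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_perm):
        u_perm = list(u)
        rng.shuffle(u_perm)  # permuting u destroys its temporal relation to y
        values.append(statistic(u_perm, y))
    return statistics.mean(values) + 3.0 * statistics.stdev(values)


def lag1_corr(u, y):
    """Hypothetical test statistic: correlation between u[k] and y[k + 1]."""
    a, b = u[:-1], y[1:]
    ma, mb = statistics.mean(a), statistics.mean(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5
    return num / den


# Synthetic example (hypothetical data): y is u delayed by one sample.
rng = random.Random(1)
u = [rng.gauss(0.0, 1.0) for _ in range(300)]
y = [0.0] + u[:-1]
threshold = permutation_threshold(u, y, lag1_corr)
is_significant = lag1_corr(u, y) > threshold
```

Because the threshold is recomputed for every pair of variables, it adapts to the characteristics of the individual signals instead of relying on one fixed global value.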

Causal Strength
The causal strength of $u \to y$ is denoted by $Q_{CCF}$, with $0 \le Q_{CCF} \le 1$. $\beta_{CCF}$ is a design parameter that makes the different methods numerically compatible; its selection is explained in Section 4. The proposed algorithm for the detection of cause-effect relationships between two time series $u[k]$ and $y[k]$ using the CCF is summarized as Algorithm 1 (algorithm based on cross-correlation).

Granger Causality
Granger causality (GC) was introduced by Clive Granger [1] and is traditionally used in the field of economics [12,13]. Recently, the application of GC has been of growing interest, especially in the fields of neuroscience [14] and biology [15].

The central idea is that a signal $u_i[k]$ is assumed to have a causal influence on $y[k]$ if past values of $u_i[k]$ and $y[k]$ result in a better prediction of $y[k]$ than using only past values of $y[k]$. GC takes into account that besides the signal u
A comparison is done using two vector autoregressive models and performing a one-step-ahead prediction. If the prediction error of the first model is substantially smaller than that of the second model, a causal dependency $u_i \to y$ is concluded. For the proposed algorithm, each time series is selected once as output $y := u_m$, while the remaining $r - 1$ time series are used as inputs. With $n$ defining the model order, this can be formulated as
$$\hat{y}[k] = \sum_{j=1}^{n} \hat{a}_j\, y[k-j] + \sum_{l \neq m} \sum_{j=1}^{n} \hat{b}_{lj}\, u_l[k-j] \qquad (7)$$
$$\hat{y}[k] = \sum_{j=1}^{n} \hat{a}_j\, y[k-j] + \sum_{l \neq m,\, l \neq i} \sum_{j=1}^{n} \hat{b}_{lj}\, u_l[k-j] \qquad (8)$$
By definition, the parameters $\hat{a}_j$, $\hat{b}_{lj}$ in Equations (7) and (8) result from separate estimations. The impact of the $i$-th input signal is measured in terms of the residual sum of squares without $u_i$, denoted $E_{\bar{U}_i Y}$, and with $u_i$, denoted $E_{U_i Y}$. Both values are used to test for causal significance and to calculate the causal strength.
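For the bivariate case (one input $u$, one output $y$), the comparison of the two models can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: models without intercept, ordinary least squares via the normal equations, and hypothetical helper names and synthetic data that do not come from the paper.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def residual_sum_of_squares(target, rows):
    """Least-squares fit of target on the regressor rows; returns the RSS."""
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(p)]
    coef = solve_linear(XtX, Xty)
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, t in zip(rows, target))


def granger_rss(u, y, n):
    """RSS of the restricted (past y only) and unrestricted (past y and u) model."""
    ks = range(n, len(y))
    past_y = [[y[k - j] for j in range(1, n + 1)] for k in ks]
    past_uy = [row + [u[k - j] for j in range(1, n + 1)]
               for row, k in zip(past_y, ks)]
    target = y[n:]
    return (residual_sum_of_squares(target, past_y),
            residual_sum_of_squares(target, past_uy))


# Synthetic example (hypothetical data): u drives y with a delay of one sample.
rng = random.Random(1)
u = [rng.gauss(0.0, 1.0) for _ in range(300)]
y = [0.0]
for k in range(1, 300):
    y.append(0.5 * y[k - 1] + 0.8 * u[k - 1] + 0.1 * rng.gauss(0.0, 1.0))
e_restricted, e_unrestricted = granger_rss(u, y, n=2)
```

Since the restricted model is nested in the unrestricted one, `e_unrestricted` can never exceed `e_restricted` in-sample; the question answered by the significance test is whether the reduction is large enough.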
The performance of the model strongly depends on the model order $n$. Too small a value for $n$ leads to a large prediction error, and too large a value results in an overfitted model. Therefore, the Akaike information criterion (AIC) [16] is used for model order estimation, taking into account the prediction errors, the sample size $K$, the number of variables $r$ and the model order $n$. For model order estimation, the loss function is set to $V = E_{U_i Y}$ for the unrestricted model (with $u_i$) and to $V = E_{\bar{U}_i Y}$ for the restricted model (without $u_i$). The estimated order is selected as $\arg\min_n \mathrm{AIC}(n)$. Furthermore, all models are tested for consistency by applying the Durbin-Watson statistic [17] to the residuals.
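Order selection and the residual check can be sketched as follows. The AIC expression used here, $\log(V/K) + 2p/K$ with $p$ estimated parameters, is a common textbook form and an assumption; the exact expression of the paper is not reproduced. The Durbin-Watson statistic is the standard ratio $\sum_k (e_k - e_{k-1})^2 / \sum_k e_k^2$. All names and numbers are illustrative.

```python
import math


def aic(rss, n_samples, n_params):
    """Textbook AIC for a least-squares fit (assumed form)."""
    return math.log(rss / n_samples) + 2.0 * n_params / n_samples


def select_order(rss_by_order, n_samples, params_per_lag):
    """Return the model order n that minimizes the AIC."""
    return min(rss_by_order,
               key=lambda n: aic(rss_by_order[n], n_samples, n * params_per_lag))


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 indicate uncorrelated residuals."""
    num = sum((residuals[k] - residuals[k - 1]) ** 2
              for k in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residual sums of squares for model orders 1 to 4.
rss_by_order = {1: 50.0, 2: 20.0, 3: 19.9, 4: 19.85}
best_order = select_order(rss_by_order, n_samples=200, params_per_lag=2)
```

In this made-up example, the improvement beyond order 2 is too small to compensate for the penalty on additional parameters, which is exactly the trade-off between prediction error and overfitting described above.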

Significance Test
This test is performed to check whether the residual sums of squares $E_{\bar{U}_i Y}$ (restricted model, without $u_i$) and $E_{U_i Y}$ (unrestricted model, with $u_i$) differ significantly. According to [18], both follow $\chi^2$ distributions, and an F-test can be performed to verify whether the time series $u_i[k]$ has a causal influence on $y[k]$. The test is performed on the restricted and unrestricted model with significance level $\alpha = 0.05$, under the null hypothesis that including $u_i$ does not significantly reduce the residual sum of squares.
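The F statistic for comparing two nested least-squares models can be sketched as follows. The formula is the standard one for nested regressions; the argument names, the example numbers and the critical value supplied by the caller are illustrative assumptions. The critical value has to be taken from an F table or a statistics library.

```python
def f_statistic(rss_restricted, rss_unrestricted, n_restrictions,
                n_samples, n_params_unrestricted):
    """F statistic for nested least-squares models."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / (n_samples - n_params_unrestricted)
    return numerator / denominator


def is_causally_significant(f_value, critical_value):
    """Reject the null hypothesis if the statistic exceeds the critical value."""
    return f_value > critical_value


# Hypothetical numbers: two lags of u_i removed, 104 samples, 4 parameters.
f_value = f_statistic(120.0, 100.0, n_restrictions=2,
                      n_samples=104, n_params_unrestricted=4)
# 3.09 is the approximate tabulated 5% critical value for (2, 100) degrees of freedom.
significant = is_causally_significant(f_value, critical_value=3.09)
```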

Causal Strength
The strength of the significant causal relationship $u_i \to y$ is denoted by $Q_{GC}$. The design parameter $\beta_{GC}$ will be determined in Section 4. The suggested algorithm is summarized as follows:
Algorithm 2: Algorithm based on Granger causality.
4. If the causal dependency is significant, calculate the causal strength $u_i \to y$ by $Q_{GC}$.

Transfer Entropy
The transfer entropy (TE) is an information theoretic measure and was first introduced by Schreiber [3]. Applications of the TE can be found, e.g., in neuroscience [19,20] and in financial data analysis [21]. In the field of process engineering, research has been conducted by Bauer [6], who uses TE for the causal analysis of measurements taken from chemical processes.
By definition, it can be used to detect causal dependencies by testing how much information is transferred from $u[k]$ to $y[k]$ and how much is transferred from $y[k]$ to $u[k]$. The transition probability for $y[k]$ is written as $P(y_{n+1} \mid y)$, a short notation for the probability of the next value of $y$ conditioned on its $n$ past values. According to [21], the bounds of the TE are $0 \le TE_{uy}(\lambda) \le H_y$, with $H_y$ being the entropy of the output signal. To capture dead times in the data, the parameter $\lambda$ is introduced to perform a backward shift of $u[k]$, and Equation (12) is calculated for different $u[k - \lambda]$. For the calculation of causal dependencies, the transfer entropy is set to its maximum value over all shifts, $TE_{uy} := \max_{\lambda} TE_{uy}(\lambda)$.
To calculate the value of the time horizon $n$, the residual sum of squares of several vector autoregressive models is calculated, with $\hat{a}_j$ resulting from a least squares estimation. The order $n_{TE}$ is then chosen as the minimum of the Akaike information criterion, defined as $\mathrm{AIC}(n) = \log \hat{\sigma}_y^2 + \frac{2n}{K}$ [16].
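Schreiber's transfer entropy is commonly written as $TE_{uy} = \sum P(y_{k+1}, \mathbf{y}_k, \mathbf{u}_k) \log \frac{P(y_{k+1} \mid \mathbf{y}_k, \mathbf{u}_k)}{P(y_{k+1} \mid \mathbf{y}_k)}$, which is assumed here to correspond to Equation (12). The sketch below is a simple plug-in estimator for history length $n = 1$ that discretizes the signals into equal-width bins. The binning scheme, the function name and the synthetic data are illustrative assumptions, not the estimator used in the paper.

```python
import math
import random
from collections import Counter


def transfer_entropy(u, y, lag=1, bins=2):
    """Plug-in estimate of the transfer entropy u -> y in bits (history length 1)."""
    def discretize(x):
        lo, hi = min(x), max(x)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant signals
        return [min(int((v - lo) / width), bins - 1) for v in x]

    ud, yd = discretize(u), discretize(y)
    # Triples (y[k+1], y[k], u[k+1-lag]); lag=1 pairs y[k+1] with u[k].
    triples = [(yd[k + 1], yd[k], ud[k + 1 - lag])
               for k in range(lag - 1, len(y) - 1)]
    n = len(triples)
    c_full = Counter(triples)
    c_hist = Counter((yp, up) for _, yp, up in triples)
    c_self = Counter((yn, yp) for yn, yp, _ in triples)
    c_past = Counter(yp for _, yp, _ in triples)
    te = 0.0
    for (yn, yp, up), c in c_full.items():
        p_given_both = c / c_hist[(yp, up)]           # P(y_next | y_past, u_past)
        p_given_self = c_self[(yn, yp)] / c_past[yp]  # P(y_next | y_past)
        te += (c / n) * math.log2(p_given_both / p_given_self)
    return te


# Synthetic example (hypothetical data): y copies the binary signal u one step later.
rng = random.Random(0)
u = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(2000)]
y = [0.0] + u[:-1]
te_uy = transfer_entropy(u, y)
te_yu = transfer_entropy(y, u)
```

In this example, `u` fully determines the next value of `y`, so `te_uy` approaches the entropy of `y` (about one bit), whereas `te_yu` stays close to zero. Scanning `lag` over a range of values and taking the maximum corresponds to the search over $\lambda$ described above.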

Significance Test
Testing for a significant causal relationship is done by a test introduced in [3]. The key idea is to generate an adaptive threshold $TE_{uy}^{\text{thresh}}$ from transfer entropies computed on permuted input data, characterized by their mean value $\mu_{TE_{u^{\pi}y}}$ and standard deviation $\sigma_{TE_{u^{\pi}y}}$. If $TE_{uy} > TE_{uy}^{\text{thresh}}$ holds, a causal dependency $u \to y$ is concluded, and the causal strength can be calculated.

Causal Strength
By definition, $TE_{uy} \le H_y$, which allows the causal strength $Q_{TE}$ of the transfer entropy to be normalized to values between zero and one. The design parameter $\beta_{TE}$ will be chosen in Section 4. The suggested algorithm for the detection of cause-effect relations is summarized as Algorithm 3 (algorithm based on transfer entropy).

Support Vector Machines for Regression
Support vector machines (SVMs) are learning methods that are used for supervised learning. They were originally developed by Vapnik [22] for classification and were later extended towards regression and time series prediction [23]. In industrial processes, they are used, e.g., for fault detection and diagnosis [24] or optimization [25]. For the detection of causal structures in data, a model reduction approach is proposed.
Given a training set $\{x_i, z_i\}_{i=1}^{K}$ with $x_i \in \mathbb{R}^n$ and $z_i \in \mathbb{R}$, support vector regression (SVR) means finding the regression function
$$f(x) = \langle w, x \rangle + b,$$
containing as parameters the normal vector $w$ and the bias $b$, with $\langle \cdot, \cdot \rangle$ denoting the scalar product. The function $f(x)$ should deviate by at most $\varepsilon$ from the values $z_i$ for the whole training data set, while seeking a normal vector $w$ that is as small as possible. As the selection of too small an insensitivity zone $\varepsilon$ would lead to infeasible constraints, slack variables $\xi_i, \xi_i^* \in \mathbb{R}_{\ge 0}$ and an additional weighting parameter $C \in \mathbb{R}_{>0}$ are introduced, leading to the objective function
$$\min_{w, b, \xi, \xi^*} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{K} (\xi_i + \xi_i^*)$$
with $z_i - f(x_i) \le \varepsilon + \xi_i$ and $f(x_i) - z_i \le \varepsilon + \xi_i^*$. The optimization problem is solved by means of its dual containing the Lagrange multipliers $\alpha_i, \alpha_i^*$. The solution of the regression function is finally given by
$$f(x) = \sum_{i=1}^{K} (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b. \qquad (18)$$
Kernel functions: One of the main reasons why SVMs are employed is their ability to deal with nonlinear dependencies by introducing so-called kernel functions; see, e.g., [26,27]. The input data in $\mathbb{R}^n$ are mapped into some feature space $\mathcal{F}$ with a possibly higher dimension by using a nonlinear transformation function $\Phi$, and the flattest function is searched for in this feature space. Since SVMs depend solely on the calculation of dot products, the computational complexity of this transformation remains feasible.
A kernel function, defined as $k(x, x') := \langle \Phi(x), \Phi(x') \rangle$, can be plugged into Equation (18), resulting in the final regression function
$$f(x) = \sum_{i=1}^{K} (\alpha_i - \alpha_i^*)\, k(x_i, x) + b.$$
In this paper, only Gaussian kernels with width parameter $\sigma$ are used for the detection of causal structures. To optimize the parameters $\varepsilon$, $C$ and $\sigma$, a downhill simplex algorithm [28] is used.
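The building blocks of a kernelized regression function can be sketched as follows. The Gaussian kernel is written in the widespread form $k(x, x') = \exp(-\|x - x'\|^2 / (2\sigma^2))$, which is an assumption about the exact parametrization; the dual coefficients $\alpha_i - \alpha_i^*$ are taken as given, since solving the dual problem is outside the scope of this sketch. All function names are illustrative.

```python
import math


def gaussian_kernel(x, x_prime, sigma):
    """Gaussian kernel with width sigma (assumed parametrization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, dual_coefs, bias, sigma):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i^*) k(x_i, x) + b."""
    return bias + sum(c * gaussian_kernel(sv, x, sigma)
                      for sv, c in zip(support_vectors, dual_coefs))


def eps_insensitive_loss(residual, eps):
    """Deviations inside the insensitivity zone of width eps are not penalized."""
    return max(0.0, abs(residual) - eps)


# Hypothetical model with a single support vector at the origin.
prediction = svr_predict([0.0], support_vectors=[[0.0]], dual_coefs=[2.0],
                         bias=0.5, sigma=1.0)
```

Only training points with a nonzero dual coefficient contribute to the sum, which is why the prediction depends on the support vectors alone.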

Detecting Causal Dependencies
To detect cause-effect dependencies, an input data set $\Psi_{uy} = \{y[k-1], \ldots, y[k-n], u[k-1], \ldots, u[k-n]\}$ is defined. The SVM is trained once and optimized regarding $\varepsilon$, $C$ and $\sigma$ on the complete data set. The time horizon $n$ is estimated as given in Equation (13) for the transfer entropy.
In the first step, the variables in Ψ uy are ranked in terms of their prediction accuracy of y[k] by performing a recursive variable elimination algorithm based on the Lagrange multipliers as proposed by [29].
In the second step, a relevant subset of input variables is selected. If the resulting subset contains one or several past values of $u[k]$, it is assumed that $u$ causes $y$. For the selection of the size of the subset, an F-test [11] ($\alpha = 0.05$) is performed on the resulting residual sums of squares of two SVMs, where the first SVM contains $\psi$ variables and the second SVM $\psi + 1$ variables. If the null hypothesis cannot be rejected, the residual sum of squares does not change significantly, and the size of the subset of variables is set to $\psi$.

Causal Strength
Similar to Granger causality, the causal strength $Q_{SVM}$ is calculated through a comparison of two residual sums of squares, named $E_{uy}$ and $E_y$, with $E_{uy} \le E_y$. In detail, $E_{uy}$ is calculated using the SVM explained above with the subset of input variables resulting from the initial set $\Psi_{uy}$. The residual sum of squares $E_y$ for the prediction of $y[k]$ is calculated by performing the same algorithm, but starting with the reduced set $\Psi_y = \{y[k-1], \ldots, y[k-n]\}$, which does not contain the time series $u[k]$, and by using the same parameters $\varepsilon_{opt}$, $C_{opt}$ and $\sigma_{opt}$.
Again, a tuning parameter $\beta_{SVM}$ is defined; the selection of its value is postponed to Section 4. The resulting value $Q_{SVM}$ satisfies $0 \le Q_{SVM} \le 1$, where zero means no causal dependency and one means maximum causal strength. The complete algorithm using support vector machines for the detection of causal dependencies is summarized as Algorithm 4.
Algorithm 4: Algorithm based on support vector machines.
1. Estimate the time horizon $n_{SVM}$ using a VAR model and AIC to generate $\Psi_{uy}$;
2. Train the SVM and fit the user-selected parameters $\varepsilon$, $C$, $\sigma$ using the downhill simplex algorithm, and check the consistency of the SVM using the Durbin-Watson statistic;
3. Perform variable selection and calculate the subset;
4. If $u$ is in the subset, set $Q_{SVM}$ as the resulting value of the causal strength $u \to y$.
As a distinct SVM needs to be trained and tuned for each pair of variables, this algorithm is the most complex of the four presented. For the transfer entropy, transition probabilities need to be calculated, meaning that this method can also become computationally intensive. The cross-correlation function and Granger causality, being linear measures, are rather cheap to compute. For a detailed comparison of the different algorithms on a large set of benchmark data, refer to [8].

Reconstruction of the Disturbance Propagation Path and Localization of the Root Cause
Each pair of process variables results in a value that represents the causal influence one variable has on the other. To display the complete information, all relationships are written into a causal matrix $Q \in \mathbb{R}^{r \times r}$, whose entries are the causal strengths $q \in [0, 1]$ between the process variables $X_i$ with $i = 1, \ldots, r$. In the matrix, the row index represents the variable that is the causing candidate, and the column index represents the effect candidate. Values close to zero describe weak causal strengths, and values close to one describe strong ones.

Balancing and Combining Causal Matrices
Since each method uses a different mathematical approach, the causal matrices of the four methods are not directly comparable. To make them comparable to each other, the previously introduced exponential fitting parameters $\beta_{CCF}, \beta_{TE}, \beta_{GC}, \beta_{SVM} \in [0, \infty)$ are used.
The proposed design approach is based on the assumption that, on average, all methods will work equally well on the data set. Here, equally well means that for the significant causal dependencies found, all causal matrices will result in the same mean value. Hence, the value of each $\beta$ parameter is fitted such that the matrices $Q_{CCF}$, $Q_{TE}$, $Q_{GC}$ and $Q_{SVM}$ give the same mean for the significant cause-effect relationships. For the use cases investigated in Section 5, the mean value for the causal matrix of each method is set to 0.5.
Finally, to calculate the combined causal matrix, the mean is taken over all balanced causal matrices for all causal dependencies. For this purpose, non-significant causal dependencies are set to zero.
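The balancing step can be sketched as follows, under the assumption that each $\beta$ acts as an exponent on a raw strength value in $(0, 1)$, so that the balanced strength is `raw ** beta`. The exact way in which $\beta$ enters each method is not reproduced here, so the functional form, the bisection search and all names are illustrative.

```python
def fit_beta(raw_values, target_mean=0.5, tol=1e-10):
    """Find beta >= 0 such that the mean of raw ** beta equals target_mean.

    Assumes at least one raw value lies strictly between 0 and 1, so that the
    mean decreases monotonically from 1 (beta = 0) towards smaller values.
    """
    def mean_strength(beta):
        return sum(v ** beta for v in raw_values) / len(raw_values)

    lo, hi = 0.0, 1.0
    while mean_strength(hi) > target_mean:  # enlarge the bracket if necessary
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_strength(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def combine(matrices):
    """Element-wise mean of balanced causal matrices (non-significant entries are 0)."""
    r = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(r)]
            for i in range(r)]


# Hypothetical example: one significant raw strength of 0.25 for a single method.
beta = fit_beta([0.25])
q_combined = combine([[[0.0, 1.0], [0.0, 0.0]],
                      [[0.0, 0.5], [0.0, 0.0]]])
```

Because a dependency found by only one method is averaged with zeros from the others, its combined strength shrinks, whereas dependencies confirmed by several methods keep a high value.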

Root Cause Priority List
This list contains a ranking of the analyzed process variables with regard to their likelihood of being the actual root cause. For this purpose, a value denoted RC is associated with each variable. It is obtained by summing up the causal influence that one variable has on the other variables. The variable having the maximum value of RC is ranked first, meaning that this variable is most likely to be the root cause of the disturbance. Table 1 outlines the representation of the root cause priority list.
Table 1. Root cause priority list from the causal matrix.
Rank Process variable RC
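The ranking can be sketched as a row sum over the combined causal matrix, since the row index denotes the causing candidate. Whether the paper applies an additional normalization to RC is not known here, so the plain sum, the function name and the example matrix are assumptions.

```python
def root_cause_priority(Q, names):
    """Rank variables by the summed causal strength they exert on all others."""
    scores = {
        name: sum(q for j, q in enumerate(row) if j != i)  # skip the diagonal
        for i, (name, row) in enumerate(zip(names, Q))
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical combined causal matrix for three process variables.
Q = [[0.0, 0.8, 0.6],
     [0.0, 0.0, 0.5],
     [0.1, 0.0, 0.0]]
ranking = root_cause_priority(Q, ["x1", "x2", "x3"])
```

In this made-up example, `x1` influences both other variables strongly and is therefore placed first in the list.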

Visualizing Causal Matrices
Several techniques have already been developed for the visualization of causal matrices. In [18], circular directional charts are suggested, and bubble charts are proposed in [6]. In [5], it is suggested to use heat maps to illustrate causal dependencies. Still, all of these methods have the drawback that only one causal matrix can be visualized at a time. Hence, to compare the different causal matrices more easily, several ways of visualization are utilized in this paper.
Partially-directed graph: In these graphs, process variables are represented by nodes and causal dependencies by directed edges. It is possible that several edges point to one node or that several edges leave one node. The main purpose of this representation is to give a fast overview of the disturbance propagation path, with the root cause being the first variable of the chain. Furthermore, the size of the arrowhead is used to indicate the strength of the causal dependency. The graph represents the combined causal matrix.
Doughnut chart: These graphs are circular charts that are divided into several sectors and have a blank center. To represent the causal matrices, the size of each sector results from the calculated entries in $Q$, plus one blank sector. The value in the middle of the doughnut represents the combined causal strength.
Bar chart: Bar charts represent values in the form of rectangular bars whose length is proportional to the causal strength. This visualization avoids a drawback of the doughnut chart, whose bent sectors are hard to compare.

Use Cases
Two use cases, namely a simulated continuously-stirred-tank reactor and a laboratory plant, are used to test the methods. Further experiments on the laboratory plant and more simulation results on the tank reactor, as well as on other benchmark data sets, can be found in [8].

Continuously-Stirred-Tank Reactor
To study the performance of the different methods, the model of a continuously-stirred-tank reactor (CSTR), explained in [30], is used. The underlying chemical reaction scheme consists of two irreversible follow-up reactions, where an educt A reacts to an intermediate product B, which in turn reacts to the resulting product C. The reactants are dissolved in a fluid and can be measured in terms of the three concentrations $c_A$, $c_B$, $c_C$ at the outlet of the CSTR. The CSTR is continuously filled, with the fluid having the reactant concentration $c_{in}$ and the temperature $\vartheta_{fl}$. $V$ describes the volume of the CSTR and $F$ the selected volume flow rate. The parameters $k_1$ and $k_2$ are empirical parameters and describe the relationship between the temperature and the speed of the chemical reaction. $E_1$ and $E_2$ are the activation energies of the reactants, and $R$ is the universal gas constant. The parameter values are given in Table 2.

Parameter Value Unit
Finally, the underlying differential equations of the CSTR are given in Equation (23). The set-points of the two input variables are chosen as $\vartheta_{fl,OP} = 350$ K and $c_{in,OP} = 1$ mol/L, while $\vartheta_{fl}$ is superposed with white noise $\mathcal{N}(0, 3\ \mathrm{K}^2)$ and $c_{in}$ with white noise $\mathcal{N}(0, 0.1\ (\mathrm{mol/L})^2)$. To calculate the causal matrices, $K = 1000$ samples are used in total. An extract of the data set used for the analysis is given in Figure 1. From the differential equations, it is expected that the methods deliver the disturbance propagation path $\vartheta_{fl} \to c_{in} \to c_A \to c_B \to c_C$ as a result. $\vartheta_{fl}$ should be ranked in first position, since it has an impact on all three chemical reactions.
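To make the structure of such a model concrete, the following sketch integrates a generic CSTR with two consecutive irreversible first-order reactions and Arrhenius-type rate constants using the explicit Euler method. It is an assumed textbook form, not a reproduction of Equation (23), and all numerical parameter values are placeholders rather than the values of Table 2.

```python
import math


def simulate_cstr(theta_fl, c_in, steps, dt=0.1,
                  flow_over_volume=0.1, k1=1.0e3, k2=5.0e2,
                  e1_over_r=3.0e3, e2_over_r=3.0e3):
    """Explicit Euler simulation of A -> B -> C in a CSTR (placeholder parameters)."""
    c_a = c_b = c_c = 0.0
    history = []
    for k in range(steps):
        # Arrhenius-type reaction rates depending on the fluid temperature
        r1 = k1 * math.exp(-e1_over_r / theta_fl[k]) * c_a
        r2 = k2 * math.exp(-e2_over_r / theta_fl[k]) * c_b
        d_a = flow_over_volume * (c_in[k] - c_a) - r1
        d_b = -flow_over_volume * c_b + r1 - r2
        d_c = -flow_over_volume * c_c + r2
        c_a, c_b, c_c = c_a + dt * d_a, c_b + dt * d_b, c_c + dt * d_c
        history.append((c_a, c_b, c_c))
    return history


# Constant inputs at the set-points stated in the text (350 K and 1 mol/L).
n_steps = 3000
history = simulate_cstr([350.0] * n_steps, [1.0] * n_steps, n_steps)
c_a_end, c_b_end, c_c_end = history[-1]
```

In this model structure, the temperature enters all reaction rates, while the inlet concentration enters only the balance of A. For constant inputs, the three outlet concentrations sum up to the inlet concentration at steady state, which serves as a simple plausibility check of the mass balance.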
The results are illustrated in Figure 2, with the red squares marking the expected causal dependencies. The design parameters result in $\beta_{CCF} = 0.51$, $\beta_{TE} = 0.11$, $\beta_{GC} = 0.59$ and $\beta_{SVM} = 0.28$. The bar chart in Figure 2 shows that all methods detect a large causal strength of $\vartheta_{fl}$ pointing towards the other process variables. This is plausible when taking into account the underlying differential equations of the CSTR (see Equation (23)), as the temperature has a direct impact on all three concentrations. Furthermore, the result indicates that the nonlinearity implied in the exponential function has been correctly fitted by Granger causality and the cross-correlation function.
Another strong causal strength has been found from $c_{in} \to c_A$. This can also be explained through the differential equations, as $c_{in}$ has a direct impact on $c_A$. The relationship $c_{in} \to c_B$ is the only indirect causal dependency detected by all methods, and $c_A \to c_B$ and $c_B \to c_C$ are detected by GC, TE and the SVM. The cause-effect dependencies $c_{in} \to c_C$ and $c_A \to c_C$ have been found by TE and the SVM. Except for the SVM, which detects the wrong causal dependency $c_C \to c_B$, no method finds wrong causal dependencies. Furthermore, the transfer entropy is the only method that detects all expected causal dependencies. The propagation path from the combination of the methods is given in Table 3. The resulting root cause priority list is given in Table 3. $\vartheta_{fl}$ is correctly detected as being the root cause, and $c_{in}$, being the second source of disturbances, is ranked second. Furthermore, the results show that causal dependencies at the end of the reaction chain result in weaker causal strengths or do not pass the significance tests. The reason is that the disturbances are low-pass filtered each time a reaction takes place, so that fewer fluctuations for inferring causal dependencies are present in the data. Additionally, when merging the methods into one resulting causal matrix, the correct causal dependencies obtain a much stronger weighting compared to the causal matrices resulting from only one method at a time.

Experimental Laboratory Plant
To evaluate the methods on real-world data, a fault is generated in an experimental laboratory plant that pumps water in cycles. A photo of the plant is given in Figure 3, and the connection of the different process devices is sketched in Figure 4. The process starts by setting a pump ($x_1$), positioned on the lower side of the plant, into feed-forward control to transfer water into the ball-shaped upper tank. From the upper tank, the water passes several measurement devices before flowing into a lower cylindrical tank. Finally, the water flows from the lower tank back to the pump, which closes the water cycle. Between the two tanks, pressure ($x_2$) and flow ($x_3$) are measured. With a valve ($x_4$), placed between the flow meter and the lower tank, the water flow can be controlled. Additionally, the filling level ($x_5$) is measured in the lower tank. To generate a fault, a connection cable between the valve and the compressor is removed and reattached randomly. The pump is set to 50% of its maximum feeding rate. The resulting data are illustrated in Figure 5. The instant the valve closes, the water is blocked from flowing from the upper to the lower tank. As it reopens, the process returns to its stationary phase. For the process variables, this means that the flow reduces, while the level meter measures a continuous reduction of the water in the lower tank. The hydraulic pressure increases until the pump stops delivering water from the lower to the upper tank. It is expected that the valve ($x_4$) is detected as being the root cause of the disturbance. The results are given in Figure 6, where the red squares mark the expected causal dependencies, and the design parameters result in $\beta_{CCF} = 0.33$, $\beta_{TE} = 0.26$, $\beta_{GC} = 0.31$ and $\beta_{SVM} = 0.82$.
No method can detect all expected causal dependencies, but all expected causal dependencies are found by at least two methods. Table 4 gives the root cause priority list from the combined causal matrix. The valve is ranked in first position and is therefore correctly detected as being the root cause of the fault.
As with the data from the CSTR, the outcome shows that when merging the methods into one resulting causal matrix, the correct causal dependencies obtain a stronger weighting, meaning that the causal dependencies wrongly detected by some methods become less relevant.

Conclusion and Future Work
Several data-driven methods have been proposed to detect causal dependencies in measurements by exploiting information contained in time-shifts and statistical relationships.
As use cases, the methods were applied to backtrack the disturbance propagation path and to localize the root cause with data coming from a simulated stirred-tank reactor and from a laboratory plant. In both cases, the causing variable of the fault was found correctly. Additionally, the results of the use cases showed that it is useful to apply more than one method when performing an analysis. Since a found causal dependency is only a hypothesis, it provides more evidence if different methods indicate the same causal dependencies.
There is much room for future research. As all proposed methods localize faults solely from process data, one part of future work will focus on the integration of prior knowledge available from a plant (e.g., through the scheme of the plant or known process characteristics). Additionally, the attenuation of fluctuations along their propagation through the system can be used as a further source of information for reconstructing the propagation path. Another topic is how the methods can be extended to work on MIMO systems, as it is also possible that two faults occur at the same time in a plant. Finally, the approach for combining the methods shows good results, but is rather ad hoc. Approaches using fuzzy decision making or Bayesian statistics need to be investigated.

Author Contributions
The main contributions that have been presented in this paper are: (1) a new algorithm based on support vector machines for the detection of causal dependencies in data by using a recursive variable selection and model reduction approach; (2) a design approach to combine the different methods into one causal matrix. As each method follows a different mathematical approach, this was done by introducing exponential fitting parameters into each method. In a subsequent step, the combined causal matrix is used to generate a root cause priority list to decide which process variable is most likely to be the root cause of the found causal dependencies; (3) an existing method based on the cross-correlation function has been extended by using permutation tests as the significance test; and (4) a new visualization method for the representation of causal matrices was developed. This visualization allows a better comparison of the results coming from the different methods, since all causal matrices can be represented in a single graphic.
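In the threshold $C_{CCF}^{\text{thresh}}$ of the permutation test for the cross-correlation function, $\mu_{C_{CCF}^{\pi}}$ is the mean value and $\sigma_{C_{CCF}^{\pi}}$ the standard deviation of the permuted values. If $C_{CCF} > C_{CCF}^{\text{thresh}}$ holds, the test is passed successfully.
The threshold for the transfer entropy is based on the permuted input time series $u^{\pi}[k]$ and the generation of several values $TE_{u^{\pi}y}$. These values are finally used to calculate the threshold $TE_{uy}^{\text{thresh}}$ in terms of a 3σ test: $TE_{uy}^{\text{thresh}} := \mu_{TE_{u^{\pi}y}} + 3\,\sigma_{TE_{u^{\pi}y}}$.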

Figure 1. Simulated data from the CSTR used for the analysis.

Figure 3. Experimental setup of the laboratory plant.

Figure 4. Schematic drawing of the laboratory plant.

Figure 5. Data of the laboratory plant when having a faulty valve. Sampling rate $T_s = 2$ s.

Figure 6. Causal matrices for the laboratory plant. The red squares describe the expected causal dependencies to be detected by the methods.

For the cross-correlation function, two significance tests are performed: the first checks whether $\max |\hat{c}_{uy}[\lambda]|$ differs significantly from zero; the second checks whether the causal strength from $u \to y$ differs significantly from that of $y \to u$, to have a clear indication of the direction of the propagation path.
Significant time-shifted correlation: Since the available samples of the two time series are limited, Equation (1) only gives an estimate of the CCF, so that $\max |\hat{c}_{uy}[\lambda]|$ randomly differs from zero for two uncorrelated signals. To test whether $\max |\hat{c}_{uy}[\lambda]|$ differs significantly from zero, a hypothesis test is performed.
Steps 1 to 3 of Algorithm 2 (Granger causality):
1. Compute $E_{\bar{U}_i Y}$ (without $u_i$) and $E_{U_i Y}$ (with $u_i$) using the model order estimated through AIC;
2. Test for model consistency for both models using the Durbin-Watson statistic;
3. Perform a significance test based on an F-test for

Table 3. Root cause priority list calculated from the causal matrix in Figure 2.

Table 4. Root cause priority list calculated from the causal matrix in Figure 6.