Next Article in Journal
The Application of Dielectric Spectroscopy and Biocalorimetry for the Monitoring of Biomass in Immobilized Mammalian Cell Cultures
Next Article in Special Issue
Computer-Aided Framework for the Design of Freeze-Drying Cycles: Optimization of the Operating Conditions of the Primary Drying Stage
Previous Article in Journal
A Combined Feed-Forward/Feed-Back Control System for a QbD-Based Continuous Tablet Manufacturing Process
Previous Article in Special Issue
A Novel ARX-Based Approach for the Steady-State Identification Analysis of Industrial Depropanizer Column Datasets
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:

An Algorithm for Finding Process Identification Intervals from Normal Operating Data †

André C. Bittencourt
Alf J. Isaksson
Daniel Peretzki
3 and
Krister Forsman
Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden
ABB AB, Corporate Resarch, SE-721 78 Västerås, Sweden
Hella Fahrzeugkomponenten GmbH, 28199 Bremen, Germany
Perstorp AB, SE-284 80 Perstorp, Sweden
Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in the Proceedings of the 2011 AIChE Annual Meeting, Minneapolis, MN, USA, 16-21 October 2011.
Processes 2015, 3(2), 357-383;
Submission received: 24 January 2015 / Accepted: 23 April 2015 / Published: 6 May 2015
(This article belongs to the Special Issue Process Control: Current Trends and Future Challenges)


Performing experiments for system identification is often a time-consuming task which may also interfere with the process operation. With memory prices going down and the possibility of cloud storage, years of data is more and more commonly stored (without compression) in a history database. In such stored data, there may already be intervals informative enough for system identification. Therefore, the goal of this project was to find an algorithm that searches and marks intervals suitable for process identification (rather than completely autonomous system identification). For each loop, four stored variables are required: setpoint, manipulated variable, measured process output and mode of the controller. The essential features of the method are the search for excitation of the input and output, followed by the estimation of a Laguerre model combined with a hypothesis test to check that there is a causal relationship between process input and output. The use of Laguerre models is crucial to handle processes with deadtime without explicit delay estimation. The method was tested on three years of data from about 200 control loops. It was able to find all intervals in which known identification experiments were performed as well as many other useful intervals in closed/open loop operation.

1. Introduction

In the process industry, models of varying fidelity are relevant for many different purposes, such as process design, PID controller tuning, multi-variable supervisory control and process diagnostics. Often the task of building a process model is complemented with an identification procedure, where some model parameters are estimated from measured data. Performing dedicated experiments for system identification is, however, often a time-consuming task which may also interfere with the process operation. Nowadays, it is common to store measurement data from the plant operation (without compression) in a history database. Such databases can be a useful source of information about the plant, and might already contain suitable data to perform process identification. Due to the size of such databases, searching for intervals of data suitable for identification is a challenging task. Preferably, this task should be supported by a data scanning algorithm, that automatically searches and marks interesting intervals of data.
Relatively little can be found in the literature that directly addresses this problem. In [1], a data removal criterion is presented that uses the singular value decomposition (SVD) technique for discarding data which are only noise dependent and leads to a larger mean square error (MSE) of the estimated model parameters. Horch introduced in [2] a method for finding transient parts of data after a setpoint change, specifically targetting the identification of time delay [3]. In [4], the authors discuss persistence of excitation for on-line identification of linear models but do not deal with finding intervals of data that are persistently exciting. Data mining techniques have been proposed to give a fully automated modeling and identification, based solely on data. In [5], the authors proposed a method to discover the topology of a chemical reaction network, whereas in [6] a method is proposed to find the dynamical model that generated the data using symbolic regression. In [7], the authors consider the use of historical data to achieve process models for inferential control. Some guidelines are suggested on how to select intervals of data to build models, but no algorithm is proposed with this objective.
More recently the problem was treated in [8], referring to our original conference paper [9]. The developed method uses the condition number of information matrix as a measure of data quality from routine operation. In [10] our method from [9] is studied in detail, in particular with respect to its sensitivity to the design parameters. A study that will become useful in this paper fine tuning the approach further.
Process plants have specific characteristics that make a fully automated modeling and identification a challenging task, see e.g., [11]. This work focuses on models that are suitable for the design/tuning of low order controllers like PI and PID. For this purpose, it is clear from the above considerations that a fully automated identification is challenging. Instead, the objective of this work is to develop a data mining algorithm that retrieves intervals of data from a historic database that are suitable for process identification. The method outputs the intervals together with a quality indicator. The user can then decide on the model and identification method to be used.
Compared to the original conference paper [9], this paper gives a more detailed review of the literature on informative data especially in relation to identifiability in closed-loop. We have also modified the estimation of the Laguerre model to add a forgetting factor and introduced a noise model.

Problem Formulation

Consider the control loop in Figure 1, where at time k the operation mode m(k) ∈ {0, 1} denotes operation in either open-, or closed-loop denoted by the symbols 0 and 1 respectively. In manual mode, the input to the process u(k) is decided by the operator. In automatic mode, u(k) is given by the controller. The controller is driven by the control error e(k), formed from the setpoint r(k) subtracted by the measured process output y(k), which is corrupted by noise v(k). It is considered that the system can be described by an unknown model (θ), which is a function of the parameters θ.
A collection of data ZN =[z(1)T,⋯, z(N)T]T is available, where
z ( k ) = [ m ( k ) r ( k ) u ( k ) y ( k ) ] .
The objective is to find time intervals Δ = [kinit, kend] where the data in ZN may be suitable to perform identification of the process parameters. Remark: if m(k) is not available, the mode is usually possible to infer from the behaviour of r(k) and u(k).
For its practical use, the following requirements are sought:
  • Minimal knowledge about the plant is required. That is, none (or little) input is expected from the user.
  • The resulting algorithm should process the data quickly. For example, a database containing data from a month of a large scale plant operation should not take longer than a few minutes to be processed.
  • For each interval found, a numeric measure of its quality should be given. This can be used by the user in order to select which intervals to use for identification.
In order to achieve both Requirements 1 and 2, some simplifying assumptions are made:
Assumption 1.1 (SISO). Since the purpose is PID tuning it is assumed that only SISO control loops are to be estimated.
Assumption 1.2 (Linear models). It is assumed that the process can be well described by a linear model (θ).
Assumption 1.3 (Monotone step response). In process industry most uncontrolled processes are non-oscillatory, i.e., transfer functions with real valued poles are sufficient.
For a process in operation there are mainly two scenarios to hope for that may result in data informative enough for system identification (see [2]):
  • The process is operating in manual mode and the input signal u(k) is varied enough to be exciting the process.
  • The controller is in automatic and there are enough changes in the setpoint r(k) to make identification possible.
As a consequence, the method developed below is treating these two cases separately.
Notice that there are other scenarios that, at least in theory, would render a system in closed-loop identifiable. A longer discussion on this topic is given in next section.

2. Theoretical Guiding Principles and Preliminaries

To make this paper reasonably self-contained this section gives a brief review of the necessary system identification concepts needed to justify the design choices made. However, for a reader only interested in the final algorithm and its application this section may be omitted. The section is mainly based on [1214].
Let each SISO loop correspond to a linear discrete-time system
S : y ( k ) = G 0 ( q ) u ( k ) + H 0 ( q ) e ( k ) .
where k is the sample index, q is the shift operator (qu(k) = u(k + T ) for the sampling interval T), e(k) is white noise with variance γ0, y(k) is the output and u(k) is the input, which might be given in open-loop or by a stabilizing controller
u ( k ) = K ( q ) ( r ( k ) y ( k ) ) = S 0 ( q ) K ( q ) r ( k ) S 0 ( q ) K ( q ) H 0 ( q ) e ( k ) ,
where S 0 ( q ) ( 1 + K ( q ) G 0 ( q ) ) 1 is the sensitivity function. The transfer functions G0(q), H0(q) and K(q) are rational and proper and H0(q) is minimum phase and normalized such that H0() = 1. A parametric linear model structure (θ) is used to describe the system S as
( θ ) : y ( k ) = G ( q , θ ) u ( k ) + H ( q , θ ) e ( k )
where θ ∈ ℝn is the vector of unknown parameters. The true system S is said to belong to the model set ≜{(θ)|θ ∈ Dθ} for some parameter set Dθ if there is θ′ ∈ Dθ such that ( θ ) = S. The equality here means that G0(z) = G(z, θ′) and H0(z) = H(z, θ′) for almost all z, i.e., the system and model are equivalent from an input-output relationship. The set defined by all θ′ such that S = ( θ ) is denoted D ( S , ) and contains a unique element θ0, or “true pameters”, if the model structure is globally identifiable, the concept of identifiability will be reviewed in Section 2.2.
The optimal one-step ahead predictor for y(k) is given by
y ^ ( k | θ ) = H ( q , θ ) 1 G ( q , θ ) u ( k ) + ( 1 H ( q , θ ) 1 ) y ( k ) .
A sensible approach to find an estimate of the parameter, denoted θ ^, is to minimize the difference prediction errors, ε(k, θ) = y(k) ŷ(k|θ) in some sense. As it will be reviewed in Section 2.3, if S and the data set is informative enough, prediction error methods will give consistent estimates of the parameter, i.e., θ ^ D ( S , ). Informative datasets are discussed in Section 2.2 and recursive prediction error methods are considered in Section 2.3.
One approach to find a model structure such that S is to derive the model from the laws of physics governing the processes and use the parameter vector θ to describe remaining unknown characteristics, e.g., the gain, poles or zeros. Since it would be impractical to consider knowledge of the dynamics for each loop in an entire process plant, we consider that no knowledge is available about the system (later on we will however assume the presence of integrators to be known). Instead, we seek for a flexible model structure that can describe a large set of system dynamics while allowing for an efficient identification procedure. Flexible and tractable black-box models are presented in Section 2.1.
The alternative models are called black-box since they are used to describe relations in the data and do not necessarily have parameters that relate to the original process. Flexible black-box models fit our purpose of finding data suitable for identification of process models. Notice that we are not aiming at directly finding a process model, with a clear physical interpretation or suitable for control purposes.

2.1. Black-Box Modeling with Laguerre Models

Because there is a one to one correspondence for the transfer functions in Equations (4) and (5) we focus the presentation for the predictor form and follow [13]. Assuming that H(q, θ) and G(q, θ) have the same unstable poles, it follows that the optimal predictor can be approximated as [15]
H ( z , θ ) 1 G ( z , θ ) i = 1 n b i z 1 B n ( z )
H ( z , θ ) 1 1 i = 1 n a i z i A n ( z ) .
This approximation corresponds to an ARX model of order n and is exact as n → ∞. Therefore, high order ARX models can be used as flexible black-box models. An additional advantage is that the ARX model structure is linear in the parameters θT = [a1,⋯, an, b1,⋯bn] and its identification in a prediction error sense is very tractable as will be discussed in Section 2.3.
There are a number of shortcomings with the parametrization in Equation (6) though. As noted by Wahlberg in [13], the rate of convergence for the approximation as n → ∞ is determined by the location of the poles of G(z, θ) which are not shared by H(z, θ) and the zeros of H(z, θ); the closer these are to the unit circle, the slower the approximation will converge. For sampled continuous-time systems, the continuous-time poles (and zeros) are mapped as epT and thus for a small sampling interval, T, the approximation will require large n. Similarly, for plants with an unknown delay d, it will require d/T additional coefficients for Bn(q).
An alternative parametrization that has a convergence rate of the approximation independent on T and which is more suitable for plants with a delay is possible with Laguerre polynomials as
H ( z , θ ) 1 G ( z , θ ) i = 1 n b i L i ( z , α ) B ˜ n ( z , α )
H ( z , θ ) 1 1 i = 1 n a i L i ( z , α ) A ˜ n ( z , α )
where Li(q, α) is the ith Laguerre filter
L i ( q , α ) = ( 1 α 2 ) T q α ( 1 α q q α ) i 1 .
As for the ARX model, the approximation is exact when n → ∞. The distances of the Laguerre filter pole, 0 < α < 1, to the poles of the system determine the convergence rate for the approximation [13]. For fast convergence, α can be set as equal to the largest time constant in the system [16] but in case of scattered poles, the convergence may still be slow as discussed in [13]. Optimal choice of the Laguerre pole is discussed in [17].
The model structure in Equation (7) is a generalization of the model in Equation (6), which is retrieved for α = 0. In fact, introducing α ¯ = ( 1 α ) / T and δ = (q − 1)/T, Equation (8) can be rewritten as [13]
δ i 1 ( 1 + δ / α ¯ ) n ,
so the Laguerre model reduces to an ARX model with a prefilter 1 / ( 1 + δ / α ¯ ) n in the δ domain.
The Laguerre filter of order i consists of one low-pass filter cascaded with (i − 1) first-order all pass filters, which acts effectively as a delay approximation of order (i − 1). The substitution of the delay operator in an ARX model with a Laguerre filter has important characteristics to allow for an accurate description and of low order [13]. The maximum delay d ¯ a Laguerre model can explain can be found by comparing a Padé approximation of a delay with the all pass part of the Laguerre filters [18,19] and is given by
d ¯ = 2 ( n 1 ) T / log α .
If the real pole α and order n are selected properly, Laguerre models can efficiently approximate a large class of linear systems [2]. In general, the performance of the identification is relatively insensitive to the choice of α [10,20]. Estimation of α from data is presented in, for example, [21], but this has not been found necessary in this paper.
Because a Laguerre model has a finite gain at low frequencies, it is not a good approximation for plants with an integrator. This is in fact an important limitation of Laguerre models since many processes in process industry (most notably level control) have an integrating behavior. To overcome this we will treat integrating loops separately as described later in Section 3.
Another limitation is that the poles for the Laguerre filter are real and for resonant processes the approximation may take long to convergence. An alternative is to consider the use of complex poles in the expansion with, e.g., Kautz polynominals. See [22] for details.

2.2. Identifiability, Informative Data Sets and Persistence of Excitation

After selection of an adequate model structure, one might ask whether it is possible to unequivocally identify its parameters from the data using a prediction error method. Whether this is possible will depend on properties of both the model structure and of the data set used. Recalling there is a one to one relation between the optimal predictor in Equation (5) and the model in Equation (4), we study this question based on the predictor form. Following [12,23] we introduce the shorthand notation,
y ^ ( k | θ ) = [ H ( q , θ ) 1 G ( q , θ ) ( 1 H ( q , θ ) 1 ) ] W ( q , θ ) [ u ( k ) y ( k ) ] z ( k ) = W ( q , θ ) z ( k ) .
First, it will not be possible to distinguish two different parameters, θ ≠ θ′, if the related linear filters are the same i.e., W (z, θ) = W (z, θ′). Identifiability of a model structure at θ′ ∈ Dθ is thus defined as whether
W ( z , θ ) = W ( z , θ ) , z θ = θ
in a region (locally identifiable) or for any θ (globally identifiable). Notice that if S and the model is globally identifiable D ( S , ) = θ 0. Furthermore, the ARX model structure is strictly globally identifiable, i.e., it is globally identifiable at any θ′ ∈ Dθ.
Even if the model is globally identifiable, the parameters may not be retrievable depending on the data z(k), take for instance z(k) = 0. This leads to a definition of informative datasets, which allows for a distinction of different parameters in the model set. For what follows, we say that a signal ϕ(k) is quasi-stationary if it satisfies
E ϕ ( k ) = m ϕ ( k ) , | m ϕ ( k ) | C , k
E ϕ ( k ) ϕ ( r ) = R ϕ ( k , r ) , | R ϕ ( k , r ) | C
E ¯ R ϕ ( k , k τ ) = R ϕ ( τ ) , τ ,
E ¯ f ( k ) lim N 1 N k = 1 N E f ( k ) .
The expectation in Equation (13) applies only for stochastic parts. For deterministic signals the expectation is dropped and the definition means that ϕ(k) is a well-defined, bounded, sequence.
A quasi-stationary data-sequence { z ( k ) } 1 N is informative enough with respect to a model set if it allows for the distinction between any two models W(z, θ1), W(z, θ2) in the set, i.e.,
E ¯ [ ( W ( z , θ 1 ) W ( z , θ 2 ) ) z ( k ) ] 2 = 0 W ( e ι ω , θ 1 ) W ( e ι ω , θ 2 )
for almost all ω. Notice that this definition depends only on the model set and not on the true system. To illustrate the consequence of this definition to signals, we define
Definition 1 (Persistent excitation). A quasi-stationary regressor ϕ(k) is persistently exciting (PE) if Ē[ϕ(k)ϕ(k)T]>0.
Definition 2 (Richness of a signal). A scalar quasi-stationary signal u(k) is sufficiently rich of order n (SRn) if ϕ(k)T = [u(k − 1),⋯, u(k − n)] is PE.
As discussed, informativeness of a data set depends on the model structure used. We review the conditions on the data for open and closed loop for ARX models, analogous results are possible for the ARX-Laguerre model using the δ representation given in Equation (9) with a prefilter. Let the ARX model in Equation (6) have polynomials A n a ( q ) and B n b ( q ), then the following results apply [23].
Theorem 1 (ARX, open loop). Let the system be operating in open-loop, the data set is informative if and only if u(k) is SRnb.
Therefore, for the open loop case, the input signal must be rich enough. The closed loop case was studied already by Söderström, Gustavsson and Ljung [24]. It turns out that the system may be identifiable even if the setpoint is zero, provided the controller is complex enough. A number of such scenarios are discussed in [12]; nonlinear control, switching linear controllers or fixed linear controller with high enough order. The case of fixed linear controllers is further studied in [14]. Distinguish between two cases, either r(k) 0, i.e., the system is driven solely by the noise, or r(k) is changing. We denote the first situation as a controller operation for disturbance rejection and the second as a servo operation. Let the feedback stabilizing controller given in Equation (3) be given in a coprime factorization K ( z ) = X ( z 1 ) Y ( z 1 ) where the polynomials are X ( z 1 ) = x 0 + x 1 z 1 + + x n x z n x and Y ( z 1 ) = 1 + y 1 z 1 + + y n x z n y. Then the following resuls apply [14]:
Theorem 2 (ARX, servo). Let max(nxna, nynb) < 0, the data are informative for almost all SRnr r(k), if and only if nr ≥ min(nanx, nbny).
Theorem 3 (ARX, disturbance rejection). Let r(k) 0, then the data are informative if and only if max(nxna, nynb) ≥ 0.
In [25], more general results are given for general linear plants with a delay and considers possible pole-zero cancellations. These theorems above give a minimal set of conditions on the inputs to allow for a distinction of the parameters from the data. The result in Theorem 3 indicates that it is possible to find the parameters based on the excitation from the noise if the controller has high enough order compared to the model. For a model structure of high order, and controlled by PID controllers, it is clear that this condition will hardly hold. In this case, the information contained in the reference must complement the simplicity of the controller as given by Theorem 2.
The following are more general results which apply to general linear models. Defining the spectrum of a signal ϕ(k) as
Φ ϕ ( ω ) τ = R ϕ ( τ ) e ι τ ω ,
a signal is said persistently exciting if its spectrum Φϕ(ω) > 0 for almost all ω. The next results follow for any linear plant [12],
Theorem 4 (Open loop). A persistently exciting input u(k) is informative enough for open loop experiments.
Theorem 5 (Closed loop). A persistently exciting reference r(k) is informative enough for closed loop experiments.
So, in general terms, for open-loop (closed-loop) there should be enough changes in the input (reference) to allow for an identification of any linear plant.

2.3. (Recursive) Prediction Error Methods

An important feature of the general predictors given by the approximations in Equations (6) or (7) is that they can be written as linear regressions,
y ^ ( k | θ ) = φ T ( k ) θ ,
where φT (k) is function of past inputs u(k − 1),…, u(1) and outputs y(k − 1),…, y(1). In a linear regression, the parameters θ ∈ ℝn appear as a linear function with the regressors, which allow for a simple identification procedure. A common choice of identification approach is the prediction error method, where the prediction error ε(k, θ) = y(k) ŷ(k|θ) = y(k) φ(k)T θ is minimized according to some criterion. To reduce the storage requirements of the resulting algorithm and to allow for adaptive solutions, recursive methods are preferred, which can be updated in a data stream. Many recursive prediction error methods are possible, see, e.g., [12]. For the presentation here, we consider an exponentially weighted quadratic prediction error criterion, where the estimate is given as
θ ^ k = arg min θ D θ V k ( θ ) = arg min θ D θ i = 1 k λ k 1 ε 2 ( k , θ ) ,
with 0 < λ < 1. Direct minimization gives the solution for this problem in closed form, which can be written recursively as [12],
θ ^ k = θ ^ k 1 + R ¯ ( k ) 1 φ ( k ) ε ( k , θ ^ k 1 )
R ¯ ( k ) = λ R ¯ ( k 1 ) + φ ( k ) φ ( k ) T
V k ( θ ^ k ) = λ V k 1 ( θ ^ k 1 ) + ε ( k , θ ^ k 1 ) ε ( k , θ ^ k ) .
This algorithm is known as the Recursive Least Squares (RLS). The feasibility of Equation (19a) depends on whether the information matrix R ¯ ( k ) is invertible, i.e., the matrix must be full-rank.

2.3.1. Frequency Description of Prediction Error

Some important properties of prediction error methods can be highlighted with a description in the frequency domain. To illustrate this, we replace the exponential weighting function λk−i in Equation (18) by the constant weight 1 2 N, where N is the number of data samples, and let N → ∞. The limiting cost is denoted V ¯ ( θ ) and it follows from the definition of the Fourier transform and Equation (16) that
V ¯ ( θ ) lim N 1 2 N k = 1 N ε 2 ( k , θ ) = E ¯ 1 2 ε 2 ( k , θ ) = 1 4 π π π Φ ε ( ω , θ ) d ω .
Using the shorthand notation G0, H0 to describe the system’s transfer functions in Equation (2) and Gθ, Hθ to describe the transfer functions of the model in Equation (4), it follows, see [12], that the residuals’ spectrum can be written as
Φ ε ( ω , θ ) = | G 0 + B θ G θ | 2 | H θ | 2 Φ u + | H 0 H θ | 2 | H θ | 2 ( γ 0 | Φ u e | 2 Φ u ) + γ 0 ,
B θ = ( H 0 H θ ) Φ u e Φ u ,
where the inputs were dropped for compactness and Φue denotes the cross-spectrum of u and e, defined in an analogy to Equation (13). The expression allows us to study the effects of the model chosen and excitation to the cost function and thus to the estimated parameters. Notice for instance that if S , θ′ minimizes the criterion. Furthermore, Gθ will be pushed towards (G0 + Bθ), and for that reason Bθ will introduce a bias to the estimate of G0.
For open-loop experiments, the input u is uncorrelated with the noise e so Φue 0 and so the bias term Bθ 0. The input spectra ϕu will thus control the frequencies where | G θ ^ G 0 | 2 is minimized. For instance, some times the noise model is unimportant and thus fixed to a constant, i.e., H(q, θ) = H*(q), this was used in our previous paper [9]. In this case we have
arg min θ V ¯ ( θ ) = arg min θ π π | G 0 G θ | 2 Φ u | H * | 2 d ω ,
and the minimizing argument is such that the distance |Gθ−G0|2 is decreased according to the frequency weight given by the ratio of the input spectrum Φu and the noise model chosen |H*|2. Furthermore, the relevance of a persistently exciting input becomes clear in the expression.
For the closed-loop case, u will be correlated to e leading to non-zero Φue and thus the bias term Bθ should be addressed by adjusting the noise model Hθ to H0. Splitting the input spectrum into the parts originating from the reference r and noise e as Φ u ( ω ) = Φ u r ( ω ) + Φ u e ( ω ), if the input can be written as u(k) = K1(q)r(k) + K2(q)e(k), then | Φ u e ( ω ) | 2 = γ 0 Φ u e ( ω ) and
arg min θ V ¯ ( θ ) = arg min θ π π | G 0 + B θ G θ | 2 | H θ | 2 Φ u + γ 0 | H 0 H θ | 2 | H θ | 2 Φ u r Φ u d ω ,
and the bias contribution follows as
| B θ | 2 = | H 0 H θ | 2 γ 0 Φ u Φ u e Φ u .
The equation shows that fixing the noise model for a system operating in closed loop will lead to a biased estimate of G0. Furthermore, for a controller in disturbance rejection mode, with Φ u r 0, the bias can only be controlled by the ratio of the noise energy and the contribution of the noise to the input energy due to feedback, i.e., γ 0 / ϕ u e. So to reduce the bias, the influence of the noise to the input should increase, which contradicts many control objectives. Reducing the bias for a system operating in disturbance rejection mode is therefore difficult. In servo mode, the reference r can reduce |H0Hθ|2 through the ratio Φ u r / Φ u, thus reducing the bias term.

2.3.2. Asymptotic Properties of the RLS

Assume that the system in Equation (2) is in the model set, i.e., S , and that the data are informative enough then as λ → 1 and k,
1 λ ( θ ^ k θ 0 ) A s N ( 0 , P θ ) , P θ 1 2 γ 0 R ¯ 1 , R ¯ E ¯ [ φ ( k ) φ ( k ) T ]
so the asymptotic estimate is consistent and is distributed according to a Gaussian distribution. The covariance matrix is given by
θ E [ ( θ ^ k θ 0 ) ( θ ^ k θ 0 ) T ] ~ ( 1 λ ) 2 γ 0 R ¯ 1 ,
which is controlled by R ¯. In order to make the covariance small, R ¯ should be made large in some sense. This idea is explored, for instance, in experiment design (choosing u) for system identification. The asymptotic result in Equation (24) will provide useful guidance when defining our algorithm in the next section.
For finite sample, we can replace R ¯ and γ0 with their sample estimates, given by
γ ^ k = ( 1 λ ) V k ( θ ^ k ) , R ¯ ^ k = ( 1 λ ) R ¯ ( k ) ,
and so the sample covariance matrix is given by
Σ ^ k = ( 1 λ ) 2 γ ^ k R ¯ k 1 = ( 1 λ ) 2 V k ( θ ^ k ) R ¯ ( k ) 1 .

3. User Choices, Data Features

Based on the theoretical guiding principles outlined, we describe and motivate the choices leading to the proposed algorithm. The main ingredients are the choice of a flexible black-box model structure, followed by the definition of data features and test to indicate and quantify the quality of the data to retrieve a linear model. Through the section, choices are made taking into consideration reduced complexity, implementation aspects and tuning.
The algorithm is outlined in Section 4 and for a recursive implementation, based on a data stream. Two basic pre-processing steps are required. First, the signals are considered to be normalized (scaled) between 0 and 1. This is important to simplify the tuning and because some of the features are sensitive to scaling. An adequate scaling is possible if the range for the variables is available, which is normally the case in the process industry. Furthermore, we remove the first sample, of the data stream, from r(k), u(k) and y(k) since we are working with linear models. To avoid confusion, the transformed variables are denoted with a ·′ symbol whenever important. Four tests of increasing complexity are used which are checked in cascade and notation which is relevant to the implementation will be introduced through the section.

3.1. Choice of Model Structure

While by Assumption 1.2 it is considered that the loops in the plant can be described by a linear model as given in Equation (2), it would be too restricting to consider available knowledge of a model structure (θ) for each SISO loop in an entire plant. Since our objective with the algorithm is not to find a process model but rather to find segments of data from where a process model can be retrieved, it is natural to avoid including knowledge of the models for each loop and to instead use flexible black-box models when searching for data. From the discussion presented in Section 2.1, flexible and tractable (i.e., allowing for a representation as a linear regression given by Equation (17)) linear models can be achieved with high-order ARX as in Equation (6) and high order ARX-Laguerre models as in Equation (7).
As discussed, the Laguerre representation presents important advantages, it will often require lower order models for the same quality of the approximation and it is more suitable for processes with a delay since the delay operator is replaced by the Laguerre filters, acting as a delay approximation. The later feature is critical in the process industry as many loops contain nonzero delays. A Laguerre expansion is thus used to model the input relation, i.e.,
H ( z , θ ) 1 G ( z , θ ) i = 1 n b b i L i ( z , α ) .
For the noise model, we do not expect any delays. Since the use of a Laguerre model requires the additional computational complexity for the filtering operations, we choose a noise model based on the expansion on lags as in Equation (6), i.e.,
H ( z , θ ) 1 1 i = 1 n a a i z i .
The predictor can be written as a linear regression given by Equation (17) with
φ ( k ) T = [ u ˜ 1 ( k ) u ˜ n b ( k ) y ( k 1 ) y ( k n a ) ]
θ T = [ b 1 b n b a 1 a n a ] = [ ( θ b ) T ( θ a ) T ] ,
where ûi(k) ≜ Li(q, α)u(k). The resulting model presents the key characteristics
  • for an adequate choice of model orders nb, na, it is a flexible representation of any linear system,
  • for an adequate choice of α lower orders nb are possible compared to high-order ARX because of the use the Laguerre polynomials to describe the input relations,
  • it can be written as a linear regression which allows for an efficient identification, using, e.g., the RLS algorithm.
These features are instrumental for the method proposed in this paper.

3.1.1. Tuning

The use of the Laguerre representation requires the choice of the model order n, and the Laguerre filter pole, α. As discussed in Section 2.1, α will control the convergence rate of the approximation and for a fast convergence one can set equal to the largest time constant present. Once α is chosen, the maximum delay d ¯ the Laguerre filters can approximate is given by Equation (10). With knowledge of the maximum delay possible in the plant or an upper bound for it, the model order nb can be chosen as
n b d ¯ log α 2 + 1 ,
where ⌈·⌉ denotes the ceiling operation.

3.1.2. Plants with an Integrator

To overcome the limitation of a Laguerre model having a finite gain at zero frequency, it is assumed known if an integrator is present in each loop. The input is then integrated as
u ¯ ( k ) = u ( k ) ( 1 q 1 ) ,
and ū(k), y′(k) are used in the identification instead. As found in [10], since the main dynamics for integrating plants relates to the integrators themselves, the pole of the Laguerre filters α for integrating plants can be made considerably smaller (faster) than for non-integrating plants. The Laguerre pole for integrating plants is denoted here as αI. The same model order n for non-integrating plants can be used or a specific choice nI can be made for integrating plants, e.g., using Equation (29) with αI.

3.2. Choice of Operational Modes to Consider

Based on the discussion on informative datasets in Section 2.2, there are three main modes of operation of a plant, open-loop and closed-loop in servo or disturbance rejection operation. For operation in disturbance rejection mode, with r(k) 0, even if the condition on controller order in Theorem 3 is fullfilled, it only guarantees that the information matrix is full rank while its numerical conditioning might be poor. The identification in disturbance rejection mode is in general challenging as discussed in Section 2.3.1. To further illustrate this, consider the frequency representation of Equation (23), where the bias term is controlled by γ 0 / Φ u e when r(k) 0, i.e., Φ u r 0. For a closed-loop system given by Equation (3), Φ u r = | S 0 | 2 | K | 2 | G 0 | 2 γ 0 and thus the bias can be reduced by increasing the sensitivity function gain. This directly contradicts with the control objective of disturbance rejection, where the sensitivity function should be made small. This is also the practical experience by the authors that, even if first-order models are identified, closed loop identification with a constant setpoint very rarely leads to useful models. See [8,26] for simulation examples where the disturbance rejection case is considered for process identification.
As already presented in the Problem Formulation this has influenced us to only look for intervals where the input r (or u for the open-loop case) is in fact moving. We consider theoretically based tests in Sections 3.4 and 3.5. Before proceeding though, we develop two heuristic tests that will help to reduce the overall evaluation time for the algorithm.

3.3. Simple Heuristic Tests

Process plants often operate in steady-state. Except for the disturbance rejection operation, which is not considered further, there must be some change in the inputs u or r in open- or closed-loop respectively if any process model is to be retrieved. After a change in the input, it is also expected that the output will be varying after some time, only if these two events are observed can one expect to be able to retrieve a process model from the input-output data. Checking for changes in the input and for the variability of the output are computationally cheap and can thus be useful to reduce the amount of computations required to scan large datasets. We discuss two approaches of how this can be achieved.

3.3.1. Input Step Change Test

The operation of process plants with the changes of setpoints and eventual open-loop operation is often done through step changes. This is in fact the case for the evaluation example presented in Section 5. For ease of notation, we define
ζ ( k ) ¬ m ( k ) u ( k ) + m ( k ) r ( k )
so that ζ(k)=u′(k) in case of an open-loop operation and ζ(k)=r′(k) in case of closed-loop operation. A step change of size η1 can be detected with a threshold check
| ζ ( k ) | > η 1 ,
for a change larger than η1. The threshold η1 should be chosen very small as this test is only used to avoid scanning for steady-state data.

3.3.2. Variability of y Test

Once a change in the input has been detected, one can check for reasonably sized variations of y before applying more computationally demanding tests. To reduce memory requirements and to allow for adaptability, the variance of y can be computed recursively with an exponential moving average. Denoting µy(k) as an estimate for the mean and γy(k) as an estimate for the variance, a recursive estimate can be found by [27]
μ y ( k ) = λ μ y ( k ) + ( 1 λ μ ) μ y ( k 1 )
γ y ( k ) = 2 λ μ 2 ( λ y ( y ( k ) μ y ( k ) ) 2 + ( 1 λ λ ) γ y ( k 1 ) )
which is tested with a threshold check
γ y ( k ) > η 2
where 0 < λµ, λγ < 1 are the forgetting factors, controlling the effective size of the averaging window for the estimation of the mean and variance of y respectively. Since y′ is normalized between 0 and 1, the threshold η2 can be chosen relative to its range, e.g., η2 = 0.012 means a standard deviation of 1% relative to the output range.

3.4. Finding Excitation in the Data

More theoretically based criteria are possible based on the conditions for informative datasets of Theorems 1 and 2. Checking whether the inputs are sufficiently rich is however impractical since the definition of persistent excitation is based on an ensemble of potential observations, with the Ē operator given in Equation (14). Furthermore, many signals used in practice, during the operation of the plant, are of low SRn. As an example, consider a unit step (Heaviside step) change u(k) = H(k), with H(k) = {1 if k ≥ 0, 0 otherwise}, let ϕ(k) = [H(k − 1),, H(k − n)] as in Definition 2 and note that the rank of
E ¯ [ ϕ ( k ) ϕ ( k ) T ] = lim N 1 N [ N N 1 N n N 1 N 2 N n N n N n N n ] = [ 1 0 0 0 ]
is 1 for any n and a step is thus SR1. This is true even though the matrix is full rank for finite N and it would be possible to compute an estimate. Instead of attempting to check for persistence of excitation, we check whether it is possible to compute an estimate of the parameters.
Considering an estimate based on the recursive least squares in Equation (19), note that the paremeter estimate in Equation (19a) corresponds to solving the linear system
R ¯ ( k ) Δ θ = φ ( k ) ε ( k , θ ^ k 1 )
and updating the estimate as θ ^ k = Δ θ + θ ^ k 1. A necessary condition for the solution is that the information matrix R ¯ ( k ) is full-rank. A rank test, however, gives little information about the numerical condition of R ¯ ( k ). If R ¯ ( k ) is ill-conditioned small perturbations to the right-hand side of Equation (35) (or to R ¯ ( k )) can lead to large errors to the estimate. A more suitable test is to check for its condition number which is described next.

Numerical Conditioning of R ¯ ( k ) Test

Numerical conditioning of a linear system of equation Ax = b can be assessed as follows. Introduce a perturbed linear system A(x + Δx) = (b + Δb), then it follows that (see e.g., [28]),
Δ x p x p κ p ( A ) Δ b p b p , κ p ( A ) A p A 1 p ,
where κp(A) 1 is the condition number of matrix A. Small values of
κ p ( k ) κ p ( R ¯ ( k ) )
are thus desirable in order to improve the reliability of the solution to numerical errors. Any p-norm can be used, the commonly chosen 2-norm gives,
κ 2 ( A ) = σ max ( A ) σ min ( A ) ,
with σmax(A) and σmin(A) denoting the largest respectively smallest singular values of A.
Because σmin can be close to zero, κ(k) can vary between [1,) and it might be difficult to set a threshold for it. An alternative is to monitor its inverse, also known as the reciprocal condition number as it will have values in [0, 1]. Notice that when using the reciprocal condition number, high values relate to good conditioning. The reciprocal condition number is tested with a threshold
κ 2 1 ( k ) > η 3 .
The condition number has been suggested as measure of data excitation already in [9] and is commonly used to monitor the reliability of the solution of a linear system, see e.g., [18]. In [8], the authors note that the condition number for step changes will often be large, which can be seen from the finite sample matrix in Equation (34). They suggest a choice of threshold in the order of 104, i.e., 10−4 when using its reciprocal. We note that increasing the model orders may increase the number of small eigenvalues in R ¯ ( k ) due to over-fitting and thus η3 may be chosen proportional to the number of estimated parameters.

3.5. Granger Causality Test

Notice, however, that the condition number is scaling independent. Hence even if the the condition number is low it says nothing about the magnitude of the excitation compared to the noise. Therefore, it is necessary to also check there there is in fact a noticeable correlation between input and output. A more conclusive test would be to verify whether there is a causal relation between the input u and the output y. If there is a casual relation between the variables, it is reasonable that y(t) can be predicted better making use of past input-output data, { u ( k ) , y ( k ) } 1 t 1 than it would be predicted by using only { y ( k ) } 1 t 1. This relates to the concept of Granger causality, see [29] for example.
For the black-box model used here given in regressor form by Equation (28), if u(k) Granger-causes y(k), it must be that some of the parameters relating to the input-output relation is non-zero, i.e., θ ^ b 0. We can thus define a null hypothesis 0 : θb = 0 which should be rejected in case there is a Granger-causal relation present. A statistical test for this hypothesis is readily available from the asymptotic results for the estimate θ ^ given by Equation (24) since under the null hypothesis 0,
( θ ^ k b ) T ( θ b ) 1 θ ^ k b A s χ n b ,
where (Σb)−1 is the inverse of the sub matrix of Σθ associated to θb only and χd is the chi square distribution with d degrees of freedom. The statistics can be computed based on the finite sample estimates in Equation (26), giving
s ( k ) ( θ ^ k b ) T ( Σ ^ k b ) 1 θ ^ k b
The null hypothesis is rejected if
s ( k ) > η 4 , η 4 arg inf η { η : η χ n b ( x ) d x > p }
where the threshold η4 is taken as the p quantile of χ n b. This condition must be satisfied in our algorithm to assign a data interval as useful.
The statistic s(k) can also be seen as a measure of data quality since the larger it is, the more evidence there is in the data of a causal relation between u and y. This quantity can therefore be used to rank the quality of the various intervals that are found, larger meaning better.
As discussed in Section 3.4 however, the standard implementation of the RLS filter given in Equation (19) requires inversion of R ¯ ( k ) which may lead to numerical problems. The problem of ill conditioning can be reduced with an alternative implementation choice based on the QR factorization, which gives better numerical conditioning of the solution. This is described next.

Choice of Implementation of RLS Based on the QR Factorization

For k data points, the recursive least squares with a forgetting factor λ can be written as
θ ^ k = arg min θ V k ( θ ) = arg min θ Λ ( k ) 1 / 2 ( Y ( k ) ϕ ( k ) θ ) 2 2
where Λ(k) is a diagonal matrix with [Λ(k)]ii = λki and
Y ( k ) T = [ y ( 1 ) y ( k ) ] , ϕ ( k ) T = [ φ ( 1 ) φ ( k ) ] .
Because the norm in the minimization is not affected by an orthonormal transformation Q such that QQT = I, find the QR factorization to
[ Λ ( k ) 1 / 2 ϕ ( k ) Λ ( k ) 1 / 2 Y ( k ) ] = Q ( k ) R ( k ) ,
where Q(k) is orthonormal and R(k) is a matrix of the form
R ( k ) = [ R 0 ( k ) 0 ] , R 0 ( k ) = [ R 1 ( k ) R 2 ( k ) 0 R 3 ( k ) ] ,
where the matrix R0(k) is square upper triangular of dimension n + 1 and R3(k) is a scalar. Applying the orthonormal transformation Q(k)T from the left in Equation (42), then gives
Λ k ( θ ) = Λ ( k ) 1 / 2 ( Y ( k ) ϕ ( k ) θ ) 2 2 = Q T Λ ( k ) 1 / 2 ( Y ( k ) ϕ ( k ) θ ) 2 2
Λ k ( θ ) = [ R 2 ( k ) R 3 ( k ) ] [ R 1 ( k ) θ 0 ] 2 2 = R 2 ( k ) R 1 ( k ) θ 2 2 + | R 3 ( k ) | 2
Hence, the estimate θ ^ k that minimizes Vk(θ) is given by the solution to
R 1 ( k ) θ ^ k = R 2 ( k ) .
By taking the SVD of R1(k) and comparing the singular values of R1(k) and the ones of R ¯ k = ϕ ( k ) T Λ ( k ) ϕ ( k ) = R 1 ( k ) T R 1 ( k ) gives
κ 2 ( R ¯ ^ k ) = κ 2 ( R 1 ( k ) ) 2 ,
and therefore solving for θ ^ k in Equation (48) is better numerically conditioned than the standard problem in Equation (35), it is also simpler since R1(k) is upper triangular. Sample estimates for γ ^ k, R ¯ ^ k and Σ ^ k as in Equation (26) are given by
γ ^ k = ( 1 λ ) V k ( θ ^ k ) = ( 1 λ ) R 3 ( k ) 2 ,
R ¯ ^ k = ( 1 λ ) ϕ ( k ) T Λ ( k ) ϕ ( k ) = ( 1 λ ) R 1 ( k ) T R 1 ( k )
Σ ^ k = ( 1 λ ) 2 γ ^ k R ¯ ^ k 1 = ( 1 λ ) 2 R 3 ( k ) 2 [ R 1 ( k ) T R 1 ( k ) ] 1
The alternative estimates of θ ^ k b and ( k b ) 1 given in Equations (48) and (50) can be used to compute the chi-square statistic and the test simplifies to (see the Appendix for a proof)
s ( k ) = 2 ( 1 λ ) R 3 2 ( k ) ( θ ˜ k b ) T P k b θ ˜ k b > η 4
θ ˜ k b R 2 b ( k ) R 1 a b ( k ) ( R 1 a ( k ) ) 1 R 2 a ( k ) ,
P k b I R 1 a b ( k ) ( R 1 a b ( k ) T R 1 a b ( k ) + R 1 a ( k ) T R 1 a ( k ) ) 1 R 1 a b ( k ) T .
The forgetting factor λ affects the effective sample size, N ef = 1 1 λ, used in the identification. For a consistent identification, it should be such that it is larger than the number of parameters in the model Nefn = na + nb.
The outlined QR based solution is however not sequential in the data. To find a sequential solution, notice that Vk(θ) can be written as
Λ k ( θ ) = [ λ 1 / 2 Λ ( k 1 ) 1 / 2 Y ( k 1 ) y ( k ) ] [ λ 1 / 2 Λ ( k 1 ) 1 / 2 ϕ ( k 1 ) φ ( k ) ] θ 2 2 ,
assume Q(k − 1) and R(k − 1) are available, then by applying the orthonormal transformation
Q ¯ ( k 1 ) = [ Q ( k 1 ) 0 0 1 ]
to Vk(θ) gives
V k ( θ ) = [ λ 1 / 2 R 2 ( k 1 ) λ 1 / 2 R 3 ( k 1 ) y ( k ) ] [ λ 1 / 2 R 1 ( k 1 ) 0 φ ( k ) ] θ 2 2 .
And thus applying the QR factorization
[ λ 1 / 2 R 1 ( k 1 ) λ 1 / 2 R 2 ( k 1 ) 0 λ 1 / 2 R 3 ( k 1 ) φ ( k ) y ( k ) ] = [ λ 1 / 2 R 0 ( k 1 ) φ ( k ) y ( k ) ] = Q ( k ) R ( k )
gives R 1 ( k ) θ ^ k = R 2 ( k ) as the solution. Notice here that the left hand side of Equation (55) has size (n + 2) × (n + 1) for all k and the problem does not increase in size. This last step outlines the QR-RLS algorithm, described in Algorithm 1 with an initialization as R0(0) = R0. The matrix R0 can be chosen as diagonal to ensure well-posedness and with small values to reflect the large uncertainties present during initialization, recall the expression for the covariance of the parameters given in Equation (50c).
Algorithm 1. QR-RLS algorithm
Algorithm 1. QR-RLS algorithm
Set R0(0) = R0.
for k do
  Compute [ λ 1 / 2 R 0 ( k 1 ) φ ( k ) y ( k ) ] = Q ( k ) R ( k ),
  Solve R 1 ( k ) θ ^ k = R 2 ( k ), compute s(k) as in Equation (51). end for
end for

4. Outline of the Algorithm

At this point, we define our method to search for suitable data to perform process identification. We outline a recursive implementation based on a data stream starting at sample k1 and finishing at { z ( k ) } k 1 N, where z(k) is given in Equation (1).
Before checking for the tests suggested in the previous section, we search for a long enough interval where the operation mode m(k) was the same, we require it to be at least longer than a multiple of the number of parameters. Since m(k) ∈ {0, 1}, this can be checked recursively in the data by
T 0 : c ( k ) ( i = max ( k n 0 , k 1 ) k m ( i ) n 0 ) ( i = max ( k n 0 , k 1 ) k ¬ m ( i ) n 0 ) ,
where n0 is the minimal length of data under the same mode to be considered and can be adjusted to avoid too short intervals.
The four tests presented in the previous section are listed again and we introduce some notation
T 1 : | ζ ( k ) | > η 1 , T 2 : γ y ( k ) > η 2 ,
T 3 : κ 2 1 ( k ) > η 3 , T 4 : s ( k ) > η 4 .
The tests T1:4 have increasing orders of complexity and demand on the data quality for identification. While there is no causal relation between the tests, it is reasonable to consider that it is only worth the effort of computing more complex tests once the simpler tests have been triggered. Denoting the first sample a test is triggered by ki, a test Ti is only checked from ki−1 to the current sample k, and thus
k i inf { k C i : T i } , i { 1 , 2 , 3 , 4 } ,
k 0 inf { k C 0 : T 0 } n 0 ,
C i { k i 1 , , k } , i { 0 , 1 , 2 , 3 , 4 } .
When a test Ti−1 is triggered, initialization of the next test Ti is taken from k1n1, i.e., n1 ≤ n0 samples before the step change, to the current sample k,
I i { k 1 n 1 , , k i 1 } , i { 2 , 3 , 4 } ,
notice that T0 and T1 do not require initialization.
After initialization, the tests are updated recursively. Test T2 is update as in Equation (33c), Test T3 is given by Equation (38) with R ¯ ( k ) updated as in the standard RLS solution given in Equation (19b) due to its simplicity and test T4 is updated as in Equation (51) using the QR-RLS implementation given by Algorithm 1.
The algorithm exits whenever one of the following situations occurs
  • E0: a change of operating mode happened,
  • E1: T1 is not triggered,
  • E2:4: T2:4 returns to false after reaching a true value for some k in C 2 : 4 respectively,
  • E5: the data stream reaches the end.
The first sample where any of these conditions are fulfilled is denoted ke. Once the algorithm exits, it is called again with k1 = ke + 1 until E5 is achieved, i.e., all data is processed in the stream.
An interval is marked as good only if the chi-square test, T4, is triggered at some point. In this case, we consider the interval Δ = [k1−n1, ke] as possibly suitable for identification. The interval Δ is returned to the user, together with the largest value for s(k),
s ¯ sup { s ( k ) C 4 } ,
which is used to rank the different intervals found.
Figure 2 gives an overview for the execution flow of the algorithm. It highlights the algorithm’s logical sequence, and shows how every required quantity is initialized and updated sequentially, pointing out the related equations.

5. Illustration Based on Real Data from an Entire Process Plant

To test the developed method, historic data collected from a chemical plant was used. It contains data from 195 control loops of density (7), flow (58), level (54), pressure (26) and temperature (50) types. The loops have considerably different dynamics but most of them can be modeled as a first order model with delay. It is also not hard to tell beforehand which loops will have an integrator. The delays can vary up to 10 min. The data are mainly from closed loop operation, but there are also open loop data. The database contains data of over three years of operation (37 months), which have been stored at a sampling rate of four samples per minute, in a total of almost 1.15 G samples and are stored in 6.7 G bytes.
There are a total of nine parameters and four thresholds to choose. Following the guidelines presented through the paper, they are chosen with values given in Table 1. The proposed algorithm is implemented in Matlab and the entire dataset is searched for segments of data. Running in a standard desktop computer, the total run time was 168 min and the evaluation time, i.e., without considering the time to load data files and data pre-processing, was 100 min. Table 2 gives a summary for the intervals that have been found. A large number of intervals are found for the flow loops in closed-loop since their reference signals were changed often during operation. In total, about 1.45% of the total original samples were found by the algorithm. This is a significant reduction in the amount of the data one would have to manually screen. The associated quality measure s ¯ can support the selection of the intervals for further processing.
To verify the relevance of each test T0:4 in the proposed algorithm, we count the number of times a certain test was the deepest to be triggered before an exit condition. We also present the counts for the exit conditions Ei. The results are shown in Table 3 for open and closed loop. The algorithm is well balanced since every step affects the behavior of the algorithm in selecting and rejecting intervals. The large number for T4 is due to the large number of intervals retrieved for flow loops, recall Table 2. The operation of the plant is mainly carried out in closed-loop, for which case most loops operate in steady-state. This is reflected by the large count for E1, meaning that no significant step change was found for those intervals. The simple test T1 can thus considerably reduce the number of computations needed to scan the data.

5.1. Selected Examples

The three year database contains stretches of data where we know that the control group at Perstorp conducted identification experiments, performed in manual mode with a sequence of steps in u(k). All of these intervals and many others were found with the proposed algorithm. The models built using the data intervals selected by the new method were found very similar to those obtained during the identification experiments. The related s ¯ were also consistently large. The first column in Figure 3 presents one of such intervals retrieved by the algorithm for a level control loop. As can be seen, all quantities respond well to a change in excitation and the retrieved data behaves nicely, with no evident presence of a disturbance.
The second column in Figure 3 presents an interval retrieved from a density loop operating under feedback. As can be noticed, there is a large delay for this plant. A step change is detected at k1 and, compared to the previous example in Figure 3c, significant changes for the variance of y takes considerably longer time to be detected as seen in Figure 3d. Despite the large delay, the use of a Laguerre expansion can capture the excitation in the data and is well adjusted to the first changes in y. The Granger causality test also indicates the presence of excitation and the algorithm exits when its value returns below the threshold.
To illustrate the importance of the Granger causality test, we present an example where the interval was rejected because it did not pass this test. The data, seen in Figure 4a, was collected from the open loop operation of a temperature loop. As can be seen, there are almost no changes in u but y is varying considerably due to the presence of an external disturbance. A small change in the input activates the algorithm and because of the disturbances, both the test for the variance of y and the numerical conditioning of R ¯ ( k ) exceed the thresholds. However, the Granger causality test remains below the threshold and the interval is not accepted as useful. This example illustrates that the condition number alone is not sufficient to determine the quality of the data.

6. Conclusions

From the quite extensive testing it seems that the method developed in this paper can successfully find intervals of data relevant for system identification. It requires minimal knowledge of the control loops, namely, whether the process is integrating or not. It is implemented efficiently in a recursive manner. A day’s worth of data, from all loops in the test plant, takes less than 10s to be processed.
The developed method is based on classical results from identification theory of linear systems. As an initial screening, it checks that input and output signals are varying at all. Then it forms an information matrix and checks its condition number. Finally, it estimates a provisional model using a Laguerre model structure. If the parameter estimates of this provisional model are found to be statistically non-zero the data interval is marked as potentially useful for system identification and a quality measure is also provided.


A. Proof of Equation (51)

For ease of notation we drop the argument so
R 1 = [ R 1 b R 1 a b 0 R 1 a ] , R 2 = [ R 2 b R 2 a ] , Σ ^ = c ( R 1 T R 1 ) 1 , c ( 1 λ ) 2 R 3 2 .
Taking the inverse for Σ ^ using the block matrix inversion lemma and evaluating for θb,
Σ ^ 1 = 1 c [ ( ( R 1 b ) T ( I R 1 a b ( ( R 1 a b ) T R 1 a b + ( R 1 a ) T R 1 a ) 1 ( R 1 a b ) T ) R 1 b ) 1 * * * ]
R 1 1 = [ ( R 1 b ) 1 ( R 1 b ) 1 R 1 a b ( R 1 a ) 1 0 ( R 1 a ) 1 ] , θ ^ b = ( R 1 a ) 1 ( R 2 b R 1 a b ( R 1 a ) 1 R 2 a )
and the upper left block in Equation (A2) corresponds to Σb, so
θ ^ b = ( R 1 b ) 1 θ ˜ b , ( b ) 1 = 1 c ( R 1 b ) T P b R 1 b
and finally,
s ( k ) = ( θ ^ b ) T ( b ) 1 θ ^ b = ( θ ^ b ) T P b θ ˜ b .

Author Contributions

The idea for this work was conceived in discussions between Krister Forsman and Alf Isaksson. The original version of the method and Matlab code was then developed in a Master’s thesis work at Linköping University by Daniel Peretzki (with supervision of the other three authors), and edited by André Bittencourt and Alf Isaksson to become [9], which was presented at AIChE by Krister Forsman. All data for this work was supplied by Perstorp AB via Krister Forsman, who also provided a lot of insight into how a method like this could be used in industry. For the current paper a proper specification of the method was developed by André, Alf and Krister after which the Matlab code was re-written from scratch by André who also created all tables and figures. André also wrote the main body of the text when the AIChE paper was extensively revised and extended by him and Alf.

Conflicts of Interest

The authors declare no conflict of interest.


This work was supported by the Vinnova Industry Excellence Center LINK-SIC at Linköping University and by the Swedish Foundation for Strategic Research (SSF) as part of the Process Industry Centre Linköping (PIC-LI). The authors would also like to thank Lennart Ljung for encouraging and inspirational meetings early in the project, and for continued support during this work. We also gratefully acknowledge more recent discussions with Bo Wahlberg, Fredrik Gustafsson and Håkan Hjalmarsson.


  1. Carrette, P.; Bastin, G.; Genin, Y.; Gevers, M. Discarding data may help in system identification. IEEE Trans. Signal Process. 1996, 44, 2300–2310. [Google Scholar]
  2. Horch, A. Condition Monitoring of Control Loops. Ph.D. Thesis, Department of Signals Sensors and Systems, The Royal Institute of Technology (KTH), Stockholm, Sweden, 2000. [Google Scholar]
  3. Isaksson, A.; Horch, A.; Dumont, G. Event-triggered deadtime estimation from closed-loop data, Proceedings of the American Control Conference, Arlington, VA, USA, 25–27 June 2001.
  4. Green, M.; Moore, J. Persistence of excitation in linear systems. Syst. Control Lett. 1986, 7, 351–360. [Google Scholar]
  5. Cho, Y.J.; Ramakrishnan, N.; Cao, Y. Reconstructing chemical reaction networks: Data mining meets system identification, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’08), Las Vegas, NV, USA, 24–27 August 2008.
  6. Schmidt, M.D.; Lipson, H. Data-mining dynamical systems: Automated symbolic system identification for exploratory analysis. ASME Conf. Proc. 2008, 2008, 643–649. [Google Scholar]
  7. Amirthalingam, R.; Sung, S.W.; Lee, J.H. Two-step procedure for data-based modeling for inferential control applications. AIChE J 2000, 46, 1974–1988. [Google Scholar]
  8. Shardt, Y.A.; Huang, B. Data quality assessment of routine operating data for process identification. Comput. Chem. Eng. 2013, 55, 19–27. [Google Scholar]
  9. Peretzki, D.; Isaksson, A.J.; Bittencourt, A.C.; Forsman, K. Data mining of historic fata for process identification, Proceedings of the 2011 AIChE Annual Meeting, Minneapolis, MN, USA, 16–21 October 2011.
  10. Shardt, Y.A.; Shah, S.L. Segmentation methods for model identification from historical process data, Proceedings of the 14th IFAC World Congress, Beijing, China, 5 July 1999; 19, pp. 2836–2841.
  11. Ng, Y.S.; Srinivasan, R. Data mining for the chemical process industry. In Encyclopedia of Data Warehousing and Mining; Wang, J., Ed.; IGI Global: Hershey, PA, USA, 2009; pp. 458–464. [Google Scholar]
  12. Ljung, L. System Identification: Theory for the User, 2nd ed; Prentice-Hall: Englewood Cliffs, NJ, USA, 1998. [Google Scholar]
  13. Wahlberg, B. System identification using Laguerre models. IEEE Trans. Autom. Control. 1991, 36, 551–562. [Google Scholar]
  14. Gevers, M.; Bazanella, A.; Bombois, X.; Miskovic, L. Identification and the information matrix: How to get just sufficiently rich? IEEE Trans. Autom. Control. 2009, 54, 2828–2840. [Google Scholar]
  15. Ljung, L.; Yuan, Z.D. Asymptotic properties of black-box identification of transfer functions. IEEE Trans. Autom. Control. 1985, 30, 514–530. [Google Scholar]
  16. Wang, L.; Cluett, W. Building transfer function models from noisy step response data using the Laguerre network. Chem. Eng. Sci. 1995, 50, 149–161. [Google Scholar]
  17. Oliveira e Silva, T. On the determination of the optimal pole position of Laguerre filters. IEEE Trans. Signal Process. 1995, 43, 2079–2087. [Google Scholar]
  18. Björklund, S.; Ljung, L. A review of time-delay estimation techniques, Proceedings of the 42nd IEEE Conference on Decision and Control, Maui, HI, USA, 9–12 December 2003; 3, pp. 2502–2507.
  19. Isaksson, M. A Comparison of Some Approaches to Time-Delay Estimation. In Master’s Thesis; Department of Automatic Control, Lund: University, Lund, Sweden, 1997. [Google Scholar]
  20. Park, H.; Sung, S.; Lee, I.; Lee, J. On-line process identification using the Laguerre series for automatic tuning of the proportional-integral-derivative controller. Ind. Eng. Chem. Res. 1997, 36, 101–111. [Google Scholar]
  21. Fu, Y.; Dumont, G. On determination of Laguerre filter pole through step or impulse response data, Proceedings of the 12th IFAC World Congress; Pergamon: Oxford, UK, 1994; pp. 35–39.
  22. Wahlberg, B.; Mäkilä, P. On approximation of stable linear dynamical systems using Laguerre and Kautz functions. Automatica 1996, 32, 693–708. [Google Scholar]
  23. Gevers, M.; Bazanella, A.; Miskovic, L. Informative data: How to get just sufficiently rich? Proceedings of the 47th IEEE Conference on Decision and Control, 2008 (CDC 2008), Cancun, Mexico, 9–11 December 2008; pp. 1962–1967.
  24. Söderström, T.; Gustavsson, I.; Ljung, L. Identifiability conditions for linear systems operating in closed-loop. Int. J. Control. 1975, 21, 243–255. [Google Scholar]
  25. Shardt, Y.A.; Huang, B. Closed-loop identification with routine operating data: Effect of time delay and sampling time. J. Process Control. 2011, 21, 997–1010. [Google Scholar]
  26. Shardt, Y.; Huang, B. Statistical properties of signal entropy for use in detecting changes in time series data. J. Chemom. 2013, 27, 394–405. [Google Scholar]
  27. Finch, T. Incremental calculation of weighted mean and variance; Technical Report, Computing Service; University of Cambridge: Cambridge, UK, 2009. [Google Scholar]
  28. Golub, G.; Loan, C.F.V. Matrix Computations, 3rd ed; John Hopkins University Press: Baltimore, MD, USA, 1996. [Google Scholar]
  29. Grillenzoni, C. Testing for causality in real time. J. Econom. 1996, 73, 355–376. [Google Scholar]
Figure 1. Control loop.
Figure 1. Control loop.
Processes 03 00357f1
Figure 2. Algorithm execution sequence. Lines denote the different logical tests. Boxes relate to the computations of quantities used in the different tests. Shaded boxes denote the samples used during initialization of the different quantities and white boxes refer to their online update.
Figure 2. Algorithm execution sequence. Lines denote the different logical tests. Boxes relate to the computations of quantities used in the different tests. Shaded boxes denote the samples used during initialization of the different quantities and white boxes refer to their online update.
Processes 03 00357f2
Figure 3. Examples of retrieved intervals and the quantities used during the algorithm execution. (a) Level, open loop; (b) Density, closed loop; (c) Test for variance of y; (d) Test for variance of y; (e) Test for numerical conditioning of R ¯ ( k ); (f) Test for numerical conditioning of R ¯ ( k ); (g) Granger causality test; (h) Granger causality test.
Figure 3. Examples of retrieved intervals and the quantities used during the algorithm execution. (a) Level, open loop; (b) Density, closed loop; (c) Test for variance of y; (d) Test for variance of y; (e) Test for numerical conditioning of R ¯ ( k ); (f) Test for numerical conditioning of R ¯ ( k ); (g) Granger causality test; (h) Granger causality test.
Processes 03 00357f3
Figure 4. Data under a significant disturbance that was rejected by the algorithm. (a) Temperature, open loop; (b) Test for numerical conditioning of R ¯ ( k ); (c) Test for variance of y; (d) Granger causality test.
Figure 4. Data under a significant disturbance that was rejected by the algorithm. (a) Temperature, open loop; (b) Test for numerical conditioning of R ¯ ( k ); (c) Test for variance of y; (d) Granger causality test.
Processes 03 00357f4
Table 1. Parameters and thresholds used.
Table 1. Parameters and thresholds used.
Filter coefficient for µyλµ0.99
Filter coefficient for γyλγ0.9
RLS forgetting factorλ0.99
Initialization matrix for QR-RLSR00.005 · I
Laguerre polesα, αI0.8, 0.6
Order of Laguerre input modelnb10
Order of noise modelna10
Minimal number of samples under the same moden02(na + nb)
Number of initialization samples before step changen1(na + nb)

Test descriptionThresholdValue

Step change sizeη10.002
Variance of outputη20.001252
Reciprocal condition numberη30.002
Chi-squareη423.2 (p = 0.99)
Table 2. Some statistics characterizing the performance of the method when applied to a historical dataset.
Table 2. Some statistics characterizing the performance of the method when applied to a historical dataset.
Loop typeNumber of Intervals ScannedAverage Length of Δ


Table 3. Counts for the deepest test Ti triggered before the algorithm exits and counts for the condition causing the algorithm to exit the scanning Ei.
Table 3. Counts for the deepest test Ti triggered before the algorithm exits and counts for the condition causing the algorithm to exit the scanning Ei.
iDeepest Ti TriggeredEi Triggered


Share and Cite

MDPI and ACS Style

Bittencourt, A.C.; Isaksson, A.J.; Peretzki, D.; Forsman, K. An Algorithm for Finding Process Identification Intervals from Normal Operating Data. Processes 2015, 3, 357-383.

AMA Style

Bittencourt AC, Isaksson AJ, Peretzki D, Forsman K. An Algorithm for Finding Process Identification Intervals from Normal Operating Data. Processes. 2015; 3(2):357-383.

Chicago/Turabian Style

Bittencourt, André C., Alf J. Isaksson, Daniel Peretzki, and Krister Forsman. 2015. "An Algorithm for Finding Process Identification Intervals from Normal Operating Data" Processes 3, no. 2: 357-383.

Article Metrics

Back to TopTop