# Data Visualization and Visualization-Based Fault Detection for Chemical Processes


## Abstract


## 1. Introduction

## 2. Framework

#### 2.1. Fault Detection


## 3. Applications

#### 3.1. Continuous Processes

- Step 1: Assume that the matrix $X\in {\mathbb{R}}^{m\times n}$ (which contains $m$ samples of $n$ process variables) represents a period of operation where the steady-state process performance is considered to be optimal (a “golden period” [28]). We compute the eigenvalues $\lambda$ and eigenvectors ${\mathbf{v}}_{i}$, $i\in \{1,\dots ,n\}$, of the data covariance matrix $\Sigma =X{X}^{\top}$, i.e.,
  $$\lambda \mathbf{v}=\Sigma \mathbf{v}$$
- Step 2: Using the $\lambda$ and $\mathbf{v}$ values, we define an $n$-dimensional confidence ellipsoid around the steady-state operating region. The coordinates $\overline{X}=[{\overline{x}}_{1},\cdots ,{\overline{x}}_{n}]$ of the center of the ellipsoid are calculated from:
  $${(x-\overline{x})}^{\top}{\Sigma}^{-1}(x-\overline{x})=1$$
  and the axis lengths ${l}_{i}$ from:
  $${l}_{i}=2\sqrt{\kappa {\lambda}_{i}}\quad \forall i\in \{1,\dots ,n\}$$
- Step 3: The extremes of the $n$-dimensional ellipsoid can be represented on the Kiviat diagram (Figure 5a) via a projection, which then allows us to define an appropriate confidence region for the centroids.
- Step 4: The annular region between the extremes of the $n$-dimensional ellipsoid projected on the Kiviat diagram is sampled to generate random data points using values uniformly distributed within the bounds of each variable (Figure 5b). Polygons situated close to the edges of the annular region could in fact lie outside the confidence ellipsoid. To prevent this, we verify that each random polygon corresponds to a point inside the $n$-dimensional confidence ellipsoid by reversing the projection from the Kiviat diagram to $n$-dimensional space. To do so, we follow two simple steps:
  - (a) Apply the transformation matrix ${W}^{-1}$ to the coordinates $Y$ of the randomly-generated polygon to obtain the transformed coordinates $Z$:
    $$Z=Y{W}^{-1}$$
    where
    $$W=\mathbf{v}\sqrt{\lambda}$$
  - (b) Compare the norm $D=\parallel Z\parallel$ with the radius of the unit sphere. If $D\le 1$, the randomly-generated polygon is indeed associated with a point within the confidence ellipsoid. Otherwise, the polygon is discarded and a new polygon is generated.
- Step 5: The procedure is repeated until the prescribed number of random polygons (typically, 5000) is reached. Then, the calculation of the minimum-area enclosing ellipse [29], of center $c$, ${(X-c)}^{\top}A(X-c)=1$, is an optimization problem formulated as:
  $$\begin{array}{cl}\underset{A,c}{\min} & \log(\det(A))\\ \mathrm{s.t.} & {({P}_{i}-c)}^{\top}A({P}_{i}-c)\le 1,\quad i=1,2.\end{array}$$
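Steps 1, 2 and 4 above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the covariance is taken as the $n\times n$ sample covariance of the mean-centred data, the scaling $\kappa$ is folded into $W$ (with a placeholder default of 1), vectors follow the column convention, and points are sampled directly in the $n$-dimensional variable space rather than as polygons on the Kiviat diagram. The function names `fit_golden_period`, `inside_ellipsoid` and `sample_inside` are hypothetical.

```python
import numpy as np

def fit_golden_period(X):
    """Mean, eigenvalues and eigenvectors of the covariance of X (m samples x n variables)."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)      # assumption: n x n covariance of mean-centred data
    lam, V = np.linalg.eigh(cov)       # eigh suits symmetric matrices; eigenvectors are columns of V
    return mean, lam, V

def inside_ellipsoid(y, mean, lam, V, kappa=1.0):
    """Step 4(a)-(b): map y to the unit-sphere frame and compare its norm with 1."""
    W = V * np.sqrt(kappa * lam)       # column i of V scaled by sqrt(kappa * lam_i)
    z = np.linalg.solve(W, y - mean)   # z = W^{-1} (y - mean), column-vector convention
    return bool(np.linalg.norm(z) <= 1.0)

def sample_inside(mean, lam, V, n_keep, kappa=1.0, seed=0):
    """Draw uniform points within per-variable bounds, keeping those inside the ellipsoid."""
    rng = np.random.default_rng(seed)
    cov = (V * lam) @ V.T                  # rebuild Sigma = V diag(lam) V^T
    half = np.sqrt(kappa * np.diag(cov))   # half-widths of the ellipsoid's bounding box
    kept = []
    while len(kept) < n_keep:
        y = rng.uniform(mean - half, mean + half)
        if inside_ellipsoid(y, mean, lam, V, kappa):
            kept.append(y)                 # otherwise the draw is discarded and repeated
    return np.array(kept)
```

Plain rejection sampling of this kind accepts fewer and fewer draws as $n$ grows, so the sketch is best suited to a small number of variables.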

- Calculate the corresponding polygon and centroid in the Kiviat diagram for every new data sample.
- Assess if the centroid lies outside of the confidence region.
- Flag the sample as a faulty sample if it lies outside of the confidence region. A separate criterion (e.g., two consecutive samples are identified as faulty) can be implemented to raise a process fault.
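The three monitoring steps above can be sketched in a few lines of plain Python. The sketch assumes that the confidence region is an ellipse ${(p-c)}^{\top}A(p-c)\le 1$ in the plane of the Kiviat diagram and that the centroid of each new sample has already been computed; the function names and the `n_consecutive` parameter are hypothetical, with the default of 2 mirroring the "two consecutive samples" example in the text.

```python
def outside_ellipse(p, A, c):
    """True if the 2-D point p violates (p - c)^T A (p - c) <= 1."""
    dx, dy = p[0] - c[0], p[1] - c[1]
    q = A[0][0] * dx * dx + (A[0][1] + A[1][0]) * dx * dy + A[1][1] * dy * dy
    return q > 1.0

def detect_fault(centroids, A, c, n_consecutive=2):
    """Flag each centroid and report where a process fault would be raised.

    Returns (flags, alarm_index); alarm_index is the index of the sample that
    completes the first run of n_consecutive flagged samples, or None.
    """
    flags, run, alarm = [], 0, None
    for k, p in enumerate(centroids):
        faulty = outside_ellipse(p, A, c)
        flags.append(faulty)
        run = run + 1 if faulty else 0   # length of the current run of faulty samples
        if alarm is None and run >= n_consecutive:
            alarm = k
    return flags, alarm
```

Requiring a run of flagged samples trades a slightly longer detection delay for fewer alarms triggered by isolated outliers.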

#### 3.2. Batch Processes

#### 3.3. Periodic Processes

- Periodic processes resemble to some extent batch processes, in that each cycle can be considered to be a “batch.” Thus, “normal” operation can be defined in terms of repeatability, with all such “batches” being the same in a statistical sense. Note, however, that during normal operation, each cycle typically begins and ends in the same state; this is not the case for batch systems, where the start and end point are typically very different.
- The observation above hints at a potential similarity between periodic processes and continuous processes; a periodic process can be construed as “continuous” in the sense that it is desired that the cycles be reproducible and each cycle be statistically the same as its predecessor.

## 4. Conclusions

## Author Contributions

## Conflicts of Interest

## References

1. Venkatasubramanian, V. Drowning in data: Informatics and modeling challenges in a data-rich networked world. AIChE J. **2009**, 55, 2–8.
2. Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S.N. A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Comput. Chem. Eng. **2003**, 27, 293–311.
3. Russell, E.L.; Chiang, L.H.; Braatz, R.D. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemom. Intell. Lab. Syst. **2000**, 51, 81–93.
4. Lee, J.; Yoo, C.K.; Lee, I. Fault detection of batch processes using multiway kernel principal component analysis. Comput. Chem. Eng. **2004**, 28, 1837–1847.
5. Lee, J.; Yoo, C.K.; Lee, I. Enhanced process monitoring of fed-batch penicillin cultivation using time-varying and multivariate statistical analysis. J. Biotechnol. **2004**, 110, 119–136.
6. Lee, J.; Yoo, C.; Lee, I. Statistical process monitoring with independent component analysis. J. Proc. Contr. **2004**, 14, 467–485.
7. He, Q.P.; Wang, J. Statistics pattern analysis: A new process monitoring framework and its application to semiconductor batch processes. AIChE J. **2011**, 57, 107–121.
8. Kano, M.; Nagao, K.; Hasebe, S.; Hashimoto, I.; Ohno, H.; Strauss, R. Comparison of multivariate statistical process monitoring methods with applications to the Eastman challenge problem. Comput. Chem. Eng. **2002**, 26, 161–174.
9. Yoon, S.; MacGregor, J.F. Fault diagnosis with multivariate statistical models part I: Using steady state fault signatures. J. Proc. Contr. **2001**, 11, 387–400.
10. Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S.N. A review of process fault detection and diagnosis: Part III: Process history based methods. Comput. Chem. Eng. **2003**, 27, 324–346.
11. Qin, S.J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control **2012**, 36, 220–234.
12. Inselberg, A. Parallel Coordinates; Springer: New York, NY, USA, 2009.
13. Wang, R.; Edgar, T.F.; Baldea, M.; Nixon, M.; Wojsznis, W.; Dunia, R. Process Fault Detection Using Time-Explicit Kiviat Diagrams. AIChE J. **2015**, 61, 4277–4293.
14. Wang, R.; Edgar, T.F.; Baldea, M.; Nixon, M.; Wojsznis, W.; Dunia, R. A Geometric Framework for Batch Data Visualization, Process Monitoring and Fault Detection. J. Process Control **2017**. accepted.
15. Wang, R.; Edgar, T.F.; Baldea, M. A geometric framework for monitoring and fault detection for periodic processes. AIChE J. **2017**, 63, 2719–2730.
16. Kolence, K.W. The software empiricist. ACM SIGMETRICS Perform. Eval. Rev. **1973**, 2, 31–36.
17. Tominski, C.; Abello, J.; Schumann, H. Interactive poster: 3D axes-based visualizations for time series data. In Proceedings of the IEEE Symposium on Information Visualization 2005 (InfoVis 2005), Minneapolis, MN, USA, 23–25 October 2005.
18. Hackstadt, S.T.; Malony, A.D. Visualizing parallel programs and performance. IEEE Comput. Graph. Appl. **1995**, 15, 12–14.
19. Fanea, E.; Carpendale, S.; Isenberg, T. An interactive 3d integration of parallel coordinates and star glyphs. In Proceedings of the IEEE Symposium on Information Visualization 2005 (InfoVis 2005), Minneapolis, MN, USA, 23–25 October 2005; pp. 149–156.
20. Albazzaz, H.; Wang, X.Z. Historical data analysis based on plots of independent and parallel coordinates and statistical control limits. J. Proc. Contr. **2006**, 16, 103–114.
21. Wang, X.; Medasani, S.; Marhoon, F.; Albazzaz, H. Multidimensional visualization of principal component scores for process historical data analysis. Ind. Eng. Chem. Res. **2004**, 43, 7036–7048.
22. He, Q.P. Multivariate visualization techniques in statistical process monitoring and their applications to semiconductor manufacturing. In Proceedings of the SPIE 31st International Symposium on Advanced Lithography, San Jose, CA, USA, 19 February 2006; p. 615506.
23. MacGregor, J.F.; Kourti, T. Statistical process control of multivariate processes. Contr. Eng. Prac. **1995**, 3, 403–414.
24. Dunia, R.; Rochelle, G.; Edgar, T.F.; Nixon, M. Multivariate Monitoring of a Carbon Dioxide Removal Process. Comput. Chem. Eng. **2014**, 60, 381–395.
25. Dunia, R.; Edgar, T.F.; Nixon, M. Process monitoring using principal components in parallel coordinates. AIChE J. **2013**, 59, 445–456.
26. Albazzaz, H.; Wang, X.Z.; Marhoon, F. Multidimensional visualisation for process historical data analysis: A comparative study with multivariate statistical process control. J. Process Control **2005**, 15, 285–294.
27. Gajjar, S.; Palazoglu, A. A data-driven multidimensional visualization technique for process fault detection and diagnosis. Chemom. Intell. Lab. Syst. **2016**, 154, 122–136.
28. Yu, J.; Qin, S.J. Statistical MIMO controller performance monitoring. Part I: Data-driven covariance benchmark. J. Proc. Contr. **2008**, 18, 277–296.
29. Moshtagh, N. Minimum volume enclosing ellipsoid. Convex Optim. **2005**, 111, 112.
30. Downs, J.; Vogel, E. A plant-wide industrial process control problem. Comput. Chem. Eng. **1993**, 17, 245–255.
31. Ricker, N.L. Tennessee Eastman Challenge Archive. Available online: http://depts.washington.edu/control/LARRY/TE/download.html (accessed on 15 April 2017).
32. Lee, J.; Yoo, C.; Lee, I. Statistical monitoring of dynamic processes based on dynamic independent component analysis. Chem. Eng. Sci. **2004**, 59, 2995–3006.
33. Zhang, Y. Fault Detection and Diagnosis of Nonlinear Processes Using Improved Kernel Independent Component Analysis (KICA) and Support Vector Machine (SVM). Ind. Eng. Chem. Res. **2008**, 47, 6961–6971.
34. Birol, G.; Ündey, C.; Cinar, A. A modular simulation package for fed-batch fermentation: Penicillin production. Comput. Chem. Eng. **2002**, 26, 1553–1565.
35. Process Systems Enterprise, gPROMS gML Separations—Adsorption Model Library. Available online: www.psenterprise.com/gproms (accessed on 30 April 2017).

**Figure 1.** Data visualization in parallel coordinates for a five-dimensional dataset. Each coordinate can be regarded as the ordinate of a regular time series plot. Data samples are added to the plot as they are acquired, in the form of a set of linear segments. As time progresses (**a**–**d**), current data are typically shown along with previously-plotted information to capture trends.

**Figure 2.** Representing multi-dimensional time series data using Kiviat diagrams. The same five-dimensional dataset as in Figure 1, with one-minute sampling time, is used for illustration purposes. The first sample is plotted (**a**) on the Kiviat plot having a time axis that is normal to the plot plane. The next samples are added as additional Kiviat plots whose planes are parallel to the plane of the first and spaced along the time axis according to the sampling time (**b**–**d**). The diagram can be updated by adding such “data slices” in a first-in, first-out manner.

**Figure 3.** Univariate control limits suffer from “blind spots” in a multivariate setting: a data sample (marked in red) can be within the control limits from the perspective of every variable on the respective univariate control charts, but fall outside the multivariate confidence region. LCL and UCL represent univariate lower and upper control limits, respectively.

**Figure 4.** The centroid of each slice constitutes a single-point, multivariate representation of that data slice. (**b**) is a “top-down” view of (**a**), with the centroids shown as diamonds.
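To make the centroid idea concrete, the following sketch builds the polygon of one Kiviat slice and computes its centroid. It assumes that the $n$ axes are equally spaced in angle, that each variable has already been scaled to a positive radius, and that "centroid" means the area centroid of the polygon (the mean of the vertices would be a simpler alternative); none of these choices is confirmed by the surviving text, and the function names are hypothetical.

```python
import math

def kiviat_vertices(radii):
    """Vertices of one Kiviat slice: variable i is plotted at angle 2*pi*i/n."""
    n = len(radii)
    return [(r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n))
            for i, r in enumerate(radii)]

def polygon_centroid(vertices):
    """Area centroid of a simple polygon with non-zero area (shoelace formula)."""
    area2 = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0        # twice the signed area of triangle (origin, v_i, v_i+1)
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)
```

With equal radii the polygon is regular and the centroid sits at the origin; increasing a single variable pulls the centroid toward that variable's axis, which is what makes the centroid usable as a single-point summary of the slice.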

**Figure 5.** (**a**) Limits in time-resolved Kiviat diagram. Black arrows indicate limits for each variable. Blue and green lines are the extrema of the confidence ellipsoid. (**b**) Sampled points within the annular region (in red) are used to generate the confidence ellipse.
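One common way to compute a minimum-area enclosing ellipse for a cloud of sampled points is Khachiyan's algorithm. The sketch below is a minimal NumPy version written under that assumption; it is not necessarily the solver used by the authors. The name `min_enclosing_ellipse` and the defaults for `tol` and `max_iter` are hypothetical, and the result is approximate: points may violate ${(P_i-c)}^{\top}A(P_i-c)\le 1$ by a small margin governed by `tol`.

```python
import numpy as np

def min_enclosing_ellipse(P, tol=1e-3, max_iter=5000):
    """Approximate minimum-area enclosing ellipse of points P (N x d).

    Returns (A, c) such that (x - c)^T A (x - c) <= 1 holds for all points,
    up to a small tolerance. The points must not all lie on one line.
    """
    N, d = P.shape
    Q = np.vstack([P.T, np.ones(N)])          # lift the points to d + 1 dimensions
    u = np.full(N, 1.0 / N)                   # start from uniform weights
    for _ in range(max_iter):
        Xm = (Q * u) @ Q.T                    # weighted second-moment matrix
        M = np.einsum("ij,jk,ki->i", Q.T, np.linalg.inv(Xm), Q)
        j = int(np.argmax(M))                 # point that currently sticks out the most
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step                      # shift weight toward that point
        converged = np.linalg.norm(new_u - u) < tol
        u = new_u
        if converged:
            break
    c = P.T @ u                               # centre as the weighted mean of the points
    A = np.linalg.inv(P.T @ (P * u[:, None]) - np.outer(c, c)) / d
    return A, c
```

For the four corners of the square $[-1,1]^2$ the routine returns the circle of radius $\sqrt{2}$ centred at the origin, i.e., $A=I/2$ and $c=0$.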

**Figure 6.** Unfolding of batch data. (**a**) Batch data in three dimensions; (**b**) batch-wise unfolding; (**c**) time-wise unfolding.

**Figure 7.** The confidence region at every data point drawn (green) for an illustrative batch process data set resembles a funnel or tube in 3D.

**Figure 8.** Schematic of the PenSim process, reproduced with permission from [34]. Copyright Elsevier, 2002.

**Figure 9.** Intra-cycle fault detection is carried out on a problematic cycle. Each sample in the problematic cycle is compared against the intra-cycle confidence region (in red); samples that lie inside the region are colored in blue, whereas samples that lie outside the confidence region are colored in black.

**Figure 10.** Schematic of the PSA system; the solid lines denote the flow pathway of the gas, while the dashed lines represent inactive piping in the cycle. As shown in the figure, Bed 1 is the active bed (flow denoted in blue), while Bed 2 is being regenerated.

**Table 1.** Faults that can be implemented in the Tennessee Eastman Process simulator, reproduced with permission from [32]. Copyright Elsevier, 2004.

Fault No. | Description | Type |
---|---|---|
1 | A/C feed ratio, B composition constant (Stream 4) | Step |
2 | B composition, A/C ratio constant (Stream 4) | Step |
3 | D feed temperature (Stream 2) | Step |
4 | Reactor cooling water inlet temperature | Step |
5 | Condenser cooling water inlet temperature | Step |
8 | A, B, C feed composition (Stream 4) | Random variation |
10 | C feed temperature (Stream 4) | Random variation |
14 | Reactor cooling water valve | Sticking |

**Fault Detection Delay (Minutes) (Lower Is Better)**

Fault Numbers | Proposed Method | PCA $T^{2}$ | PCA Q | DPCA $T^{2}$ | DPCA Q |
---|---|---|---|---|---|
1 | 3 | 3 | 9 | | |
3 | 17 | 2 | 6 | 6 | |
1 and 3 | 3 | 3 | 9 | | |
2 | 8 | 6 | 91 | 24 | 94 |
4 | 2 | 6 | 138 | 2 | 94 |
2 and 4 | 2 | 2 | 104 | 6 | 107 |
5 | 2 | 2 | 145 | 3 | 131 |
10 | 52 | 41 | 106 | 47 | 117 |
5 and 10 | 2 | 2 | 3 | | |
8 | 46 | 21 | 116 | 65 | 119 |
14 | 8 | 3 | 8 | | |
8 and 14 | 4 | 2 | 113 | 6 | 119 |

**Missed Detection Rates (Lower Is Better)**

Fault Numbers | Proposed Method | PCA $T^{2}$ | PCA Q | DPCA $T^{2}$ | DPCA Q |
---|---|---|---|---|---|
1 | 0.0179 | 0.0179 | 0.0714 | | |
3 | 0.542 | 0.0095 | 0.9786 | | |
1 and 3 | 0.0174 | 0.0174 | 0.0696 | | |
2 | 0.018 | 0.0103 | 0.9205 | 0.059 | 0.9282 |
4 | 0.040 | 0.0024 | 0.9786 | 0.399 | 0.9406 |
2 and 4 | 0.0124 | 0.0025 | 0.9208 | 0.0025 | 0.9282 |
5 | 0.002 | 0.0024 | 0.981 | 0.0356 | 0.9477 |
10 | 0.138 | 0.095 | 0.9287 | 0.1093 | 0.9145 |
5 and 10 | 0.0024 | 0.0024 | 0.0048 | | |
8 | 0.102 | 0.0784 | 0.9121 | 0.152 | 0.9192 |
14 | 0.040 | 0.0048 | 0.0166 | | |
8 and 14 | 0.0261 | 0.0024 | 0.905 | 0.0119 | 0.9192 |

**False Detection Rates (Lower Is Better)**

Fault Numbers | Proposed Method | PCA $T^{2}$ | PCA Q | DPCA $T^{2}$ | DPCA Q |
---|---|---|---|---|---|
1 | 0.0267 | 0.0533 | 0 | | |
3 | 0.03 | 0.03 | 0.0133 | | |
1 and 3 | 0.0033 | 0.0533 | 0 | | |
2 | 0.03 | 0.04 | 0 | 0 | 0 |
4 | 0.0367 | 0.05 | 0.0167 | 0 | 0 |
2 and 4 | 0 | 0.04 | 0 | 0 | 0 |
5 | 0.0367 | 0.0333 | 0.02 | 0 | 0 |
10 | 0.04 | 0.0333 | 0 | 0 | 0 |
5 and 10 | 0 | 0.03 | 0 | | |
8 | 0.0333 | 0.06 | 0 | 0 | 0 |
14 | 0.0267 | 0.0467 | 0 | | |
8 and 14 | 0.0033 | 0.05 | 0 | 0 | 0 |

**Table 5.** List of process variables, reproduced with permission from [34]. Copyright Elsevier, 2002.

Variable Number | Variable Description |
---|---|
$x_{1}$ | Aeration rate (L/h) |
$x_{2}$ | Agitator power (W) |
$x_{3}$ | Substrate feed rate (L/h) |
$x_{4}$ | Substrate temperature (K) |
$x_{5}$ | Substrate concentration (g/L) |
$x_{6}$ | Dissolved oxygen concentration (g/L) |
$x_{7}$ | Biomass concentration (g/L) |
$x_{8}$ | Penicillin concentration (g/L) |
$x_{9}$ | Culture volume (L) |
$x_{10}$ | Carbon dioxide concentration (g/L) |
$x_{11}$ | pH |
$x_{12}$ | Temperature (K) |
$x_{13}$ | Generated heat (cal) |
$x_{14}$ | Acid flow rate (mL/h) |
$x_{15}$ | Base flow rate (mL/h) |
$x_{16}$ | Cooling/heating water flow rate (L/h) |

Fault No. | Description | Type |
---|---|---|
1 | 10% increase in aeration rate | Step |
2 | 20% increase in aeration rate | Step |
3 | 1.5 L h$^{-1}$ increase in aeration rate | Ramp |
4 | 20% increase in agitation power | Step |
5 | 40% increase in agitation power | Step |
6 | 0.015 W increase in agitator power | Ramp |
7 | 20% increase in substrate feed | Step |
8 | 40% increase in substrate feed | Step |
9 | 0.12 L h$^{-1}$ increase in substrate feed | Ramp |

**Fault Detection Delay (Hours) (Lower Is Better)**

Dataset # | Proposed Method | MPCA $T^{2}$ | MPCA Q |
---|---|---|---|
1 | 0.5 | 4 | 3.5 |
2 | 0.5 | 9.5 | 9.5 |
3 | 13 | 13 | 13 |
4 | 1.5 | 2.5 | 3 |
5 | 9 | 7 | 7.5 |
6 | 15.5 | 11.5 | 12.5 |
7 | 20 | 1.5 | 2 |
8 | 14.5 | 6 | |
9 | 12.5 | 10.5 | |

**False Detection Rates (Lower Is Better)**

Dataset # | Proposed Method | MPCA $T^{2}$ | MPCA Q |
---|---|---|---|
1 | 0.11 | 0.075 | 0.1 |
2 | 0.025 | 0.07 | 0.085 |
3 | 0.01 | 0.095 | 0.07 |
4 | 0.03 | 0.105 | 0.07 |
5 | 0.16 | 0.11 | 0.07 |
6 | 0 | 0.105 | 0.035 |
7 | 0.07 | 0.105 | 0.02 |
8 | 0.085 | 0.105 | |
9 | 0.07 | 0.105 | |

Duration (s) | Bed 1 State | Bed 2 State |
---|---|---|
2 | Pressurization | Blowdown |
60 | Adsorption | Desorption |
2 | Pressure Equalization | Pressure Equalization |
2 | Blowdown | Pressurization |
60 | Desorption | Adsorption |
2 | Pressure Equalization | Pressure Equalization |

Parameter | Parameter Value |
---|---|
Feed flow rate | 0.00364 mol/s |
Temperature of feed | 298.15 K |
Length of bed | 0.35 m |
Radius of bed | 0.0175 m |
Particle radius | 0.003175 m |
$\epsilon$ (void fraction) | 0.4 |
$P_{\mathrm{feed}}$ | 300,000 Pa |

**Fault Detection Delay (Seconds) (Lower Is Better)**

Case | Fault Description | Proposed Method | DPCA [3] $T^{2}$ | DPCA [3] Q | MPCA [5] $T^{2}$ | MPCA [5] Q |
---|---|---|---|---|---|---|
1 | Feed temperature increased by 5 K in Bed 1 and Bed 2 | 89 | 120 | 115 | 118 | 74 |
2 | Feed temperature decreased by 5 K in Bed 1 and Bed 2 | 99 | 51 | 54 | 116 | 54 |
3 | Pressure drop in Bed 1 by 10% | 59 | 52 | 103 | 118 | 116 |
4 | Pressure rise in Bed 2 by 10% | 61 | 122 | 116 | 116 | 173 |

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wang, R.C.; Baldea, M.; Edgar, T.F.
Data Visualization and Visualization-Based Fault Detection for Chemical Processes. *Processes* **2017**, *5*, 45.
https://doi.org/10.3390/pr5030045
