# Missing Data Probability Estimation-Based Bayesian Outlier Detection for Plant-Wide Processes with Multisampling Rates


## Abstract


## 1. Introduction

## 2. Problem Statement and Motivation Analysis

## 3. Bayesian Outlier Detection for Plant-Wide Processes with Multisampling Rates

#### 3.1. Marginalization-Based Realization Estimation

#### 3.2. Expectation–Maximization-Based Likelihood Probability Estimation

#### 3.3. Bayesian and Full Probability-Based Outlier Detection

## 4. Simulation and Application

- (1) The first aspect is the detection result for incomplete samples. The Bayesian detection method based only on complete data cannot detect these points at all, whereas the method adopted in this study detects the 140th, 250th, and 350th points but still classifies the 460th point as normal rather than as an outlier, as shown in Table 6.
- (2) The second aspect concerns complete outlier points: both the traditional Bayesian method and the method adopted in this study can detect them, but the results differ slightly. For the 120th and 480th points, both methods fail to detect the abnormality, while for the 240th and 600th points, the method used in this study identifies the fault with higher probability. For the other outlier points, the two methods give the same result, as illustrated in Figure 4.
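The incomplete-sample case discussed above can be illustrated with a small sketch. It assumes a generic Bayesian rule in which the likelihood of a partially observed evidence is obtained by summing over every complete realization compatible with it. The function name `outlier_posterior`, the likelihood tables, and the prior are all hypothetical values chosen for illustration; they are not taken from the study.

```python
def outlier_posterior(compatible, lik_normal, lik_outlier, prior_outlier=0.5):
    """Posterior probability of the outlier status given incomplete evidence.

    compatible    : complete evidence tuples consistent with the observation
    lik_normal    : dict mapping evidence tuple -> P(evidence | normal)
    lik_outlier   : dict mapping evidence tuple -> P(evidence | outlier)
    prior_outlier : prior probability of the outlier status (assumed value)
    """
    # Marginalize the missing entries: sum the likelihood over all realizations.
    p_normal = sum(lik_normal.get(e, 0.0) for e in compatible)
    p_outlier = sum(lik_outlier.get(e, 0.0) for e in compatible)

    numerator = p_outlier * prior_outlier
    denominator = numerator + p_normal * (1.0 - prior_outlier)
    if denominator == 0.0:
        raise ValueError("evidence has zero probability under both statuses")
    return numerator / denominator


# Hypothetical likelihood tables for three binary monitor readings.
lik_normal = {(0, 0, 0): 0.90, (0, 1, 0): 0.05, (0, 1, 1): 0.05}
lik_outlier = {(0, 0, 0): 0.10, (0, 1, 0): 0.60, (0, 1, 1): 0.30}

# Observation [0, 1, *]: the third reading is missing, so two realizations remain.
compatible = [(0, 1, 0), (0, 1, 1)]
posterior = outlier_posterior(compatible, lik_normal, lik_outlier)
print(f"P(outlier | evidence) = {posterior:.3f}")
```

A detector that uses only complete data would have to discard this sample, which is why it cannot flag incomplete points at all; summing over the compatible realizations lets the sample still contribute a posterior.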

## 5. Conclusions

## Supplementary Materials

Supplementary File 1

## Author Contributions

## Funding

## Conflicts of Interest


**Figure 3.** EM iteration results for normal and outlier status: (**a**) the normal status; (**b**) the outlier status.

Sample | ${\mathit{\pi}}_{1}(\mathbf{T})$ | ${\mathit{\pi}}_{2}(2\mathbf{T})$ | ${\mathit{\pi}}_{3}(3\mathbf{T})$ | Possible Realization |
---|---|---|---|---|
${d}^{1}$ | 0 | * | * | $\left[\begin{array}{ccc}0& 0& 0\end{array}\right],\left[\begin{array}{ccc}0& 0& 1\end{array}\right],\left[\begin{array}{ccc}0& 1& 0\end{array}\right],\left[\begin{array}{ccc}0& 1& 1\end{array}\right]$ |
${d}^{2}$ | 0 | 1 | * | $\left[\begin{array}{ccc}0& 1& 0\end{array}\right],\left[\begin{array}{ccc}0& 1& 1\end{array}\right]$ |
${d}^{3}$ | 0 | * | 0 | $\left[\begin{array}{ccc}0& 1& 0\end{array}\right],\left[\begin{array}{ccc}0& 0& 0\end{array}\right]$ |
${d}^{4}$ | 0 | 1 | * | $\left[\begin{array}{ccc}0& 1& 0\end{array}\right],\left[\begin{array}{ccc}0& 1& 1\end{array}\right]$ |
${d}^{5}$ | 0 | * | * | $\left[\begin{array}{ccc}0& 0& 0\end{array}\right],\left[\begin{array}{ccc}0& 0& 1\end{array}\right],\left[\begin{array}{ccc}0& 1& 0\end{array}\right],\left[\begin{array}{ccc}0& 1& 1\end{array}\right]$ |
${d}^{6}$ | 0 | 1 | 0 | $\left[\begin{array}{ccc}0& 1& 0\end{array}\right]$ |
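The "Possible Realization" column above can be reproduced mechanically: each missing entry (marked `*`) is replaced by every admissible value while the observed entries are held fixed. The following is a minimal sketch assuming binary monitor readings; the function name `possible_realizations` is hypothetical.

```python
from itertools import product


def possible_realizations(sample, levels=(0, 1), missing="*"):
    """List every complete evidence vector consistent with a partial sample.

    sample  : list of observed values, with `missing` marking unsampled entries
    levels  : admissible values of a monitor reading (binary is an assumption)
    """
    # Observed entries contribute a single choice, missing entries all levels.
    choices = [levels if value == missing else (value,) for value in sample]
    return [list(r) for r in product(*choices)]


# Samples d^1, d^3 and d^6 from the table above.
print(possible_realizations([0, "*", "*"]))
print(possible_realizations([0, "*", 0]))
print(possible_realizations([0, 1, 0]))
```

The number of realizations grows as the product of the level counts of the missing entries, so a sample with two missing binary readings yields four candidates and a complete sample yields exactly one.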

Outlier | Reason | Type | Is Incomplete Data |
---|---|---|---|
120 | A feed (stream 1) | Pulse change | No |
140 | Reactor level | Pulse change | Yes |
240 | D feed (stream 2) | Pulse change | No |
250 | Reactor temperature | Pulse change | Yes |
350 | Purge rate (stream 9) | Pulse change | Yes |
360 | E feed (stream 3) | Pulse change | No |
460 | Product separator temperature | Pulse change | Yes |
480 | A and C feed (stream 4) | Pulse change | No |
600 | Recycle flow (stream 8) | Pulse change | No |
720 | Reactor feed rate (stream 6) | Pulse change | No |
840 | Reactor pressure | Pulse change | No |

Outlier | Reason | Type | Is Incomplete Data |
---|---|---|---|
120 | Product separator level | Pulse change | No |
140 | Stripper steam flow | Pulse change | Yes |
240 | Product separator pressure | Pulse change | No |
250 | Compressor work | Pulse change | Yes |
350 | Reactor cooling water outlet temperature | Pulse change | Yes |
360 | Product separator underflow | Pulse change | No |
460 | Separator cooling water outlet temperature | Pulse change | Yes |
480 | Stripper level | Pulse change | No |
600 | Stripper pressure | Pulse change | No |
720 | Stripper underflow (stream 11) | Pulse change | No |
840 | Stripper temperature | Pulse change | No |

Iteration | E1 | E2 | E3 | E4 | E5 | E6 | E7 | E8 |
---|---|---|---|---|---|---|---|---|
1 | 0.993464 | 0 | 0 | 0 | 0 | 0.006536 | 0 | 0 |
2 | 0.998419 | 0 | 0.000263 | 0.000263 | 0 | 0.001054 | 0 | 0 |
3 | 0.998309 | 0 | 0.00033 | 0.000308 | 0 | 0.001054 | 0 | 0 |
4 | 0.998284 | 0 | 0.000356 | 0.000306 | 0 | 0.001054 | 0 | 0 |
5 | 0.998278 | 0 | 0.000373 | 0.000295 | 0 | 0.001054 | 0 | 0 |
6 | 0.998275 | 0 | 0.000389 | 0.000282 | 0 | 0.001054 | 0 | 0 |
7 | 0.998274 | 0 | 0.000403 | 0.000269 | 0 | 0.001054 | 0 | 0 |

Iteration | E1 | E2 | E3 | E4 | E5 | E6 | E7 | E8 |
---|---|---|---|---|---|---|---|---|
1 | 0 | 0.7142 | 0 | 0.1429 | 0 | 0.1429 | 0 | 0 |
2 | 0 | 0.3896 | 0.0714 | 0.1169 | 0.0714 | 0.2078 | 0.0714 | 0.0714 |
3 | 0 | 0.3896 | 0.0714 | 0.1169 | 0.0949 | 0.1845 | 0.0714 | 0.0714 |
4 | 0 | 0.3896 | 0.0714 | 0.1169 | 0.1026 | 0.1770 | 0.0714 | 0.0714 |
5 | 0 | 0.3896 | 0.0714 | 0.1169 | 0.1047 | 0.1745 | 0.0714 | 0.0714 |
6 | 0 | 0.3896 | 0.0714 | 0.1169 | 0.1055 | 0.1737 | 0.0714 | 0.0714 |
7 | 0 | 0.3896 | 0.0714 | 0.1169 | 0.1058 | 0.1734 | 0.0714 | 0.0714 |
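The two tables above list per-iteration probability estimates for eight evidences E1 to E8, which would be consistent with three binary monitor readings, although that mapping is an assumption here. The sketch below shows a generic expectation-maximization loop for a categorical distribution with partially observed samples. It is not the authors' exact formulation: the function names, the uniform initialization, and the absence of any prior samples are illustrative choices, and the code is not expected to reproduce the tabulated values.

```python
from itertools import product


def realizations(sample, levels=(0, 1), missing="*"):
    """Complete evidence tuples consistent with a partially observed sample."""
    choices = [levels if value == missing else (value,) for value in sample]
    return list(product(*choices))


def em_evidence_probabilities(samples, n_monitors=3, n_iter=7):
    """Estimate P(evidence) from incomplete samples; returns one dict per iteration."""
    evidences = list(product((0, 1), repeat=n_monitors))
    # Uniform starting guess (an assumption; other initializations are possible).
    theta = {e: 1.0 / len(evidences) for e in evidences}
    history = []
    for _ in range(n_iter):
        counts = {e: 0.0 for e in evidences}
        for sample in samples:
            compatible = realizations(sample)
            total = sum(theta[e] for e in compatible)
            for e in compatible:
                # E-step: share one count among the compatible realizations
                # in proportion to their current probability.
                if total > 0.0:
                    counts[e] += theta[e] / total
                else:
                    counts[e] += 1.0 / len(compatible)
        # M-step: renormalize the expected counts.
        n = sum(counts.values())
        theta = {e: c / n for e, c in counts.items()}
        history.append(theta)
    return history


# Six samples with missing readings, laid out like the realization table earlier.
samples = [[0, "*", "*"], [0, 1, "*"], [0, "*", 0],
           [0, 1, "*"], [0, "*", "*"], [0, 1, 0]]
history = em_evidence_probabilities(samples)
for i, theta in enumerate(history, start=1):
    print(i, [round(theta[e], 4) for e in sorted(theta)])
```

Each row printed corresponds to one iteration, mirroring the layout of the tables above; evidences that no sample is compatible with receive zero probability after the first iteration.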

Sampling Point | 140 | 250 | 350 | 460 |
---|---|---|---|---|
Probability of Outlier | 1 | 1 | 0.933 | 0.006 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Tian, Y.; Yin, Z.; Huang, M.
Missing Data Probability Estimation-Based Bayesian Outlier Detection for Plant-Wide Processes with Multisampling Rates. *Symmetry* **2018**, *10*, 475.
https://doi.org/10.3390/sym10100475
