Open Access, Feature Paper, Article
Processes 2017, 5(3), 38; doi:10.3390/pr5030038

Principal Component Analysis of Process Datasets with Missing Values

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Current address: Element Analytics, San Francisco, CA 94107, USA.
* Author to whom correspondence should be addressed.
Academic Editor: John D. Hedengren
Received: 31 May 2017 / Revised: 28 June 2017 / Accepted: 30 June 2017 / Published: 6 July 2017
(This article belongs to the Collection Process Data Analytics)

Abstract

Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Missing data are typically handled during data pre-processing, but can also be handled during model building. This article considers missing data within the context of principal component analysis (PCA), a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms, and recommendations are made on which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.
Keywords: principal component analysis; missing data; process data analytics; chemometrics; machine learning; multivariable statistical process control; process monitoring; Tennessee Eastman problem
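
To make the idea of an alternating, SVD-based approach concrete, the following is a minimal sketch in Python with NumPy. It is a generic iterative-imputation scheme and is not necessarily the exact algorithm evaluated in the article: missing entries (assumed here to be marked as NaN) are initialized with column means, a truncated SVD of the mean-centered matrix is computed, the missing entries are overwritten with their low-rank reconstruction, and the two steps alternate until the imputed values stop changing. The function name `pca_alternating_svd` and its parameters `n_components`, `max_iter`, and `tol` are illustrative assumptions, and every column is assumed to contain at least one observed value.

```python
import numpy as np


def pca_alternating_svd(X, n_components, max_iter=500, tol=1e-8):
    """Illustrative PCA for data with NaN-marked missing values.

    Alternates between (1) fitting a rank-`n_components` PCA model by SVD
    to the currently filled matrix and (2) replacing the missing entries
    with that model's reconstruction. Hypothetical helper, not taken from
    the article.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: column means of the observed entries.
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(missing, col_means, X)

    for _ in range(max_iter):
        # Step 1: PCA of the filled matrix via SVD of the centered data.
        mean = X_filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_filled - mean, full_matrices=False)
        X_hat = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean

        # Step 2: overwrite only the missing entries with the reconstruction.
        change = np.linalg.norm(X_hat[missing] - X_filled[missing])
        X_filled[missing] = X_hat[missing]

        # Stop once the imputed values have essentially stopped moving.
        if change <= tol * max(1.0, np.linalg.norm(X_filled[missing])):
            break

    # Final model on the converged, filled matrix.
    mean = X_filled.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_filled - mean, full_matrices=False)
    loadings = Vt[:n_components].T          # shape (n_variables, n_components)
    scores = (X_filled - mean) @ loadings   # shape (n_samples, n_components)
    return loadings, scores, X_filled
```

Observed entries are never modified; only the NaN positions are updated. In practice the number of components would still need to be chosen, for example by cross-validation, which this sketch does not address.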
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Severson, K.A.; Molaro, M.C.; Braatz, R.D. Principal Component Analysis of Process Datasets with Missing Values. Processes 2017, 5, 38.

Processes, EISSN 2227-9717, published by MDPI AG, Basel, Switzerland.