Open Access, Feature Paper, Article

A Comparison of Clustering and Prediction Methods for Identifying Key Chemical–Biological Features Affecting Bioreactor Performance

Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
Author to whom correspondence should be addressed.
Processes 2019, 7(9), 614
Received: 28 May 2019 / Revised: 14 August 2019 / Accepted: 2 September 2019 / Published: 10 September 2019
(This article belongs to the Special Issue Process Systems Engineering à la Canada)
Chemical–biological systems, such as bioreactors, contain stochastic and non-linear interactions which are difficult to characterize. The highly complex interactions between microbial species and communities may not be sufficiently captured using first-principles, stationary, or low-dimensional models. This paper compares multiple data analysis strategies: three predictive models (random forests, support vector machines, and neural networks), three clustering models (hierarchical, Gaussian mixtures, and Dirichlet mixtures), and two feature selection approaches (mean decrease in accuracy and its conditional variant). These methods not only predict the bioreactor outcome with sufficient accuracy but also identify the important features correlated with that outcome. The novelty of this work lies in the extensive exploration and critique of a wide range of methods rather than a single method, as is common in papers of a similar nature. The results show that random forest models predict the test set outcomes with the highest accuracy. The identified contributory features include process features which agree with domain knowledge, as well as several different biomarker operational taxonomic units (OTUs). The results reinforce the notion that both chemical and biological features significantly affect bioreactor performance. However, they also indicate that the quality of the biological features can be improved by considering non-clustering methods, which may better represent the true behaviour within the OTU communities.
Keywords: machine learning; bioinformatics; statistics
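
The "mean decrease in accuracy" idea named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code or data. The samples are synthetic, the function names (`make_data`, `fit_centroids`, `mean_decrease_in_accuracy`) are hypothetical, and a nearest-centroid classifier stands in for the random forest purely to keep the example self-contained. The conditional variant mentioned in the abstract is not shown.

```python
import random


def make_data(n, rng):
    """Synthetic samples: feature 0 drives the label, features 1 and 2 are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = rng.gauss(2.0 * label, 0.5)  # class means at 0 and 2
        X.append([informative, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


def fit_centroids(X, y):
    """Stand-in model (assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def accuracy(centroids, X, y):
    hits = sum(1 for x, t in zip(X, y) if predict(centroids, x) == t)
    return hits / len(y)


def mean_decrease_in_accuracy(centroids, X, y, n_repeats, rng):
    """Shuffle one feature column at a time and average the resulting accuracy drop."""
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(200, rng)
model = fit_centroids(X_train, y_train)
importances = mean_decrease_in_accuracy(model, X_test, y_test, n_repeats=10, rng=rng)
for j, score in enumerate(importances):
    print(f"feature {j}: mean decrease in accuracy = {score:.3f}")
```

In this toy setting, shuffling the informative feature should cost far more accuracy than shuffling either noise feature, which is the sense in which such a score ranks features. Scoring on held-out samples rather than the training set keeps the ranking from rewarding features the model has merely memorised.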

MDPI and ACS Style

Tsai, Y.; Baldwin, S.A.; Siang, L.C.; Gopaluni, B. A Comparison of Clustering and Prediction Methods for Identifying Key Chemical–Biological Features Affecting Bioreactor Performance. Processes 2019, 7, 614.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
