A novel active semi-supervised learning framework that exploits unlabeled data is proposed for fault identification in chemical processes where labeled data are expensive to obtain. First, a principal component analysis (PCA) feature selection strategy is used to calculate the weights of the process variables. Second, the identification model is trained on the resulting key process variables. Third, the pseudo-label confidence of the identification model is dynamically optimized using the mean of the historical, current, and future pseudo-label confidences. To raise the performance ceiling of the self-learning identification model on high-entropy process data, active learning is used to identify these data and to diagnose fault causes by means of an ontology. Finally, a PCA-dynamic active safe semi-supervised support vector machine (PCA-DAS4VM) is built for fault identification in such processes. Its application to the Tennessee Eastman (TE) process shows that this hybrid technology is able to: (i) eliminate chemical process noise and redundant process variables simultaneously, (ii) combine historical and future pseudo-label confidence to improve the identification accuracy of abnormal working conditions, (iii) efficiently select and diagnose high-entropy unlabeled process data, and (iv) fully utilize unlabeled data to enhance identification performance.
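The two selection ideas named above, a confidence averaged over historical, current, and future iterations and an entropy-based choice of samples for active learning, can be illustrated with a minimal sketch. It is not the PCA-DAS4VM implementation: the function names, the three-iteration window, and the entropy threshold are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
import math


def smoothed_confidence(history, t):
    """Mean of the confidences at iterations t-1, t, and t+1.

    Assumption: "historical, current, and future" is read as a
    three-iteration window, truncated at the ends of the history.
    """
    window = history[max(t - 1, 0): t + 2]
    return sum(window) / len(window)


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def split_unlabeled(prob_rows, entropy_threshold):
    """Split unlabeled samples by predictive entropy.

    Returns (pseudo_label_indices, query_indices): low-entropy samples
    are candidates for pseudo-labeling, high-entropy samples are sent
    to active learning. The threshold is a hypothetical parameter.
    """
    pseudo, query = [], []
    for i, probs in enumerate(prob_rows):
        if entropy(probs) > entropy_threshold:
            query.append(i)
        else:
            pseudo.append(i)
    return pseudo, query
```

For instance, with class probabilities `[[0.99, 0.01], [0.5, 0.5]]` and a threshold of 0.5 nats, the first sample would be kept for pseudo-labeling and the second would be queried, because a uniform two-class prediction has entropy ln 2, which exceeds the threshold.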
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.