Monitoring of Flotation Systems by Use of Multivariate Froth Image Analysis

Abstract: Froth image analysis has been considered widely in the identification of operational regimes in flotation circuits, the characterisation of froths in terms of bubble size distributions, froth stability and local froth velocity patterns, or as a basis for the development of inferential online sensors for chemical species in the froth. Relatively few studies have considered flotation froth image analysis in unsupervised process monitoring applications. In this study, it is shown that froth image analysis can be combined with traditional multivariate statistical process monitoring methods for reliable monitoring of industrial platinum group metal flotation plants. This can be accomplished with well-established methods of multivariate image analysis, such as the Haralick feature set derived from grey level co-occurrence matrices and local binary patterns that were considered in this investigation.


Introduction
It is well documented that in froth flotation systems, the properties of the froth phase, such as froth stability, froth mobility, and froth rheology, have a significant influence on the performance of flotation [1][2][3]. Operational variables, such as flow rates, ore mineralogy, and pulp chemistry, affect flotation performance through their effect on the froth phase [4][5][6][7][8]. The successful transport of mineral-rich bubbles from the pulp-froth zone to the concentrate launder is expedited by a transient froth. Ideally, the mineralised froth should collapse as soon as it reaches the concentrate launder. Collapsing earlier may lead to loss of valuable mineral from the froth to the pulp phase and, following that, to the tailings of the flotation cell. Persistent or excessively stable froths may cause problems in pumping, and may have an adverse effect on mineral separation in downstream process operations [9].
Despite their importance, froth phase models are not well-established yet, at least not as far as the control of flotation processes is concerned. This is a critical barrier to better operation, as advanced control is recognized as one of the most efficient ways to improve flotation performance [10].
Fortunately, image analysis can compensate for the lack of adequate froth phase models in the context of process control. Froth image analysis has advanced considerably since computer vision systems became industrially established in the 1990s. This includes the measurement of bubble size distributions as a means to characterize froths [11][12][13], and the measurement of froth colour, froth stability, and froth velocity patterns that are used in the control of flotation plants. These methods do not require information additional to the images and are well-supported by commercial software.
Other approaches, such as wavelets [14][15][16], co-occurrence matrices [17][18][19][20], Gabor filters [21], and local binary patterns [22][23][24], have all been shown to be useful methods to represent the characteristics of froth systems in terms of sets of multivariate features. This opens the way for direct monitoring of the performance of flotation circuits with computer vision systems.
Unlike problems focused on the recognition of predefined operational regimes, which are typically treated as classification problems [25,26], or the use of computer vision systems as soft sensors, which are typically cast as regression problems [27,28], monitoring in general is conducted in an unsupervised learning framework. These approaches are based on data representative of normal operating conditions (NOCs) against which new data can be compared to determine whether a deviation from NOCs has occurred.
Froth images representative of NOCs can be used directly in these multivariate process control frameworks, with the advantage of capturing significant process variation that may not be measurable otherwise [29]. Despite these benefits, relatively few papers have considered this approach.
For example, Liu and MacGregor [30,31] proposed control of the appearance of froth structures in flotation plants based on the use of 2-D continuous wavelet transforms to capture spatial and textural information at different resolutions in these images. Multivariate image analysis was further used to extract other features from the froth, such as black holes and clear windows on top of the bubbles.
More recently, Zhang et al. [32] proposed the use of a long short-term memory (LSTM) neural network to monitor flotation operations based on videographic sequences of froth images. This system was designed to monitor froth grades and requires additional information on the froth grades.
In this paper, it is shown that established approaches in multivariate image analysis can be used in a traditional multivariate statistical process control framework as an effective means for monitoring the performance of flotation cells.
Section 2 of the paper discusses the analytical methodology, while Sections 3 and 4 illustrate the approach based on two case studies with data from the platinum industry. Further discussion and conclusions are covered in Section 5.

Analytical Methodology
The monitoring strategy considered in this investigation is based on a traditional multivariate statistical process monitoring framework, with a principal component model. The principal component model is constructed from features extracted from images that represent normal process operation, as schematically illustrated in Figure 1. In essence, features extracted from the froth images are used to construct a principal component model that can generate monitoring diagnostics with which the in-control or out-of-control state of the process can be determined statistically.
Any of a number of different approaches can be followed to extract features from these images, and in this study, grey level co-occurrence matrices and local binary patterns are considered, as described in more detail in the following sections, followed by a discussion of the principal component model.

Feature Extraction with Grey Level Co-Occurrence Matrices
The grey level co-occurrence matrix (GLCM) is represented by $A(d, G)$ of an image $I$ with parameters $d$ and $G$, where $d$ is the distance between each pair of pixels in the image and $G$ is the number of grey levels considered in the image, as indicated in Figure 2. Each entry, $a_{ij}$, in the $G \times G$ matrix denotes the number of times that grey levels $i$ and $j$ are associated with a pair of pixels at displacement $d$ in the image. The Haralick [33] set of image descriptors extracted from GLCMs is often used in image analysis. In this investigation, the following four features were used:

$$\mathrm{ENE} = \sum_{i=1}^{G} \sum_{j=1}^{G} \hat{a}_{ij}^{2} \tag{1}$$

$$\mathrm{CON} = \sum_{i=1}^{G} \sum_{j=1}^{G} (i - j)^{2} \, \hat{a}_{ij} \tag{2}$$

$$\mathrm{COR} = \sum_{i=1}^{G} \sum_{j=1}^{G} \frac{(i - m_r)(j - m_c) \, \hat{a}_{ij}}{s_r s_c} \tag{3}$$

$$\mathrm{HOM} = \sum_{i=1}^{G} \sum_{j=1}^{G} \frac{\hat{a}_{ij}}{1 + |i - j|} \tag{4}$$

In Equations (1)-(4), $\hat{a}_{ij}$ is the (i,j)th element of the normalised GLCM and $m_k$ and $s_k$ ($k = r, c$) are the means and standard deviations of the matrix rows and columns. The energy (ENE) is a measure of the local uniformity of grey levels and large ENE values are associated with pixels that are very similar. The contrast (CON) is a measure of the intensity of grey level variations between neighbouring pixels, i.e., large CON values reflect large differences. The correlation (COR) represents the linear dependency between grey values in the co-occurrence matrix, while the homogeneity (HOM) shows the closeness of the distribution of elements in the co-occurrence matrix to its diagonal, i.e., HOM would approach unity when there are only a few dominant grey tones present. Homogeneity and contrast are typically inversely correlated.
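As a rough illustration of how the four descriptors follow from a co-occurrence matrix, the sketch below counts horizontal grey level pairs in a small made-up 4 × 4 image and evaluates ENE, CON, COR and HOM from the normalised counts. The function names, the toy image, and the choice of the absolute difference in the HOM denominator are assumptions made for this example; they are not taken from the study's implementation.

```python
from math import sqrt


def glcm(image, levels, offset=(0, 1)):
    """Count how often grey level i is paired with grey level j at `offset`.

    `offset` is (row step, column step); (0, 1) is the pixel immediately
    to the right. Pairs that would fall outside the image are skipped.
    """
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def haralick_features(counts):
    """Energy, contrast, correlation and homogeneity of a GLCM.

    COR is undefined for a constant image (zero standard deviation).
    """
    levels = range(len(counts))
    total = sum(sum(row) for row in counts)
    p = [[value / total for value in row] for row in counts]  # normalised GLCM
    cells = [(i, j, p[i][j]) for i in levels for j in levels]
    m_r = sum(i * a for i, _, a in cells)  # mean over rows
    m_c = sum(j * a for _, j, a in cells)  # mean over columns
    s_r = sqrt(sum((i - m_r) ** 2 * a for i, _, a in cells))
    s_c = sqrt(sum((j - m_c) ** 2 * a for _, j, a in cells))
    return {
        "ENE": sum(a * a for _, _, a in cells),
        "CON": sum((i - j) ** 2 * a for i, j, a in cells),
        "COR": sum((i - m_r) * (j - m_c) * a for i, j, a in cells) / (s_r * s_c),
        "HOM": sum(a / (1 + abs(i - j)) for i, j, a in cells),
    }


# Hypothetical image with G = 4 grey levels (values 0..3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = haralick_features(glcm(image, levels=4))
print(features)
```

For a real froth image the same functions would be applied after the image has been reduced to G grey levels, giving one four-element feature vector per image.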
GLCM methods were some of the very first to be used in froth image analysis [17][18][19] and have since been considered extensively in a range of applications in froth image analysis.

Feature Extraction with Local Binary Patterns
Applied on a pixel-by-pixel basis in images, local binary pattern operators compare the intensity of each pixel to the intensities of the other pixels in its neighbourhood [34]. The difference in the intensity between each neighbouring pixel $g_p$ and the centre pixel $g_c$ under consideration is thresholded by applying a binary thresholding function $s$:

$$s(g_p - g_c) = \begin{cases} 1, & g_p - g_c \geq 0 \\ 0, & g_p - g_c < 0 \end{cases} \tag{5}$$

The local binary pattern (LBP) is subsequently computed as:

$$\mathrm{LBP} = \sum_{p=1}^{P} s(g_p - g_c) \, 2^{p-1}, \quad \text{for all } p = 1, 2, \ldots, P \tag{6}$$

By applying the LBP operator to each pixel in an image with G grey levels, as indicated in Figure 3, the image is represented by LBPs ranging from 0 to G and these images are referred to as LBP images.
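The thresholding and weighting steps can be sketched for the basic eight-neighbour case as follows. This is a minimal illustration only: the neighbour ordering (clockwise from the top-left pixel), the zero-based weights 2**p, the function name and the 3 × 3 patch are all assumptions chosen for the example, and border pixels are simply skipped.

```python
def lbp_image(image):
    """Basic 8-neighbour LBP code for every interior pixel of `image`."""
    # Neighbour offsets (row, col), clockwise from the top-left pixel.
    # The p-th neighbour in this list carries the weight 2**p.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for p, (d_row, d_col) in enumerate(offsets):
                # Binary threshold: 1 if the neighbour is at least as bright.
                if image[r + d_row][c + d_col] >= centre:
                    code += 2 ** p
            row_codes.append(code)
        codes.append(row_codes)
    return codes


# Hypothetical 3 x 3 patch; only the centre pixel (value 6) gets a code.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
codes = lbp_image(patch)
print(codes)  # [[241]]
```

Here the thresholded neighbours are 1, 0, 0, 0, 1, 1, 1, 1 in the order listed, so the code is 1 + 16 + 32 + 64 + 128 = 241. A different starting neighbour or direction would give a different number for the same pattern, which is why the ordering must be fixed once and used consistently.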
LBP feature extraction has only relatively recently been considered in mineral processing [22,23,35].

Construction of Principal Component Models
Principal component analysis underpins traditional multivariate statistical process control. Essentially, the centred and scaled data matrix ($X$) that represents normal operating conditions (NOCs), consisting of $N$ samples of $M$ variables, $X \in \mathbb{R}^{N \times M}$, is expressed as the product of a score matrix ($T_K \in \mathbb{R}^{N \times K}$) and the transpose of a loading matrix ($P_K \in \mathbb{R}^{M \times K}$):

$$X = T_K P_K^{\top} + E_K \tag{7}$$

The subscript $K$ denotes the number of principal components retained in the model, typically with $K \ll M$. $E_K \in \mathbb{R}^{N \times M}$ is a residual matrix resulting from the approximation of the data matrix with $K$ principal components. Specification of the hyperparameter K can be based on a number of statistical and heuristic methods. These include a value where the variance explained by the retained principal components would exceed a certain threshold (typically 80-90%), the scree test, the Kaiser-Guttman test, partial correlation procedures, etc. [36,37]. In this investigation, cross-validation was used to determine K.
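Two of the heuristics listed above are simple enough to sketch directly from the eigenvalues of the correlation matrix of the features. The eigenvalues and function names below are invented for illustration; the cross-validation procedure that was actually used in this investigation is not reproduced here.

```python
def components_for_variance(eigenvalues, threshold=0.9):
    """Smallest K whose cumulative explained variance reaches `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


def components_kaiser_guttman(eigenvalues):
    """Number of eigenvalues above 1 (applies to correlation matrices)."""
    return sum(1 for value in eigenvalues if value > 1.0)


# Hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to 4).
eigenvalues = [2.6, 1.2, 0.15, 0.05]
k_variance = components_for_variance(eigenvalues, threshold=0.9)
k_kaiser = components_kaiser_guttman(eigenvalues)
print(k_variance, k_kaiser)  # 2 2
```

With these made-up values the first two components explain 95% of the variance, so both rules agree on K = 2; on real data the rules can disagree, which is one reason a data-driven choice such as cross-validation is often preferred.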
The loading matrix $P$ is obtained by solving an eigenvalue problem, as represented by Equation (8):

$$C p_j = \lambda_j p_j \tag{8}$$

In Equation (8), $C \in \mathbb{R}^{M \times M}$ is the covariance or correlation matrix of the variables, which are typically scaled to zero mean and unit variance, i.e.,

$$C = \frac{1}{N - 1} X^{\top} X \tag{9}$$

Moreover, $\lambda_j$ is the j'th eigenvalue associated with the j'th principal component (j = 1, 2, ..., M) or loading vector ($p_j \in \mathbb{R}^{M \times 1}$).
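A compact way to see how the scores, loadings and residuals fit together is to build the model from an eigendecomposition of the correlation matrix. The sketch below does this with NumPy on synthetic data; the function name, the dictionary keys and the random data are assumptions for illustration and do not represent the froth features used in the case studies.

```python
import numpy as np


def fit_pca(X_noc, n_components):
    """Fit a principal component model to NOC data (rows are samples)."""
    mean = X_noc.mean(axis=0)
    std = X_noc.std(axis=0, ddof=1)
    Z = (X_noc - mean) / std                      # zero mean, unit variance
    C = Z.T @ Z / (Z.shape[0] - 1)                # correlation matrix
    eigenvalues, eigenvectors = np.linalg.eigh(C)  # returned in ascending order
    order = np.argsort(eigenvalues)[::-1]          # sort descending
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    loadings = eigenvectors[:, :n_components]      # M x K
    scores = Z @ loadings                          # N x K
    residuals = Z - scores @ loadings.T            # N x M
    return {
        "mean": mean,
        "std": std,
        "scaled": Z,
        "eigenvalues": eigenvalues,
        "loadings": loadings,
        "scores": scores,
        "residuals": residuals,
    }


# Synthetic stand-in for an N x M matrix of image features.
rng = np.random.default_rng(0)
X_noc = rng.normal(size=(60, 4)) @ rng.normal(size=(4, 4))
model = fit_pca(X_noc, n_components=2)
print(model["eigenvalues"])
```

The stored mean and standard deviation matter in practice: new images have to be scaled with the NOC statistics, not with their own, before they are projected onto the loadings.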
In the traditional multivariate process control framework, the Hotelling $T^2$ and Q-statistics can be used to monitor operations. In the special case where $K \leq 3$, monitoring can be based on 2-D or 3-D score plots. Hotelling's $T^2$ statistic is calculated in accordance with Equation (10):

$$T_i^2 = \sum_{j=1}^{K} \frac{t_{ij}^2}{\lambda_j} \tag{10}$$

where $t_{ij}$ is the score of the i'th sample on the j'th principal component (j = 1, 2, …, K), and $\lambda_j$ is the eigenvalue associated with the j'th principal component. The Q-statistic is calculated according to Equation (11):

$$Q_i = \sum_{j=1}^{M} e_{ij}^2 \tag{11}$$

where $e_{ij}$ is the residual of the i'th observation of the j'th feature when reconstructed with the first K principal components. The 95% and 99% confidence limits on these charts are based on the 95th and 99th percentiles of the NOC data. New measurements above these control limits are considered out of control and would normally flag further diagnostic and corrective steps.
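The two diagnostics and their percentile-based limits can be sketched as follows. Everything in the example is synthetic: the NOC and NEW matrices are random numbers, the NEW data are deliberately shifted, K = 2 is arbitrary, and flagging a sample when either chart exceeds its 95% limit is an assumption made here for illustration.

```python
import numpy as np


def monitoring_statistics(Z, loadings, eigenvalues):
    """Hotelling T2 and Q for scaled data Z, given K loadings and eigenvalues."""
    scores = Z @ loadings
    t2 = np.sum(scores ** 2 / eigenvalues, axis=1)
    residuals = Z - scores @ loadings.T
    q = np.sum(residuals ** 2, axis=1)
    return t2, q


rng = np.random.default_rng(1)
noc = rng.normal(size=(300, 4))            # synthetic NOC features
new = rng.normal(loc=3.0, size=(100, 4))   # synthetic, deliberately shifted

# Scale both sets with the NOC mean and standard deviation.
mean, std = noc.mean(axis=0), noc.std(axis=0, ddof=1)
Z_noc, Z_new = (noc - mean) / std, (new - mean) / std

# Principal component model with K = 2 retained components.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z_noc, rowvar=False))
keep = np.argsort(eigenvalues)[::-1][:2]
loadings, lam = eigenvectors[:, keep], eigenvalues[keep]

# Control limits from the 95th percentiles of the NOC statistics.
t2_noc, q_noc = monitoring_statistics(Z_noc, loadings, lam)
t2_limit, q_limit = np.percentile(t2_noc, 95), np.percentile(q_noc, 95)

# Flag new samples that exceed either limit.
t2_new, q_new = monitoring_statistics(Z_new, loadings, lam)
flagged = (t2_new > t2_limit) | (q_new > q_limit)
fraction_flagged = flagged.mean()
noc_alarm_rate = (t2_noc > t2_limit).mean()
print(f"flagged: {fraction_flagged:.1%}, NOC alarm rate: {noc_alarm_rate:.1%}")
```

Because the limits are empirical percentiles of the NOC data, roughly 5% of the NOC samples lie above the 95% limit by construction, which sets the false alarm rate to be expected on in-control data.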
Application of the methodology is considered in the following two case studies.

Case Study 1: Monitoring of PGM Froths
The first case study is based on froth images obtained from a platinum group metals (PGM) plant in South Africa. No other data regarding process conditions associated with the images were recorded or are available, but even so, the data can serve as a basis for comparison of the different approaches. The NOC image set contained 300 images, while the new data set contained 295 images, and these will be referred to as NEW images. As can be seen from Figure 4, it is difficult to visually discriminate between the NOC and NEW images, as the bubble size distributions appear to be the same and colour cannot be used as a basis for discrimination either. This is a typical problem encountered by plant operators responsible for monitoring the flotation process. Even if the plant is not monitored by means of a principal component model, as is considered in this investigation, visualization of the features extracted from the froths could provide very useful decision support for operators.

Monitoring Based on GLCM Features
Grey level co-occurrence matrices were constructed based on the frequency with which a pixel with intensity value i occurred horizontally immediately adjacent to a pixel with intensity value j, i.e., with an offset of [0 1] in Cartesian coordinates. After optimization over the range G = [2, 64], eight grey levels were used, as these gave the best results. Even so, across this range, the number of grey levels did not have a particularly strong effect on the overall performance of the monitoring methodology.
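Reducing an image to G grey levels before counting pairs can be sketched as below. Equal-width binning of 8-bit intensities is assumed here purely for illustration, since the quantisation scheme is not specified in the text, and the function name and pixel values are hypothetical.

```python
def quantise(image, levels, max_value=255):
    """Map intensities 0..max_value onto `levels` equal-width grey levels."""
    return [
        [min(value * levels // (max_value + 1), levels - 1) for value in row]
        for row in image
    ]


# One hypothetical row of 8-bit pixels reduced to G = 8 grey levels.
row = quantise([[0, 31, 32, 255]], levels=8)
# Horizontally adjacent pairs (i, j), i.e., the [0 1] offset.
pairs = [(r[c], r[c + 1]) for r in row for c in range(len(r) - 1)]
print(row, pairs)  # [[0, 0, 1, 7]] [(0, 0), (0, 1), (1, 7)]
```

With G = 8 each grey level spans 32 of the original 256 intensities, so the co-occurrence matrix has only 64 entries and is less sparse than one built on the full intensity range.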
The eigenspectrum of the principal component model based on the four GLCM features is shown in Figure 5. As can be seen from this figure, the first two principal components essentially captured all the variation in the features and hence K = 2 components were retained, which was also indicated by a cross-validation approach. Figure 6 is a bivariate scatterplot of the principal component scores of the GLCM features for the NOC and NEW data. These data sets are well separated in the figure. This separation could be further quantified by using a random forest model with the four GLCM features as predictors to classify the data as either NOC or NEW. The random forest could do so with an out-of-bag error of approximately 9%, as indicated in Appendix A.
Plots such as these could be used as a complementary means to track the flotation process, given that each marker in Figure 6 represents a froth image. This could serve as an aid to operators in steering the flotation process. Although the principal component model consisted of two components only, the Hotelling's $T^2$ and Q-statistics are still presented as a more general basis for monitoring purposes. These charts, shown in Figure 7, indicate that approximately 75.3% of the new data could be flagged as out of control.

Monitoring Based on LBP Features
The optimal LBP features were obtained with a neighbourhood of 2, and 59 features were extracted. The eigenspectrum of the principal components of these features is shown in Figure 8. K = 3 principal components were retained in the model, as determined by cross-validation. This value is of necessity comparatively low, owing to the comparatively small size of the NOC data set in relation to the number of features.
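The text does not state how the 59 features were formed. One common construction that yields exactly 59 values is the histogram of "uniform" eight-neighbour LBP codes, with 58 bins for codes whose circular bit pattern has at most two 0/1 transitions and one shared bin for all other codes. The sketch below illustrates that construction as an assumption, not as the method used in the study; the names and the example codes are hypothetical.

```python
def transitions(code, bits=8):
    """Number of 0/1 changes in the circular bit pattern of `code`."""
    count = 0
    for p in range(bits):
        current = (code >> p) & 1
        following = (code >> ((p + 1) % bits)) & 1
        if current != following:
            count += 1
    return count


# The 58 uniform codes each get their own bin; bin 58 collects the rest.
uniform_codes = [code for code in range(256) if transitions(code) <= 2]
bin_of = {code: index for index, code in enumerate(uniform_codes)}


def lbp_histogram(codes):
    """Normalised 59-bin histogram of a flat list of 8-neighbour LBP codes."""
    histogram = [0] * (len(uniform_codes) + 1)
    for code in codes:
        histogram[bin_of.get(code, len(uniform_codes))] += 1
    return [count / len(codes) for count in histogram]


# Hypothetical LBP codes: 0, 255 and 241 are uniform, 5 is not.
hist = lbp_histogram([0, 255, 241, 5])
print(len(uniform_codes), len(hist), hist[58])  # 58 59 0.25
```

Each image then contributes one 59-element vector, which is why the feature count stays fixed regardless of image size.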

Case Study 2: Monitoring of Platinum Froths Associated with Different Grades
In the second case study, the froth image database previously considered by Marais and Aldrich [38], as well as Horn et al. [39], was revisited. The image data were originally collected over a four-hour period on a primary cleaner cell at a South African industrial PGM plant. During collection of the images, the air flow rate of the cell was periodically varied. Concentrate samples were collected after stabilization of the flotation cell and afterwards, the PGM content was analysed in a laboratory.
The 256 × 256 pixel images represent four different operational regimes with relative platinum grades of 1, 0.464, 0.306, and 0.115, as indicated in Figure 10. The NOC image set consisted of froth images associated with a high grade of platinum, while the NEW1, NEW2, and NEW3 data sets consisted of froth images with progressively lower grades.

Monitoring Based on GLCM Features
As before, four GLCM features were extracted from the images based on the use of G = 8 grey levels and a horizontal distance of 1 between pixels. The eigenspectrum of the extracted features is shown in Figure 11. With these features, the principal component model could flag approximately 44.7% of the new data as out of control, i.e., 18%, 50%, and 66% of the NEW1, NEW2, and NEW3 data, respectively, as indicated in Figure 12. To put this in perspective, a random forest model (Appendix A) using the GLCM features as predictors could discriminate between the NOC data and the NEW1, NEW2, and NEW3 data with accuracies of approximately 77%, 92%, and 98%, respectively.
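The overall figure is consistent with the per-set percentages if the three NEW sets are of equal size. Taking 100 images per set, which is an assumption based on the index ranges given in the caption of Figure 12, the pooled fraction is:

```latex
\frac{0.18 \times 100 + 0.50 \times 100 + 0.66 \times 100}{300}
  = \frac{134}{300} \approx 0.447
```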

Monitoring Based on LBP Vectors
Monitoring of the process was again based on 59 LBP features extracted from the images. K = 3 principal components were retained, the cumulative eigenspectrum of which is shown in Figure 13.
The diagnostic charts are shown in Figure 14. As can be seen from the Q-chart at the bottom of Figure 14, the model could identify NEW1, NEW2, and NEW3 process deviations with a reliability of 31%, 41%, and 47%, respectively. This could be compared with the ability of a random forest model (Appendix A) using the 59 LBP features as predictors to discriminate between the NOC data and the NEW1, NEW2, and NEW3 data, which was approximately 79%, 90%, and 98%, respectively. This was essentially the same as what could be obtained with the GLCM features.

Discussion and Conclusions
In this paper, it was shown that features extracted from froth images representative of normal operating conditions (NOCs) could be used in a traditional multivariate statistical process control framework, such as one based on principal component analysis.
To this end, the physical interpretation of the features was not important, although it would be possible to link some of these features to the physical characteristics of the froth, such as the bubble size and shape distributions and the appearance of the froth in general. Such an analysis would also have to consider changes in the froth structures over time.
Both algorithms that were used to extract features from the images performed reasonably well, despite the wide disparity in the number of features that were used to characterize the froth images. Nonetheless, there appeared to be considerable scope for improvement. First, it should be noted that the small sizes of the NOC image data sets placed significant constraints on the use of some of the state-of-the-art approaches to image analysis, particularly methods based on deep learning, such as those considered by Fu and Aldrich [40,41]. Larger NOC data sets would therefore need to be considered in future studies.
In addition, extension of the linear principal component model to any number of nonlinear versions could also lead to potentially considerable improvement in the results.
Finally, if these image features are used in combination with other process variables, further diagnostic methods would add critical additional functionality to the model and could be readily incorporated into the model. In practice, this would be important in order to steer an out-of-control process back to an in-control state.
Case Study 2-Discrimination between NOC and NEW1, NEW2 and NEW3 data sets with Random Forest Models

Figure 1. Multivariate process monitoring based on flotation froth image analysis.

Figure 2. A co-occurrence matrix shows the frequency of every particular pair of grey levels in pixel pairs, separated by a certain distance and direction, as specified by Cartesian coordinates (u,v).

Figure 3. Example of a centre pixel (shaded), with its eight neighbouring pixels (left), the values obtained through binary thresholding (middle) and the conversion weights by which the thresholded values are multiplied to give the decimal LBP value shown in place of the centre pixel (right).

Figure 4. Images associated with NOCs (top, left to right) and test images (bottom, left to right).

Figure 5. Cumulative eigenvalue plot of GLCM features in case study 1.

Figure 6. Principal component score plot of four GLCM features with 8 grey levels in case study 1. Blue circles and red squares indicate NOC and NEW data, respectively.

Figure 7. Hotelling's $T^2$ (top) and Q-chart (bottom) of GLCM features in case study 1. The 95% and 99% confidence limits are indicated by dashed and dotted lines in each plot.

Figure 8. Eigenspectrum of the LBP features of the NOCs froth images in case study 1.

The diagnostic charts generated by the principal component model are shown in Figure 9. Interestingly, the LBP feature set could flag approximately 65.4% of the NEW data as out of control at a 95% confidence level.

Figure 9. The Hotelling $T^2$ (top) and Q-chart (bottom) showing NOCs and new data, based on the use of LBP features. The 95% and 99% confidence limits are indicated by dashed and dotted lines in each plot.

Figure 10. Images associated with NOC and NEW1, NEW2, and NEW3 operational regimes in case study 2. Relative platinum grades are indicated in parentheses.

Figure 11. Eigenspectrum of the GLCM features of the NOC froth images in case study 2.

Figure 12. The Hotelling $T^2$ (top) and Q-chart (bottom) showing NOCs and new data (NEW1 with indices from 101 to 200, NEW2 with indices from 201 to 300 and NEW3 with indices from 301 to 400), based on the use of GLCM features in case study 2. The 95% and 99% confidence limits are indicated by dashed and dotted lines in each plot.

Figure 13. Cumulative eigenspectrum of the principal component model based on the retention of 3 principal components derived from the 59 LBP features in case study 2.

Figure 14. Hotelling's $T^2$ chart (top) and Q-chart (bottom) showing the NOCs (blue), NEW1 (red), NEW2 (green) and NEW3 (magenta) data for local binary patterns in case study 2. The 95% and 99% confidence limits are indicated by dashed and dotted lines in each plot.

Table A2. Random forest model parameters used in Case Study 2. Out-of-bag (OOB) prediction errors of three separate random forest models trained on GLCM and LBP features from the NOC data in Case Study 2 and used to discriminate between the NOC and NEW1, NEW2 and NEW3 data.