A Method for Quality Control of Bauxites: Case Study of Brazilian Bauxites Using PLSR on Transmission XRD Data

Available Alumina (AvAl2O3) and Reactive Silica (RxSiO2), the main bauxite parameters controlled in the beneficiation process, are traditionally measured by laborious, expensive, and time-consuming wet chemistry methods. Alternative methods based on XRD analysis, capable of providing a reliable estimation of these parameters and valuable mineralogical information about the ore, are being studied. In this work, X-ray diffraction data collected in transmission mode were used to estimate AvAl2O3 and RxSiO2 in Brazilian bauxites using the Partial Least Square Regression (PLSR) statistical tool. The proposed method comprises a routine of sample classification according to their similarities by Principal Component Analysis (PCA) and K-means, calibration of a PLSR model for each group of samples, grouping of new bauxite samples according to the generated clustering model, and subsequent estimation of AvAl2O3 and RxSiO2 for these samples using the corresponding PLSR models. The results showed good accuracy and precision of the models generated for samples of the main ore lithology. The quality and pre-processing of the XRD data required for this method are discussed. The results demonstrated that this method has the potential to be applied industrially to the quality control of bauxites as a rapid and automated procedure.


Introduction
Bauxite is the main aluminum ore, with global resources estimated at 55-75 billion tons. Brazil holds the 4th largest reserve and produces 35 million tons of bauxite annually, mainly to produce smelter grade alumina [1].
The main aluminum-ore mineral present in Brazilian lateritic bauxite is gibbsite (known as available alumina, AvAl2O3), and therefore these bauxites are processed under low-temperature digestion (LTD) conditions (100-150 °C) [2,3]. In this context, among the silicon-bearing minerals, only kaolinite is leached in the Bayer process. This gangue mineral is well known in the industry as reactive silica (RxSiO2) since it rapidly and undesirably reacts with the sodium hydroxide solution, releasing Na2SiO3 to the pregnant liquor (Equation (1)), which must be precipitated as zeolitic phases known as DSP (desilication product) even during the digestion stage (Equation (2)) [4]. These neoformed products significantly affect the costs of the process, partly because they dictate the time and influence the temperature of digestion, but mainly because of the loss of caustic soda, which in many cases makes it economically unfeasible to process bauxites with RxSiO2 > 5% [3-5].
$$3\,\mathrm{Al_2Si_2O_5(OH)_4}\,(s) + 18\,\mathrm{NaOH}\,(aq) \rightarrow 6\,\mathrm{Na_2SiO_3}\,(aq) + 6\,\mathrm{NaAl(OH)_4}\,(aq) + 3\,\mathrm{H_2O}\,(l) \tag{1}$$
$$6\,\mathrm{Na_2SiO_3}\,(aq) + 6\,\mathrm{NaAl(OH)_4}\,(aq) + \mathrm{Na_2X}\,(aq) \rightarrow \mathrm{Na_6(Al_6Si_6O_{24})Na_2X}\,(s) + 12\,\mathrm{NaOH}\,(aq) + 6\,\mathrm{H_2O}\,(l) \tag{2}$$
In the mineral industry, it is common for quality control and process parameters to be based on chemical rather than mineralogical data. This is mainly due to the consolidation and availability of quantitative chemical analysis using wet methods, while methods for mineralogical determination are still under development. The aluminum industry is no different. Quality control of ore in the mine and in the Bayer process is done almost exclusively in terms of chemical composition. Thus, samples from geological research, beneficiation, and Bayer process feedstock are analyzed to determine the content of available alumina (AvAl2O3) and reactive silica (RxSiO2), traditionally determined by wet chemistry [2,6,7].
Paz et al. [2] report that the RxSiO2 content determined by such methods can be underestimated, depending on the content and the degree of crystallinity of the kaolinite in bauxite, which may change significantly over the bauxite profile. This means that the clay mineral present can be more reactive in the process, regardless of its concentration [2,8]. Thus, there is no guarantee that simple knowledge of the chemical composition of bauxite will allow efficient control in metallurgical processes [9]. Another downside of traditional methods is that they are time-consuming, demand manpower and space, and involve the handling of dangerous reagents [10,11].
In this context, several methods based on the mineralogical composition of the ore, obtained by X-ray diffraction (XRD), are being developed as alternatives for process control in the bauxite and alumina industry. These methods are, in general, based on powder XRD data using Rietveld refinement [6,10-17] and multivariate statistics [18-21]. Feret [13] states that XRD has become a fundamental and irreplaceable tool in the control of raw materials in the aluminum industry. The advent of high-speed XRD detectors has enabled fast data collection and, consequently, the development of rapid and accurate methods, as they use the whole XRD pattern, reducing the effect of preferred orientation and reflection extinction and even mitigating the inaccuracies due to amorphous content [10,11,13-15]. König et al. [10] demonstrated the mineralogical quantification of certified bauxite samples from several countries. Aylmore and Walker [14] and Nong et al. [15] also applied Rietveld-XRD to the quantification of Australian lateritic and Chinese karstic bauxites, respectively. Applications of powder XRD to quantify Brazilian bauxites from Paragominas and Juruti (northern Brazil) were also studied by Angélica et al. [6] and Negrão et al. [16], respectively. Feret and See [17] reported a bauxite analysis by XRD using synchrotron radiation to improve mineralogical quantification.
Principal Component Analysis (PCA) and Partial Least Square Regression (PLSR) are two statistical methods widely used in the chemometric field [22-24]. Viscarra Rossel et al. [25,26] demonstrated the use of PLSR on UV-Vis and infrared data to predict various soil properties (such as pH, organic carbon (OC), and cation exchange capacity (CEC)) and to determine the composition of mineral-organic mixtures in soils, while PCA was used to compare the synthetic mixtures with the respective soils. Olatunde [27] reported excellent results using PLSR on infrared data to estimate the extractable total petroleum hydrocarbon (ETPH) in soils, highlighting the accuracy and speed of this method. PLSR was also used on energy-dispersive X-ray fluorescence (EDXRF) data to predict some soil parameters (CEC, sum of exchangeable bases (SB), and base saturation percentage (BSP)) [28]. From XRD data, König et al. [29] demonstrated the use of PLSR for quality control of iron ore sinter as a reliable, easy, and rapid method in contrast to wet chemistry.
Melo et al. [19,20] developed a methodology using PLSR on XRD data (reflection geometry) to estimate the bauxite quality control parameters. The authors reported that the estimates of AvAl2O3 and RxSiO2 were in good agreement with the reference values and within the acceptable limits of precision (<1.0-1.5% and <0.5%, respectively) [30]. However, it was observed that for samples of marginal ore lithologies with higher kaolinite content and degree of crystallinity (low-defect kaolinite), the method does not meet the precision limits, probably due to the preferred orientation effect from manual sample preparation. To overcome this issue, this study aimed to use XRD data collected in transmission mode, following a methodology similar to that of Melo et al. [19], applied to Brazilian gibbsitic bauxites. It is worth mentioning that the proposed method has several advantages over the traditional methods: it is rapid, it can be completely automated, it requires no chemical reagents, and it allows the ore mineralogy to be monitored, providing relevant information to the process.

Materials and Methods
The bauxite samples were provided by Mineração Paragominas SA (Norsk Hydro) and correspond to a drilling campaign on the Miltonia 3 plateau, Pará state, northern Brazil [31]. In this study, 105 samples were used, corresponding to four lithologies: Nodular Bauxite (BN), Nodular-Crystalized Bauxite (BNC), Crystalized Bauxite (BC), and Crystalized-Amorphous Bauxite (BCBA). Details of this lithological profile and sample preparation can be found in Silva et al. [32] and Melo et al. [19]. Figure 1 depicts a schematic representation of the Miltonia location and geological profile.
The powder XRD data were collected using a diffractometer (Empyrean, Panalytical, Almelo, The Netherlands) with a Co X-ray tube (Kα1 = 1.789 Å), Fe Kβ filter, and PIXel3D 2 × 2 area detector (linear scanning mode) with an active length of 3.3473° 2θ (255 channels). The following data collection conditions were used: transmission mode; 40 kV and 35 mA; Soller slit of 0.04 rad; fixed divergence and anti-scatter slits of 1/8°; step-size of 0.066° 2θ; 22.96 s of time/step; and scanning range from 5° to 70° 2θ. The step-size was defined based on the optimized conditions of Melo et al. [19]. Diffractograms were evaluated using the software HighScore Plus 4.8 (Panalytical, Almelo, The Netherlands).
Each sample was mounted in the sample holder and analyzed in duplicate by XRD. The XRD data were used as the dataset for the PCA, K-means, and PLSR analyses. Thus, all diffractograms are organized as an m × n matrix, where m (rows) are the bauxite samples and n (columns) are the intensity count values for each ° 2θ step of the XRD measurement for the respective sample. Here, the complete XRD pattern is taken as the independent (predictor) variables, resulting in 984 features for modeling [19].
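The data layout described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `two_theta_axis` and `build_matrix`, the sample identifiers, and the toy intensities are all hypothetical.

```python
def two_theta_axis(start, step, n_points):
    """Return the 2-theta position (degrees) of each measured step."""
    return [round(start + i * step, 6) for i in range(n_points)]


def build_matrix(patterns):
    """Stack diffractograms into an m x n matrix (one row per sample).

    patterns: dict mapping a sample id to its list of intensity counts.
    All patterns must share the same 2-theta grid, hence the same length.
    """
    sample_ids = sorted(patterns)
    n = len(patterns[sample_ids[0]])
    for sid in sample_ids:
        if len(patterns[sid]) != n:
            raise ValueError(f"sample {sid}: {len(patterns[sid])} points, expected {n}")
    X = [list(patterns[sid]) for sid in sample_ids]
    return sample_ids, X


# Toy data: three hypothetical samples measured on a 5-point grid.
toy = {
    "S2": [10, 40, 12, 90, 11],
    "S1": [12, 35, 15, 80, 10],
    "S3": [9, 50, 11, 70, 13],
}
ids, X = build_matrix(toy)
axis = two_theta_axis(start=5.0, step=0.066, n_points=5)
```

Keeping the 2θ axis alongside the matrix makes it possible to map any column (feature) back to a position in the diffractogram when the models are interpreted.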
PCA was carried out to identify possible outliers and samples with mineralogical similarity, and the K-means clustering algorithm (with k = 3 and the Euclidean distance measure) was used to group similar samples (the clusters were named C1, C2, and C3).
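The clustering step can be illustrated with a plain K-means implementation using Euclidean distance. This is a sketch under stated assumptions: the text does not specify the initialization or whether clustering was run on PCA scores or on the pre-processed intensities, so the deterministic farthest-point start, the helper names (`dist`, `nearest`, `kmeans`), and the two-dimensional toy points are hypothetical choices. The same `nearest` helper shows how a new sample could be assigned to an existing cluster.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))


def kmeans(X, k, max_iter=100):
    """Lloyd's K-means with a deterministic farthest-point start (assumption)."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        # Next seed: the sample farthest from its closest existing seed.
        far = max(X, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in X]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(X, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy points standing in for samples (three well-separated groups).
scores = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
    [10.0, 0.0], [10.1, 0.2], [9.9, 0.1],
]
centroids, labels = kmeans(scores, k=3)

# A new sample is assigned to the cluster with the nearest centroid.
new_label = nearest([4.8, 5.3], centroids)
```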
The samples classified in each cluster were randomly divided into two subsets: a calibration set (containing ~70% of the samples) and a test set (~30% of the samples). The samples from the calibration set were used to build the PLSR models. This statistical algorithm is particularly suitable for handling multi-collinear data and is an interesting alternative for predicting relevant information Y (obtained from expensive, difficult, or time-consuming measurements, e.g., wet chemistry) from X data (in general, cheap, easy, or fast measurements, e.g., XRD or Fourier Transform Infrared Spectroscopy (FTIR)) [18,22,33]. Thus, in this study, the content of AvAl2O3 and RxSiO2 (from wet chemistry) was predicted by using XRD data.
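A per-cluster random split of this kind might look like the sketch below. The function name `split_by_cluster`, the fixed seed, and the toy cluster sizes are assumptions made for illustration; the actual randomization procedure is not described in the text.

```python
import random


def split_by_cluster(sample_ids, labels, calib_fraction=0.7, seed=0):
    """Randomly split each cluster into calibration and test subsets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    calib, test = {}, {}
    for cluster in sorted(set(labels)):
        members = [s for s, lab in zip(sample_ids, labels) if lab == cluster]
        rng.shuffle(members)
        n_calib = round(calib_fraction * len(members))
        calib[cluster] = members[:n_calib]
        test[cluster] = members[n_calib:]
    return calib, test


# Toy example: 20 hypothetical samples already assigned to three clusters.
ids = [f"S{i:02d}" for i in range(20)]
labels = [0] * 10 + [1] * 6 + [2] * 4
calib, test = split_by_cluster(ids, labels)
```

Splitting inside each cluster, rather than over the whole dataset, ensures that every PLSR model has its own calibration and test samples.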
A "leave-one-out" cross-validation was used to find the best number of factors to include in the models, and the Root Mean Square Error of Prediction (RMSEP, Equation (3)), Ratio of Prediction Deviation (RPD, Equation (4)), and Relative Error (RE, Equation (5)) were used to assess the performance of the models.
Figure 2 shows the X-ray diffractograms of all the bauxite samples used in this study, collected in transmission mode. It can be noted that the bauxites of the four lithologies have the same mineralogical composition. The main phase is gibbsite (d002 = 4.85 Å and d110 = 4.37 Å). In general, the only silicon-bearing mineral identified is kaolinite (d001 ~ 7.14 Å and d002 ~ 3.58 Å). Quartz (d101 = 3.34 Å) may be present in some samples, but in minor content. Hematite (d104 = 2.69 Å) is observed as the main iron mineral, with intensity varying significantly among the lithologies, and Al-goethite (d101 ranging from 4.18 Å to 4.14 Å) is also observed, usually as a broad peak due to variations in the isomorphic substitution of Al in the structure [12,34]. Anatase (d101 = 3.52 Å) is also present in all samples.
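To relate the d-spacings quoted above to positions in the diffractograms, Bragg's law (λ = 2d sin θ, first order) can be applied with the Co Kα1 wavelength given in the data collection conditions. The short sketch below is only a convenience for the reader; the function names are hypothetical.

```python
import math

CO_KALPHA1 = 1.789  # wavelength in angstroms, as stated for the Co tube


def two_theta_from_d(d, wavelength=CO_KALPHA1):
    """Bragg's law, first order: returns 2-theta in degrees for a d-spacing."""
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d)))


def d_from_two_theta(two_theta, wavelength=CO_KALPHA1):
    """Inverse conversion: 2-theta in degrees to d-spacing in angstroms."""
    return wavelength / (2.0 * math.sin(math.radians(two_theta / 2.0)))


# d-spacings (angstroms) taken from the phase identification in the text.
reflections = {
    "kaolinite d001": 7.14,
    "gibbsite d002": 4.85,
    "gibbsite d110": 4.37,
    "anatase d101": 3.52,
    "quartz d101": 3.34,
    "hematite d104": 2.69,
}
for name, d in reflections.items():
    print(f"{name}: {two_theta_from_d(d):.2f} deg 2-theta")
```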

XRD Data
Layered minerals (such as clay minerals) tend to orient strongly when samples are mounted in the holders for XRD analysis. Thus, for samples rich in kaolinite and/or gibbsite, it is common to observe high intensities of the basal reflections (d00l) to the detriment of the other reflections of the XRD pattern [35]. This effect is believed to be the major source of error in quantitative analysis based on XRD data [14,36]. It is interesting to note that this deleterious effect was avoided using the transmission mode, as evidenced by the intensity ratio of the d110 and d002 peaks of gibbsite (~50%). For comparison, the same samples were analyzed in reflection mode with manual sample holder assembling [14], resulting in a d110/d002 ratio of only ~8%, a very low value considering the scale factor of this phase. At low angles, the noise is significant, although this mode of data collection allows a better resolution of possible peaks in this region of the diffractogram.
Figure 3 presents the score plots for the first three principal components. As noted, the principal components PC-1, PC-2, and PC-3 explain, respectively, 35%, 22%, and 12% of the data variability (only 69% of the variance in total). Even considering 8 components, the explained variance remains lower than 75%. In contrast, Melo et al. [19] achieved 98% of the explained variance with only two components by using XRD reflection data. This shows that, although the preferred orientation effect was mitigated, the conditions of data collection in transmission mode used in this work resulted in a significant reduction in the intensities, which in turn reduced the sensitivity of the statistical treatment in finding significant factors to reduce the data dimensionality. No clustering is evident, even for samples of the same lithology [8]. In this context, the K-means method was used to group the samples for further PLSR prediction.
Figure 4 shows the PCA score-plot with the three clusters obtained (C1, C2, and C3). Although samples from the same lithology were classified into different clusters, C1 mostly contains BC; BNC and BCBA were mainly grouped in C1 and C2, while most BN samples were grouped in C3.

Prediction of AvAl2O3 and RxSiO2 by PLSR
Once the samples were classified into the C1, C2, and C3 clusters, the respective calibration sets were used to build the PLSR models. Melo et al. [19] achieved an optimized condition of XRD data collection by increasing the step-size from 0.026° to 0.065° (2θ) and reducing the 2θ range to 13-34°; this optimized condition for reflection mode allowed an XRD scan time of less than 1 min. As observed in the PCA loadings plot (Figure 5), with transmission data the full pattern is relevant for extracting the latent variables. Therefore, in this study, the diffractograms were reduced only to the 13-65° (2θ) interval, just removing the background noise. This treatment resulted in a scan time of 1 min 15 s. Compared with traditional wet chemistry, in which analyses can take 3-8 h, the use of PLSR on XRD data is much faster and is able to provide quick feedback to the process for decision-making.
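Restricting the diffractograms to the 13-65° (2θ) interval amounts to selecting a range of columns from the data matrix. A minimal sketch follows, assuming the matrix and its 2θ axis are held as plain lists; `crop_window` and the coarse toy grid are hypothetical.

```python
def crop_window(two_theta, X, lo=13.0, hi=65.0):
    """Keep only the columns whose 2-theta position lies in [lo, hi]."""
    keep = [j for j, angle in enumerate(two_theta) if lo <= angle <= hi]
    axis = [two_theta[j] for j in keep]
    cropped = [[row[j] for j in keep] for row in X]
    return axis, cropped


# Toy grid from 5 to 70 degrees in 5-degree steps, two hypothetical samples.
grid = [5.0 + 5.0 * i for i in range(14)]
X = [list(range(14)), list(range(100, 114))]
axis, Xc = crop_window(grid, X)
```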
After calibrating the models and defining, through cross-validation, the best pre-processing method for the dataset (mean-centered or standardized) and the number of factors to be included in the models, each sample in the test set was classified into one of the three clusters, and the bauxite quality control parameters (AvAl2O3 and RxSiO2) were then predicted using the respective models. It can be observed in Figure 6 that there is a good fit of the predicted values for both parameters, mainly for the samples classified in C1.
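The calibration and prediction steps can be sketched with a single-response PLSR (PLS1) fitted by the NIPALS algorithm, together with the two pre-processing options and a leave-one-out error used to choose the number of factors. This is an illustrative implementation under stated assumptions, not the software used in the study: all function names are hypothetical, and the toy data are synthetic (the third column is the sum of the first two, mimicking collinear predictors, and y is an exact linear function of the first two columns).

```python
import math


def column_stats(X):
    """Column means and sample standard deviations of a row-major matrix."""
    m = len(X)
    means = [sum(col) / m for col in zip(*X)]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in col) / (m - 1))
            for col, mu in zip(zip(*X), means)]
    return means, stds


def preprocess(row, means, stds, standardize):
    """Mean-center a pattern; optionally also scale to unit variance."""
    if standardize:
        return [(v - mu) / sd if sd > 0 else 0.0
                for v, mu, sd in zip(row, means, stds)]
    return [v - mu for v, mu in zip(row, means)]


def fit_pls1(X, y, n_factors, standardize=False):
    """Calibrate a PLS1 model with NIPALS; returns a dict of model parts."""
    means, stds = column_stats(X)
    y_mean = sum(y) / len(y)
    E = [preprocess(r, means, stds, standardize) for r in X]
    f = [v - y_mean for v in y]
    factors = []
    for _ in range(n_factors):
        w = [sum(e[j] * fi for e, fi in zip(E, f)) for j in range(len(E[0]))]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(e, w)) for e in E]  # scores
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        p = [sum(e[j] * ti for e, ti in zip(E, t)) / tt for j in range(len(w))]
        q = sum(fi * ti for fi, ti in zip(f, t)) / tt
        E = [[a - ti * pj for a, pj in zip(e, p)] for e, ti in zip(E, t)]  # deflate X
        f = [fi - q * ti for fi, ti in zip(f, t)]  # deflate y
        factors.append((w, p, q))
    return {"means": means, "stds": stds, "y_mean": y_mean,
            "standardize": standardize, "factors": factors}


def predict_pls1(model, row):
    """Predict y for one new pattern by replaying the stored factors."""
    e = preprocess(row, model["means"], model["stds"], model["standardize"])
    y_hat = model["y_mean"]
    for w, p, q in model["factors"]:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * pj for a, pj in zip(e, p)]
    return y_hat


def loo_rmse(X, y, n_factors, standardize=False):
    """Leave-one-out cross-validation error for a given number of factors."""
    sq = 0.0
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors, standardize)
        sq += (predict_pls1(model, X[i]) - y[i]) ** 2
    return math.sqrt(sq / len(X))


# Synthetic calibration data: x3 = x1 + x2 and y = 2*x1 - x2 + 5.
X = [[1, 2, 3], [2, 1, 3], [3, 4, 7], [4, 3, 7], [5, 6, 11], [6, 5, 11]]
y = [5, 8, 7, 10, 9, 12]
model = fit_pls1(X, y, n_factors=2)
prediction = predict_pls1(model, [7, 8, 15])
cv_error = loo_rmse(X, y, n_factors=2)
```

In practice, `loo_rmse` would be evaluated for a range of factor counts and for both pre-processing options, and the combination with the lowest cross-validation error would be retained for each cluster.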
Although the predicted mean values are close to the reference values, the models C2 and C3 showed a precision slightly lower than the acceptable limits for the quality control of bauxites [30]. The parameters that indicate the performance of the models are summarized in Tables 1 and 2 for AvAl2O3 and RxSiO2, respectively.
A mean of residuals (mean of the difference between reference and predicted values) close to zero denotes that the models present good accuracy. The RMSEP denotes the precision of the model in the same unit as the predicted parameters (%AvAl2O3 and %RxSiO2). Thus, a model with high precision has a lower RMSEP. In terms of bauxite quality control, a precision of <1.0-1.5% for AvAl2O3 and <0.5% for RxSiO2 [14,25] is usually required. Feret [30] argues that these numbers are sometimes difficult to attain in industrial practice using traditional wet chemistry methods.
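The two figures of merit described above can be computed as in the sketch below. It assumes the usual definition of RMSEP as the square root of the mean squared difference between reference and predicted values; the function names and the toy values are hypothetical and are not taken from the tables of this study.

```python
import math


def mean_residual(reference, predicted):
    """Mean of (reference - predicted); values near zero indicate low bias."""
    return sum(r - p for r, p in zip(reference, predicted)) / len(reference)


def rmsep(reference, predicted):
    """Root mean square error of prediction, in the units of the parameter."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


def within_limit(reference, predicted, limit):
    """True if the RMSEP is below the required precision limit."""
    return rmsep(reference, predicted) < limit


# Toy AvAl2O3 values (%), for illustration only.
reference = [50.0, 48.0, 52.0, 46.0]
predicted = [50.5, 47.5, 52.5, 45.5]
bias = mean_residual(reference, predicted)
error = rmsep(reference, predicted)
ok_for_alumina = within_limit(reference, predicted, limit=1.0)
```

The toy example shows why both figures are needed: the residuals cancel out, so the mean residual is zero, yet every prediction is off by 0.5%, which only the RMSEP reveals.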
The RPD indicates how well the model performs compared to using only the average of the original data [26]. Some authors argue that RPD < 1.0 denotes a very poor model, 1.0 ≤ RPD < 1.4 a poor model, 1.4 ≤ RPD < 1.8 a fair model, 1.8 ≤ RPD < 2.0 a good model, and RPD ≥ 2.0 an excellent model [26-28]. It can be observed that the models C1 and C2 for both AvAl2O3 and RxSiO2 performed well, with RPD ~ 2.0. It is interesting to note that, although the RMSEP of the C3 model is high, it presented RPD = 2.9, which means an excellent model to predict AvAl2O3. This denotes that the samples in this cluster have a wide range of AvAl2O3 content (min: 38.57%, max: 52.79%), and therefore the model is sensitive to variations and capable of predicting this parameter satisfactorily.
It is interesting to note that the best model (C1) mostly contains samples of the main ore lithology (BC), denoting that the method is suitable for quality control. In contrast, the worst model (C3) is mainly related to the samples of the marginal ore (BN, generally considered as gangue). This bauxite lithology presents the highest kaolinite content and the lowest gibbsite content in the Miltonia 3 bauxite profile. The low degree of crystallinity of the kaolinite in this lithology affects the XRD profile [13,14]. Melo et al. [19] also observed a lower precision for this material that could be related to the preferred orientation. As this effect was eliminated using transmission mode, the low precision of the C3 model may be related to the wide range of AvAl2O3 and RxSiO2 content in the sample group or to another unknown effect. It may also represent a limitation of this method (or an optimization point): it is not suitable for geological survey applications, where samples that strongly deviate from the ore in terms of mineralogical composition and phase content may be present.
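The RPD classes quoted above, and the observation that a high RMSEP can coexist with a high RPD, can be illustrated with a short sketch. It assumes the common definition of RPD as the standard deviation of the reference values divided by the RMSEP, since Equation (4) is not reproduced in this text; the function names and toy values are hypothetical.

```python
import math
import statistics


def rpd(reference, predicted):
    """Ratio of prediction deviation: SD of reference values over RMSEP."""
    n = len(reference)
    rmsep = math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)
    return statistics.stdev(reference) / rmsep


def rpd_class(value):
    """Map an RPD value to the qualitative classes quoted in the text."""
    if value < 1.0:
        return "very poor"
    if value < 1.4:
        return "poor"
    if value < 1.8:
        return "fair"
    if value < 2.0:
        return "good"
    return "excellent"


# Toy case: every prediction is off by 1%, but the reference range is wide.
wide_ref = [40.0, 44.0, 48.0, 52.0]
wide_pred = [41.0, 43.0, 49.0, 51.0]
wide_rpd = rpd(wide_ref, wide_pred)

# Same absolute errors over a narrow reference range give a much lower RPD.
narrow_ref = [47.0, 47.5, 48.0, 48.5]
narrow_pred = [48.0, 46.5, 49.0, 47.5]
narrow_rpd = rpd(narrow_ref, narrow_pred)
```

Both toy cases have the same RMSEP, so the difference in RPD comes only from the spread of the reference values, which is the effect invoked above for the C3 model.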
Feret et al. [37] state that methods of phase quantification based on regression may be used successfully in bauxite exploration; however, they are deposit-specific. Similarly, the method presented in this work (and also in Melo et al. [19]) represents a case study with lateritic gibbsitic bauxite from the Miltonia plateau. Although König and Norberg [18] reported satisfactory prediction using a generic PLSR model for bauxites from different locations, the results suggest that changes in the ore's mineralogy, related only to the concentration and crystallinity of the phases, may impact prediction. Nevertheless, it is believed that the methodology can be easily adapted to bauxites of different origins, in particular the Amazon lateritic bauxites (Paragominas plateau Miltonia 5, Juruti, Trombetas, and Rondon do Pará), which have similar mineralogy and contain only gibbsite as an aluminum-bearing mineral [6,12,16,38].
As depicted in Figure 7, the coefficients and factors of each PLSR model can be plotted against the ° 2θ position, allowing the obtained model to be interpreted in terms of the XRD pattern of the clustered bauxite samples. It is observed that the largest coefficients of the AvAl2O3 models (C1, C2, and C3) are negatively correlated with kaolinite basal reflections and positively correlated with gibbsite (d110), whereas the basal gibbsite reflection (d002) has greater weights on the factor loadings. The asymmetric shape of this reflection, however, revealed at least two generations of gibbsite. Similar results were observed by Melo et al. [19], and according to König et al. [10] and Negrão et al. [16], this could be associated with aluminum-rich horizons where, along the bauxite profile, well-formed coarse gibbsite crystals fill microvoids and a new generation of fine, poorly crystalline gibbsite is dispersed in the matrix. Interestingly, Al-goethite (d110) has a high impact on the model coefficients (highlighted area in Figure 7a,c). The respective broad peak area in the C1 and C2 coefficients denotes the presence of several degrees of Al substitution in the goethite structure. It is also noted that, in the C3 model, a wide area from 15-20° 2θ presented high coefficients. This area, however, has no associated XRD reflection and may therefore be related to the amorphous content of these samples. It was not possible to quantify the amorphous content, but it is assumed that it is more evident in the overlying lithologies, in particular BNC and BN, due to different laterization cycles and the greater presence of neoformed minerals [8,16]. This assumption is in agreement with the results, since BN samples were mainly grouped in C3, and is probably related to the relatively low predictive performance of this model.
In the models for RxSiO2, higher coefficients related to kaolinite and Al-goethite were also observed, in this case with positive sign. For models with standardized datasets (C1 and C3), the basal reflections of kaolinite have a greater impact on the loadings, while in the C2 model (mean-centered), the d002 peak of gibbsite had greater weight. A double peak referring to the d001 of kaolinite was observed for the coefficients of models C2 and C3, which must also be associated with kaolinite generations with different degrees of crystallinity. Melo et al. [8] demonstrated that this difference actually occurs, so that kaolinites from the overlying lithologies are less ordered and, consequently, more reactive in the Bayer process. It is interesting to note that, although this significantly impacts processing costs, this information is not available to the industry when all the quality control of bauxite is done by traditional wet methods. The variability of kaolinite crystallinity in these two clusters may be associated with the reduced accuracy of the respective models.

Conclusions
In this work, the use of a method for quality control of bauxites based on statistical tools applied to XRD transmission data was evaluated. The method comprises classifying bauxite samples by PCA and K-means according to their latent mineralogical characteristics, building PLSR models for each sample group, and using these models to predict the AvAl2O3 and RxSiO2 parameters in new samples.
The samples were classified into three clusters (C1, C2, and C3) and the respective models were evaluated in relation to the wet chemistry reference values. The C1 model presented satisfactory accuracy and precision for both parameters. The RMSEP values of 0.85% (AvAl2O3) and 0.49% (RxSiO2) meet the required limits (1.0-1.5% and 0.5%, respectively). The C2 and C3 models, related to marginal ore lithologies, presented satisfactory accuracy but low precision.
These results also indicate that, although the preferred orientation was eliminated using the XRD transmission data collection, there was no incremental improvement compared with the PLSR models obtained with reflection data [14].
These results clearly showed that the methodology can be applied for quality control in the beneficiation plant but is not suitable for geological survey applications. It is worth mentioning that this method presents several advantages over traditional wet chemistry, mainly its speed (less than 5 min to run the XRD analysis and obtain the prediction), the ease with which it can be completely automated, and the absence of dangerous chemical reagents.

Data Availability Statement:
The data presented in this study is contained within the present article.