Prediction of Soluble Al2O3 in Calcined Kaolin Using Infrared Spectroscopy and Multivariate Calibration

Abstract: In the production of calcined kaolin, the soluble Al2O3 content is used as a quality control criterion for some speciality applications. The increasing need for automated quality control systems in the industry has brought the necessity of developing techniques that provide (near) real-time data. Based on the understanding that the presence of water in the calcined kaolin detected using infrared spectroscopy can be used as a proxy for the soluble Al2O3 measurement, in this study, a hand-held infrared spectrometer was used to analyse a set of calcined kaolin samples obtained from a production plant. The spectra were used to predict the amount of soluble Al2O3 in the samples by implementing partial least squares regression (PLS-R) and support vector regression (SVR) as multivariate calibration methods. The presence of non-linearities in the dataset and the different types of association between water and the calcined kaolin represented the main challenges for developing a good calibration. In general, SVR showed a better performance than PLS-R, with root mean squared error of the cross-validation (RMSECV) = 0.046 wt % and R² = 0.87 for the best-achieved prediction. This accuracy level is adequate for detecting variation trends in the production of calcined kaolin, which could be used not only as a quality control strategy, but also for the optimisation of the calcination process.


Introduction
High-grade kaolin is mainly composed of the mineral kaolinite [Al2Si2O5(OH)4], and is commonly thermally treated for industrial purposes [1]. This process aims to tailor the physical and chemical properties of the original mineral to specific market specifications. When kaolinite is calcined at high temperatures (above 600 °C), it dehydroxylates and transforms into amorphous metakaolinite [Al2Si2O7] (Equation (1)), which typically has high chemical reactivity. Calcination at temperatures around 980 °C transforms the metakaolinite into the spinel phase (Al-spinel [Al2O3] and Si-spinel [SiO2]) (Equation (2)), followed by the nucleation of the spinel and transformation into mullite (Equation (3)), which is characteristically hard and abrasive [2][3][4].
For applications in the pharmaceutical industry, natural kaolin is calcined at temperatures around 1100 °C in industrial furnaces to generate a product in an intermediate stage between the amorphous and crystalline spinel phase. This stage marks the balance point between low reactivity and low abrasiveness that is required for calcined kaolin products [5]. In the industry, the criterion to assess the extent of the calcination reaction is the chemical extraction of soluble Al2O3. This method makes use of the solubility of the alumina in the spinel phase (γ-alumina) in strong acids [6][7][8] to estimate the reactivity of the calcined kaolin [9,10]. Consequently, the extraction of soluble Al2O3 is employed as the standard operational procedure (SOP) to determine the quality of the calcined product. However, this is a time-consuming laboratory method that restricts timely operational feedback, thus limiting the opportunities for the on-line optimisation of the calcination process.
The development of an on-site and (near) real-time method that can support the SOP and can be used as an on-site quality control measurement would benefit the production of calcined kaolin [11]. Other researchers have developed different strategies for controlling the extent of the calcination process inside a multiple hearth furnace (MHF). For example, Thomas et al. [5] investigated the residence time of kaolin in the calciner to improve the consistency of the properties of the calcined product; Eskelinen et al. [12] modelled some of the furnace's variables and related them to the physical-chemical phenomena taking place during calcination. However, these studies did not address a mechanism for quality control of the calcined product directly. Recently, Guatame-Garcia et al. [13] investigated the relationship between the infrared spectra of the kaolin calcination reaction and the changes in the calcined clay's reactivity, measured as soluble Al2O3. Infrared spectroscopy is particularly useful in the characterisation of the kaolin calcination reactions. The spectral features describe changes in the mineral structure related to the kaolinite dehydroxylation (Equation (1)) and recrystallisation of the spinel and mullite phases (Equations (2) and (3)) [14][15][16]. Besides these transformations, the results reported in [13] showed an inverse correlation between the adsorption of water by the calcined clay, as detected by the water-related spectral features, and the soluble Al2O3 content. Since the spectral ranges that exhibit the water features can be measured by using portable and hand-held spectrometers, infrared spectroscopy could be used as an on-site and (near) real-time proxy for the SOP.
To be able to correlate the spectra of calcined kaolin samples (and in particular the water-related spectral features) with the soluble Al2O3 content and establish quantitative predictions for unknown samples, it is necessary to use a chemometric approach [17]. Chemometrics has been used, for example, in applications for mineral processing and control of zinc [18], coal [19,20], and petroleum oil [21]. The correlation between spectral data and the mineral characteristics is commonly done by using multivariate calibration methods, among which partial least squares regression (PLS-R) [22] is often used as the standard methodology. However, in the presence of strong non-linearities or complex datasets, PLS-R may not perform adequately. In these cases, multivariate non-linear models such as support vector regression (SVR) can be used along with spectral processing strategies [23,24]. SVR models operate in a kernel-induced feature space that facilitates non-linear modelling with a good performance, even for relatively small datasets, leading to more robust and accurate predictions [21,25]. The use of PLS-R and SVR calibrations can generate a method for the quantification and prediction of soluble Al2O3 in calcined kaolin products based on their infrared spectra.
In this study, spectral processing strategies such as standard normal variate (SNV) and continuum removal (CR), combined with PLS-R and SVR, are applied to the infrared spectra of calcined kaolin samples to determine their soluble Al2O3 content. Different combinations of spectral processing strategies and model calibration parameters are tested on the spectral regions that exhibit water features. The performance of the generated models is compared and discussed in view of the use of an infrared-based prediction model as a quality control strategy in the production of calcined kaolin.

Calcination of Kaolin and Samples
The samples used in this study were obtained from a kaolin processing plant where natural kaolin is calcined in an industrial multiple hearth furnace (MHF) [12]. The production of calcined kaolin for pharmaceutical applications is performed using the soak calcination method, which involves exposing the kaolin to high temperatures for a prolonged time to guarantee complete calcination. The MHF consists of eight vertical hearths with gas flows as a source of energy. The calcination temperature is controlled by varying the gas flow at specific hearths. The raw kaolin is fed at the top of the first hearth and flows spirally through the furnace, moved by rabble arms, until it reaches the lowest hearth. Finally, the calcined kaolin is extracted through exit holes at the bottom of the furnace. In the calciner's operational set-up, the initial and maximum final temperatures are respectively 500 and 1100 °C, with a residence time of approximately 35 min.
The collected calcined powders are cumulative and homogenised samples taken over a period of 12 h (one shift) after blast-cooling. Since the water content in the samples was one of the critical parameters investigated for analysis, two sampling campaigns were conducted during different seasons to ensure that the possible influence of environmental factors was also considered in the measurements. The first sampling campaign occurred during the summertime under local high precipitation and humidity conditions, whereas the second campaign was carried out in autumn, when the precipitation levels are low. In the first period, only samples from the day shift were collected over 23 consecutive days. In the second period, both day and evening shift samples were collected during 17.5 consecutive days, for a total of 56 samples. Approximately 1 kg of sample was taken for every shift. Table 1 presents the average chemical composition of the samples.
X-ray diffraction (XRD) analysis was performed to identify the mineral phases that can influence the infrared spectra. The XRD patterns were collected with a Bruker D8 Advance diffractometer (PANalytical, Almelo, The Netherlands) featuring Bragg-Brentano geometry using Cu-Kα radiation (45 kV, 40 mA) on powders placed on a PMMA holder L25. The XRD patterns were measured with a coupled θ-2θ scan ranging from 10 to 110° 2θ (step size: 0.03° 2θ, time per step: 1 s). The XRD patterns displayed in Figure 1 show that the calcined kaolin is mostly amorphous, with crystalline phases related to illite and quartz. Diffraction peaks characteristic of Al-spinel (γ-alumina) and mullite are also present.
The amount of soluble Al2O3 present in the calcined samples was measured following the conventional method used in the industry, as reported by Taylor [9] and Thomas [10]. The extraction of soluble Al2O3 was done by leaching 0.1 g of sample in 10 mL of concentrated (16 M) nitric acid (Analar grade) for 4 h. The concentration of Al2O3 in solution was determined by inductively coupled plasma atomic emission spectroscopy (ICP-AES). These analyses were performed at the plant's laboratory using a Thermo Electron Iris-AP emission spectrometer (TJA Solutions, Winsford, UK); the overall error of the method is reported as 0.015 wt %.

Infrared Spectra Collection and Processing
The infrared spectra were collected with an Agilent 4300 hand-held Fourier transform infrared (FTIR) spectrometer (Edinburgh, UK) using a diffuse reflectance interface and coarse silver calibration. The spectra were measured in the range from 5200 to 1250 cm⁻¹ (1.9 to 8.0 µm), with a spectral resolution of 4 cm⁻¹ and 128 scans per measurement. A petri dish was filled with each of the powder samples; the surface was first compacted and flattened with a spatula to minimise void spaces and then slightly roughened to randomise the direction of the reflections. Five spectral measurements per sample were taken at different spots of the surface. For consistency with previous studies [13], the units were converted from wavenumbers (cm⁻¹) to wavelengths (µm) during the data import. For every sample, the five spectral measurements were averaged as a means of noise reduction, generating one spectrum per sample. The resulting spectra were smoothed using the Savitzky-Golay (SG) filter [26] with a polynomial order of 3 and a window size of 55 data points. Spectral subsets were made on the regions that exhibit features of molecular water, namely from 2.60 to 3.30 µm and from 5.70 to 6.50 µm.
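The import, averaging, and smoothing steps described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the processing code used in the study: all function names are hypothetical, the edge points are simply left unsmoothed, and the weights come from the standard closed-form Savitzky-Golay expression for quadratic/cubic smoothing.

```python
def wavenumber_to_wavelength_um(wavenumber_cm1):
    """Convert a wavenumber (cm^-1) to a wavelength (micrometres)."""
    return 1.0e4 / wavenumber_cm1


def average_spectra(replicates):
    """Point-by-point mean of replicate spectra (lists of equal length)."""
    n = len(replicates)
    return [sum(values) / n for values in zip(*replicates)]


def savgol_cubic_weights(window):
    """Savitzky-Golay smoothing weights for polynomial order 2 or 3.

    Closed form for an odd window of 2*m + 1 points:
    c_i = 3 * (3m^2 + 3m - 1 - 5i^2) / ((2m + 3)(2m + 1)(2m - 1)).
    """
    if window % 2 == 0 or window < 5:
        raise ValueError("window must be odd and at least 5")
    m = window // 2
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3.0 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom
            for i in range(-m, m + 1)]


def savgol_smooth(spectrum, window):
    """Smooth a spectrum; the first and last m points are left unchanged
    (an assumption made here for simplicity)."""
    m = window // 2
    weights = savgol_cubic_weights(window)
    smoothed = list(spectrum)
    for k in range(m, len(spectrum) - m):
        segment = spectrum[k - m:k + m + 1]
        smoothed[k] = sum(w * v for w, v in zip(weights, segment))
    return smoothed


def subset(wavelengths, spectrum, low_um, high_um):
    """Keep the (wavelength, value) pairs with low_um <= wavelength <= high_um."""
    return [(w, v) for w, v in zip(wavelengths, spectrum)
            if low_um <= w <= high_um]


# Limits of the measured range quoted in the text: 5200 and 1250 cm^-1.
print(round(wavenumber_to_wavelength_um(5200), 2))  # 1.92
print(wavenumber_to_wavelength_um(1250))            # 8.0
```

With the settings quoted in the text, the smoothing call would be `savgol_smooth(spectrum, 55)`, followed by `subset(...)` for each of the two water regions.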
The spectral processing strategies used in this study are standard normal variate (SNV) and continuum removal (CR). SNV centres and scales each spectrum individually, which removes multiplicative interferences caused by scattering and particle size; shifts in the baseline of the reflectance spectra of powders are corrected using a second-degree polynomial regression [27]. CR normalises the spectrum to its continuum, i.e., the albedo of the reflectance curve. The continuum is modelled as a mathematical function and removed to isolate specific absorption bands [28]. Both signal processing strategies were computed independently for each spectral subset in the R environment using the prospectr package [29].
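As a rough illustration of the two transformations, the sketch below implements SNV and a convex-hull continuum removal in plain Python. It is not the prospectr code used in the study; the function names and the example reflectance values are hypothetical, wavelengths are assumed to be sorted in increasing order, and the continuum is assumed to be the upper convex hull of the reflectance curve.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own
    mean and (sample) standard deviation."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mean) / sd for v in spectrum]


def _cross(o, a, b):
    """z-component of the cross product of the vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull of points sorted by increasing wavelength."""
    hull = []
    for p in points:
        # Drop the previous vertex while it lies on or below the line to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def continuum_removal(wavelengths, reflectance):
    """Divide a reflectance spectrum by its convex-hull continuum."""
    hull = upper_hull(list(zip(wavelengths, reflectance)))
    removed = []
    seg = 0
    for w, r in zip(wavelengths, reflectance):
        # Advance to the hull segment that contains this wavelength.
        while seg < len(hull) - 2 and w > hull[seg + 1][0]:
            seg += 1
        (x0, y0), (x1, y1) = hull[seg], hull[seg + 1]
        continuum = y0 + (y1 - y0) * (w - x0) / (x1 - x0)
        removed.append(r / continuum)
    return removed


# Hypothetical reflectance values around the 6.15 um water feature.
wl = [5.70, 5.90, 6.15, 6.30, 6.50]
refl = [0.80, 0.70, 0.50, 0.65, 0.78]
cr = continuum_removal(wl, refl)
depth = 1.0 - min(cr)  # band depth of the continuum-removed feature
print(round(depth, 3))
```

After continuum removal the end points of the subset equal 1 and the absorption shows up as values below 1, so `1 - min(cr)` gives a feature depth of the kind discussed later in the text.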

Partial Least Squares Regression
Partial least squares regression (PLS-R) was developed with the aim of maximising the covariance between the information variables and the parameters to be quantified [30]. In regression problems where the number of variables greatly exceeds the number of observations, PLS-R decomposes the variables into orthogonal scores and loadings and performs the regression on the scores. The PLS-R latent variables (LVs), which are analogous to the components in principal component analysis (PCA), describe the variability of the input data that is relevant for the determination of the parameters to be predicted [22]. Cross-validation determines the number of LVs that are optimal for the regression. Even though PLS-R is best suited to linear systems, it is also able to cope with mild non-linearities.
For the development of the PLS-R models in this study, the information variables or predictors corresponded to the reflectance value for each wavelength in the infrared spectra, whereas the parameter to be quantified corresponded to the laboratory measurement of soluble Al2O3. In a second experiment, chemical components of the samples (K2O, Fe2O3, and MgO) were also added as input variables. The number of LVs was determined by using 10-fold cross-validation.
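The decomposition into scores and loadings, followed by regression on the scores, can be illustrated with a small single-response PLS (PLS1) routine written in a NIPALS style. This is a sketch only: it is not the algorithm or interface of the R package used in the study, the function names are hypothetical, and the toy data are invented for the example.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_lv):
    """Fit a single-response PLS model with up to n_lv latent variables.

    X is a list of spectra (rows) and y the list of reference values.
    Returns the column means, the response mean, and one
    (weights, loadings, regression coefficient) triple per latent variable.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    components = []
    for _ in range(n_lv):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:
            break  # nothing left in X that co-varies with y
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # scores
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt
             for j in range(m)]          # loadings
        q = _dot(f, t) / tt              # regression of y on the scores
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new spectrum x."""
    x_mean, y_mean, components = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = _dot(e, w)   # score of the new sample on this latent variable
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Toy data with an exactly linear response: y = 2*x1 - x2 + 0.5.
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0], [3.0, 2.0]]
y = [2.0 * a - b + 0.5 for a, b in X]
model = pls1_fit(X, y, n_lv=2)
print(round(pls1_predict(model, [4.0, 4.0]), 6))
```

In practice the number of latent variables would be chosen by cross-validation, as described above, rather than fixed in advance.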

Support Vector Regression
Cortes and Vapnik [31] initially established support vector machines (SVMs) to perform binary classification and pattern recognition. After further developments, SVMs have proven to be a powerful technique, particularly in non-linear systems. For solving regression problems, SVMs are implemented via support vector regression (SVR) using the epsilon-insensitive SVR (ε-SVR) formulation, which is extensively described in the literature [23,24,32]. ε-SVR aims to find a function f(x) that is as flat or smooth as possible and for which the errors or deviations from a target in the training data are not larger than a given ε value. For solving the regression problem, the input data are first mapped into a high-dimensional feature space, and kernel functions are used to impart linearity to the dataset. The ε-SVR function makes use of an ε-tube that defines the margins where deviations are tolerated; a constant C represents the cost parameter, which determines the trade-off between the flatness of the function and the tolerance in the deviations, with a larger C assigning a greater penalty to the errors of the samples outside the ε-tube. The ε-insensitive loss function ignores the errors inside the ε-tube and calculates the loss for the data points outside the ε-tube based on the distance between the data point and the ε boundary. The data points that lie on or outside the ε-tube, and thereby determine the regression function, are the support vectors (SVs). The parameters ε and C must be optimised by the analyst.
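The ε-insensitive loss and the role of the cost parameter C can be made concrete with a few lines of Python. The sketch below is illustrative only: the function names, the kernel width parameter `gamma`, and the numerical values are assumptions introduced for the example and are not taken from the study.

```python
import math


def epsilon_insensitive_loss(measured, predicted, epsilon):
    """Loss is zero inside the epsilon-tube and grows linearly outside it."""
    return max(0.0, abs(measured - predicted) - epsilon)


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel between two spectra x and z.

    gamma (kernel width) is an assumed parameter name for this sketch."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def total_cost(measured, predicted, epsilon, cost):
    """Data term of the epsilon-SVR objective: C times the summed losses."""
    return cost * sum(epsilon_insensitive_loss(m, p, epsilon)
                      for m, p in zip(measured, predicted))


# Hypothetical soluble Al2O3 values (wt %) and an illustrative epsilon.
measured = [0.30, 0.40, 0.50]
predicted = [0.32, 0.47, 0.50]
losses = [epsilon_insensitive_loss(m, p, 0.05)
          for m, p in zip(measured, predicted)]
print([round(v, 3) for v in losses])
```

Only the second sample falls outside the tube and contributes to the cost; increasing C makes such deviations more expensive relative to the flatness of the function.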
For the development of the ε-SVR models, the input vectors corresponded to the spectra of the samples, and the response vector corresponded to the measured soluble Al2O3. The ε-SVR models were developed using the radial basis function (RBF) kernel. The parameters ε and C were optimised using a grid search and 10-fold cross-validation.
In this study, for developing the PLS-R and SVR multivariate calibration models, the samples were split into a calibration set (n = 46) and a validation set (n = 10) by random selection. Each spectral subset (2.6 to 3.3 µm and 5.7 to 6.5 µm) was tested using raw, SNV, and CR spectra, for a total of six different test scenarios. All the analyses were performed in the R environment. PLS-R was carried out with the built-in routines in the pls package [33]. For the SVR, the interface to LIBSVM [34] in the package e1071 [35] was used. The performance of the models was assessed using the root mean squared error of the cross-validation (RMSECV), the root mean squared error of the calibration (RMSEC), and the root mean squared error of the prediction (RMSEP). The RMSECV corresponds to the results of the 10-fold cross-validation used for the selection of the parameters in the PLS-R and SVR models using the calibration set. The RMSEC was computed from the differences between the predicted and measured soluble Al2O3 values in the calibration set, whereas the RMSEP was computed from the corresponding differences in the validation set.
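The data split and the error measures described above can be sketched as follows. This is a minimal illustration, not the R workflow of the study: the function names, the random seed, the interleaved fold assignment, the synthetic response values, and the stand-in "mean" model are all assumptions made so that the example runs on its own; any calibration model with a fit/predict pair could be plugged in instead.

```python
import math
import random


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def split_indices(n_samples, n_validation, seed):
    """Random split into calibration and validation sample indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return sorted(indices[n_validation:]), sorted(indices[:n_validation])


def k_folds(indices, k):
    """Assign indices to k folds by interleaving (an assumed scheme)."""
    return [indices[i::k] for i in range(k)]


def rmsecv(fit, predict, X, y, indices, k):
    """k-fold cross-validation error computed on the calibration indices."""
    measured, predicted = [], []
    for fold in k_folds(indices, k):
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            measured.append(y[i])
            predicted.append(predict(model, X[i]))
    return rmse(measured, predicted)


# Stand-in model (predicts the calibration mean); replace with PLS-R or SVR.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


# Synthetic responses in the range reported for the samples (wt %).
rng = random.Random(1)
y = [rng.uniform(0.26, 0.54) for _ in range(56)]
X = [[value] for value in y]  # placeholder predictors
cal, val = split_indices(56, 10, seed=0)
print(len(cal), len(val))  # 46 10
print(round(rmsecv(fit_mean, predict_mean, X, y, cal, 10), 3))
```

RMSEC and RMSEP follow from the same `rmse` function applied to the calibration and validation sets, respectively, after fitting the model on the full calibration set.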

Results
The calcined kaolin samples used in this study had soluble Al2O3 content from 0.26 to 0.54 wt % (Table 1). Figure 2 shows that the samples collected in the first period (samples 1 to 23) generally had lower soluble Al2O3 content (average 0.35 wt %) than those obtained in the second period (samples 24 to 56; average 0.46 wt %). The variation in the average values between the two periods might reflect differences in the quality of the raw kaolin used as a feed for calcination, volume of material fed into the calciner, or variations in the calciner's temperature profile. Nevertheless, according to the plant's historical data, the soluble Al2O3 content in the produced calcined kaolin typically varies from 0.3 to 0.6 wt %. Despite the differences between the two production periods, the sample set covers the production range continuously, which supports the representativeness of the data. These data are the measured values used for developing the PLS-R and SVR calibration models.

Spectral Processing
The average spectrum of the calcined kaolin clay after SG smoothing is presented in Figure 3. The adsorbed water on the surface of γ-alumina from the spinel phase exhibits spectral features in two regions. The first region ranges from 2.6 to 3.3 µm and corresponds to the O-H stretching vibrations (νH2O); the second region extends from 5.7 to 6.5 µm, where the H-O-H bending vibrations (δH2O) are located [36][37][38]. Of the minerals detected by XRD analysis, only illite is expected to have spectral features, at 2.74, 3.48, and 5.56 µm [39]; mullite and quartz are featureless in the presented spectral range. Even though the sample preparation sought to minimise the effect from the environment on the spectra, the CO2 in the air induced artefacts around 4.2 µm. The analysis of the spectra of water in γ-alumina was then constrained to the spectral regions previously described to reduce the influence of the features of illite and CO2.
Figures 4 and 5 display the spectra of the 2.6-3.3 µm and 5.7-6.5 µm subsets after Savitzky-Golay (SG) smoothing (hereafter referred to as raw spectra), SNV, and CR processing. In the 2.6 to 3.3 µm range, spectral processing using SNV and CR (Figure 4b,c) enhances the 2.78 µm peak, attributed to the adsorption of water [36]. The other two peaks at 2.73 and 2.75 µm are also better resolved; however, they cannot be unambiguously attributed to either the presence of water or the surface hydroxyls in illite. The 5.7 to 6.5 µm region contains only the δH2O spectral feature (Figure 5). In this range, SNV and CR processing enhance the depth at the centre of the feature at 6.15 µm, apparently separating the spectra into two groups. Besides, the type of spectral processing influences the shape of the spectra. The CR spectra appear to be symmetric, whereas the SNV spectra are skewed towards longer wavelengths; in the latter, the shoulder position also differs, lying at 6.24 µm in the deeper spectra and at 6.27 µm in the shallower ones.
Following the approach proposed in a previous study [13], the depth of the water feature was used to investigate the general behaviour of the spectra of the calcined kaolin samples in relation to the soluble Al2O3 content. Figure 6 plots the soluble Al2O3 content against the depth of the δH2O feature at 6.15 µm extracted from the CR spectra. The most remarkable aspect of this plot is that the feature depth separates the samples collected in the first and the second period. The samples of the first period, which have the lowest soluble Al2O3, have a deep water feature, whereas in samples of the second period (with higher soluble Al2O3), this feature is shallower, as was expected from former studies. However, when each period is assessed individually, it is difficult to describe a pattern of correlation between the depth of the water feature and the soluble Al2O3 content. A proper assessment of the spectral features of water in relation to the amount of soluble Al2O3 in calcined kaolin samples should include not only the depth, but also the overall shape of the spectral feature. Besides, the clustering of data points in Figure 6 and the lack of trends indicate that the relationship between the spectra and the soluble Al2O3 is not linear. As a consequence, a multivariate approach that can cope with non-linearities is more appropriate than a univariate one.
Table 2 presents the results of the PLS-R models developed for the six scenarios by using only the infrared spectra as the information variables. In general, the difference in the performance of the models is minimal. Since the RMSECV is practically the same for all the scenarios, the selection of the best model was based on the lowest RMSEC and RMSEP, which correspond to the model generated from the 5.7 to 6.5 µm SNV-processed spectra, using two LVs and with a prediction error RMSECV = 0.050 wt %. For this model, the coefficient of determination is R² = 0.64. The measured vs. predicted plot (Figure 7a) shows clustering rather than an even distribution along the regression line. The same is true for the predicted values from the validation set. In addition, the residuals plot (Figure 7b) shows large residuals, particularly for the extremes, where low and high values have respectively strong negative and positive residuals. These results suggest that in this case, the PLS-R calibration is not able to cope entirely with the non-linearity of the dataset.
A possible cause of the non-linearity might be related to the mineralogical content of the samples. The interpretation of the XRD and infrared spectra revealed that the samples also contain illite. Even though the relative amount of illite is low, its presence can contribute to the variability of the spectra. In order to consider the influence of illite on the performance of the PLS-R calibration, the chemical contents associated exclusively with illite (i.e., K2O, Fe2O3, and MgO) were included as information variables. The results of the calibration for the six scenarios are presented in Table 3. Compared to the previous results, when taking illite into account, the performance indicators of the new models are slightly better. Moreover, in the measured vs. predicted plot (Figure 8a), the data points are more evenly distributed along the regression line than those presented in Figure 7, suggesting that the inclusion of illite as a variable can improve the performance of the model. However, Figure 8 also shows a poor prediction for high soluble Al2O3 values and larger residuals for some data points, leading to a low coefficient of determination (R² = 0.53). Even though the consideration of illite seems to correct for the non-linearities in the dataset, the overall result of the calibration is not optimal.
Another strategy for coping with non-linearities is the implementation of SVR models. The results of the SVR calibration for the six scenarios are presented in Table 4. The SVR cross-validation errors do not differ significantly from those reported by the PLS-R models; however, the RMSECs of the SVR are considerably smaller than the PLS-R ones, indicating better performance. The performance of the SVR models improved after applying spectral processing; the most notable improvement occurred in the 5.7 to 6.5 µm spectra using CR (RMSECV = 0.046 wt %), which was consequently selected as the best model. The coefficient of determination for this model is R² = 0.87, which confirms its good performance. The measured vs. predicted plot (Figure 9a) shows that even though the data points seem to separate into two groups, there is an even distribution along the regression line for the entire dataset. Moreover, the predicted values from the validation set are also evenly distributed, except for the values above 0.5 wt %. The magnitude of the residuals (Figure 9b) is markedly smaller than that of the best PLS-R model; however, their spread shows that the low values are generally over-estimated, whereas the highest values are under-estimated.

Discussion
Prior studies noted the correlation between the reactivity of calcined kaolin and the hydration of γ-alumina when the calcination reaches the spinel phase. Such a relationship was also assessed by the amount of extracted soluble Al2O3 and the infrared spectral features of water adsorbed by γ-alumina in the calcined clay. In this study, infrared spectroscopy was used to determine the soluble Al2O3 content in the calcined clay utilising a combination of spectral processing strategies and multivariate calibration. Even though previous studies suggested that the depth of the water spectral features could predict the soluble Al2O3 content, the results of this work showed that such direct prediction is not possible. Since the depth of the water feature captures not only the water physically adsorbed on the surface of γ-alumina but also all the water present in the system (i.e., chemically bound and free water), the depth parameter is influenced by the γ-alumina properties as well as the sample's environmental conditions. In Figure 6, the separation of the two sampling periods by the water parameter indicates differences in the total amount of water present in the samples, where those with a larger exposure to water from the environment have the deepest features. Therefore, the proper assessment of the relationship between soluble Al2O3 and adsorbed water requires the analysis of the water features as a whole, in such a way that the spectral characteristics that relate to physically adsorbed water have more relevance than those that are related to chemically bound or free water.
Since the spectral analysis must take the shape of the water feature into account, it is relevant to select an appropriate spectral processing strategy. The main difference between the strategies used in this study is that CR exaggerates the spectral features for a more precise discrimination. As a consequence, CR processing enhances the depth differences in the spectra of the water feature, thus increasing variations that are not relevant to the calibration. Using CR in the PLS-R modelling influenced the number of LVs required for the prediction, making only one or two LVs necessary for achieving a low prediction error. However, the calibration assigns high loadings to the peak of the feature, which enhances the influence of the feature's depth and hinders a good prediction. Consequently, the RMSECs of the calibrations using CR-processed spectra were consistently larger than those of the calibrations using SNV-processed spectra, in which relatively small loadings were assigned to the feature's depth. In contrast, for the SVR modelling, the calibrations generated from CR spectra perform better than those from SNV spectra. The SVs used in these models seem to reduce, although not eliminate, the influence of the depth differences, while giving more importance to features that are related to water adsorption.
Aside from the spectral processing, the selection of a spectral range also influenced the performance of the multivariate calibrations. In general, the errors of the models in the 2.6 to 3.3 µm range are higher than those in the 5.7 to 6.5 µm range. Even though both ranges represent water features related to water adsorption, the presence of illite produces a mixed spectrum around 2.7 µm that explains the underperformance of the calibrations. Since the feature in the 5.7 to 6.5 µm range is exclusive to water, the calibrations in this part of the spectrum are more reliable. However, illite itself can also host water, introducing error into the predictions. The results of the PLS-R calibrations that included the chemical components of illite as variables show that this mineral indeed influences the performance of the predictions. However, the inclusion of these variables can make the models more sensitive to errors induced by other types of impurities. To achieve more reliable calibrations, it would be necessary to quantify the amount of illite in the calcined clay and estimate its actual influence on the spectra.
Overall, the performance indicators in all the assessed scenarios indicate that the SVR calibrations are more accurate than the PLS-R ones. However, the most evident difference in the performance between SVR and PLS-R models is shown by the residuals of the different achieved models. The size and the distribution of the PLS-R residuals are evidence of a poor calibration, and suggest that a non-linear approach should be used. The smaller SVR residuals indicate a better calibration; however, the model struggles with the high and low values by systematically over-estimating values lower than 0.3 wt % and under-estimating values higher than 0.5 wt %. As a consequence, a note of caution is due here for the interpretation of new predictions that result in values lower than 0.3 wt % or higher than 0.5 wt %, since they might be considered as over-calcination in the first case, and under-calcination in the latter.
Even though the accuracy of the best-performing model (5.7 to 6.5 µm spectra after CR processing), with RMSECV = 0.046 wt %, is lower than that of the SOP (0.015 wt %), it is sufficient for detecting variations in the soluble Al2O3 content in the calcined clay. For this sort of application, the lower performance of the prediction at extreme values (i.e., around 0.3 wt % and 0.5 wt %) is not of great consequence, since the over- or under-predicted values are still an indication of general variation trends in the calcination output. These factors encourage the use of the SVR predictions based on infrared spectra as part of a control system.
For an on-site and (near) real-time implementation scenario, the fact that the prediction model is based on spectra collected with a hand-held instrument increases the likelihood that such a system can be used in a mineral production environment. Moreover, the delimitation of the spectral range that is useful for the prediction optimises the required time for spectral collection and data processing. This approach enables the analysis of a larger volume of material than could be measured by the SOP. Therefore, the predicted soluble Al2O3 values could assist a more efficient utilisation of the SOP by performing the laboratory analysis on specifically selected samples. Furthermore, the infrared-based predicted values for soluble Al2O3 could be used to support the in-plant monitoring by improving operational feedback, thus making an impact on the performance of the calciner and the quality of the calcined product.

Conclusions
Soluble Al2O3, as an industrial parameter for quality control in calcination, is related to the formation of γ-alumina and consequently to the reactivity of the calcined clay. In the infrared spectra, the presence of γ-alumina is evidenced by the spectral features related to water adsorption. However, the nature of the water spectra (which capture not only adsorbed but also chemically bound and free water) hinders a univariate or linear relation with the amount of soluble Al2O3.
This study developed multivariate calibration models for the prediction of the content of soluble Al2O3 in calcined kaolin clays by using infrared spectroscopy. A combination of factors that affect the achievement of a good prediction was used to develop testing scenarios: (1) the spectral regions where the water features are present (2.6 to 3.3 µm and 5.7 to 6.5 µm); (2) standard normal variate (SNV) and continuum removal (CR) as spectral processing strategies to improve the discrimination among the different forms in which water is present in the samples; (3) partial least squares regression (PLS-R) and support vector regression (SVR) as multivariate calibration methods that cope with non-linearity.
In general, the SVR models demonstrated a better ability to relate the water features to the soluble Al2O3 parameter than the PLS-R ones. The best-performing model was achieved by using SVR in the 5.7 to 6.5 µm range after CR processing. The SVR model predicts the soluble Al2O3 content in the calcined clay with RMSECV = 0.046 wt % and R² = 0.87. Even though this level of accuracy is lower than that of the standard operational procedure (SOP), it is suitable for detecting variation trends in the production of calcined kaolin. These results encourage the use of infrared spectroscopy as a technique for (near) real-time quality control that supports the optimisation of the calcination process.