Open Access Article

Investigation of Direct Model Transferability Using Miniature Near-Infrared Spectrometers

Viavi Solutions Inc., 1402 Mariner Way, Santa Rosa, CA 95407, USA
*
Author to whom correspondence should be addressed.
Academic Editors: Christian Huck and Krzysztof B. Bec
Molecules 2019, 24(10), 1997; https://doi.org/10.3390/molecules24101997
Received: 17 April 2019 / Revised: 15 May 2019 / Accepted: 23 May 2019 / Published: 24 May 2019

Abstract

Recent developments in compact near infrared (NIR) instruments, including both handheld and process instruments, have enabled easy and affordable deployment of multiple instruments for various field and online or inline applications. However, historically, instrument-to-instrument variations could prohibit success when applying calibration models developed on one instrument to additional instruments. Despite the usefulness of calibration transfer techniques, they are difficult to apply when a large number of instruments and/or a large number of classes are involved. Direct model transferability was investigated in this study using miniature near-infrared (MicroNIR™) spectrometers for both classification and quantification problems. For polymer classification, high cross-unit prediction success rates were achieved with both conventional chemometric algorithms and machine learning algorithms. For active pharmaceutical ingredient quantification, low cross-unit prediction errors were achieved with the most commonly used partial least squares (PLS) regression method. This direct model transferability is enabled by the robust design of the MicroNIR™ hardware and will make deployment of multiple spectrometers for various applications more manageable.
Keywords: NIR; direct model transferability; MicroNIR™; SVM; hier-SVM; SIMCA; PLS-DA; TreeBagger; PLS; calibration transfer

1. Introduction

In recent years, compact near infrared (NIR) instruments, including both handheld and process instruments, have attracted considerable attention and received wider adoption due to their cost-effectiveness, portability, ease of use, and flexibility in installation. These instruments have been used for various applications in different industries, such as the pharmaceutical, agricultural, food, and chemical industries [1,2,3,4,5]. They enable point-of-use analysis that brings advanced laboratory analysis to the field [6,7] and online and inline analysis that permits continuous process monitoring [8,9]. Moreover, scalability of NIR solutions has become possible. Users of compact NIR instruments commonly want more than one instrument for their applications, and sometimes a large number of instruments are deployed.
Intrinsically, NIR solutions require multivariate calibration models for most applications due to the complexity of the spectra resulting from vibrational overtones and combination bands. Usually a calibration data set is collected using an NIR instrument to develop a calibration model. However, when multiple instruments are deployed for the same application, collecting calibration sets and developing calibration models for each instrument individually is too time-consuming and labor-intensive. It is also very inconvenient to manage different calibration models for different instruments. Therefore, it is highly desirable that calibration development is performed only once, and that the calibration model can be used on all these instruments successfully. In practice, when multiple instruments are involved in a particular application, the calibration model is often developed on one instrument and then applied to the rest of the instruments, especially when a project starts with one instrument for a feasibility test and then multiple instruments are procured. When a large number of instruments are involved, a global model approach can be taken in which calibration data from at least two to three instruments are pooled to develop the calibration model, in order to minimize noncalibrated variations from the instruments [10]. In either case, model transferability from one or multiple instruments to the others is critical.
Historically, instrument-to-instrument variations could prohibit the success of the direct use of calibration models developed on one instrument with the other instruments. To avoid full recalibration, various calibration transfer methods have been developed to mathematically correct for instrument-to-instrument variations [10,11]. Common methods include direct standardization [12], piecewise direct standardization (PDS) [12,13,14], spectral space transformation [15], generalized least squares (GLS) [16], and so on. These methods have been extensively used to transfer quantitative calibration models [17,18,19,20], but very few studies were focused on the transfer of classification models [21,22]. Although these methods are very useful, they can only deal with calibration transfer from one instrument to another at a time and require transfer datasets to be collected from the same physical samples with both instruments. This is practical when there are only a few instruments involved. One instrument can be designated as the master instrument to develop the calibration model. Then data collected by the other instruments can be transformed into the master instrument’s approximate space via the respective pair of transfer datasets. Thus, the master calibration model can be used by the other instruments. Alternatively, the master calibration data can be transferred to the other target instruments and calibration models can be developed on these target instruments. However, in the new era of handheld and process NIR instrumentation, a large number of instruments (e.g., > 20) could be deployed for one application. It would be difficult to perform calibration transfer in this way, especially when these instruments are placed in different locations. Other calibration transfer methods have been developed without using the transfer datasets from both instruments [23,24,25]. 
Unlike the commonly used methods, however, these methods have not been extensively studied or made easily available to general NIR users. Moreover, calibration transfer of classification models typically requires transfer data to be collected from every class. When a large number of classes are included in the model, the effort required would approach that of rebuilding a library on the secondary instrument. This may explain why very few studies have been conducted on the transfer of classification models.
Considering all the advantages and potential that handheld and process NIR instruments can offer, and the challenges for calibration transfer when a large number of instruments and/or a large number of classes are involved, it is worth investigating whether advances in instrumentation and modeling methods could make direct use of the master calibration model acceptable. However, to the best of our knowledge, little research has been done in this area.
The authors have demonstrated in the past that the use of miniature near-infrared (MicroNIR™) spectrometers with the aid of support vector machine (SVM) modeling can achieve very good direct transferability of models with a large number of classes for pharmaceutical raw material identification [26]. In the current study, using MicroNIR™ spectrometers, direct model transferability was investigated for polymer classification. Five classification methods were tested, including two conventional chemometric algorithms, partial least squares discriminant analysis (PLS-DA) [27] and soft independent modeling of class analogy (SIMCA) [28], and three machine learning algorithms that are burgeoning in chemometrics, bootstrap-aggregated (bagged) decision trees (TreeBagger) [29], support vector machine (SVM) [30,31] and hierarchical SVM (hier-SVM) [26]. High cross-unit prediction success rates were achieved. Direct transferability of partial least squares (PLS) regression models was also investigated to quantify active pharmaceutical ingredients (API). Low cross-unit prediction errors were obtained.

2. Results

2.1. Classification of Polymers

Polymers are encountered in everyday life and are of interest for many applications. In this study, polymer classification was used as an example to investigate direct model transferability. Resin kits containing 46 materials representing the most important plastic resins used in industry today were used. Each material was treated as one class. Three resin kits were used to show prediction performance on different physical samples of the same material. The samples were measured by three randomly chosen MicroNIR™ OnSite spectrometers (labeled as Unit 1, Unit 2 and Unit 3).

2.1.1. Spectra of the Resin Samples

Spectra collected by the three spectrometers are compared in Figure 1. For clarity, example spectra of two samples are presented; the same observations were made for the other samples. The raw spectra in Figure 1a show only baseline shifts between measurements of the same sample using different spectrometers. These shifts were mainly due to different measurement locations, since these resin samples are injection molded and are not uniform in thickness and molecular orientation. In fact, baseline shifts were also observed when using the same spectrometer to measure different locations of the same sample. These shifts can be corrected by spectral preprocessing, and the preprocessed spectra from the same sample collected by different spectrometers were very similar, as shown in Figure 1b.

2.1.2. Direct Model Transferability of the Classification Models

The performance of the polymer classification models was evaluated at four levels, the same-unit-same-kit performance, the same-unit-cross-kit performance, the cross-unit-same-kit performance, and the cross-unit-cross-kit performance. To account for the most variation in sample shape and thickness, each resin sample was scanned in five specified locations. In addition, at each position the sample was scanned in two orientations with respect to the MicroNIR™ lamps to account for any directionality in the structure of the molding. For each position and orientation, three replicate scans were acquired, totaling thirty scans per sample, per spectrometer. Prediction was performed for every spectrum in the validation set. For the same-unit-same-kit performance, the models built with data collected from four locations on each sample in one resin kit by one spectrometer were used to predict data collected from the other location on each sample in the same resin kit by the same spectrometer. The total number of predictions was 276 for all 46 materials for each case. For the same-unit-cross-kit performance, the models built with all the data collected from one resin kit by one spectrometer were used to predict all the data collected from a different resin kit by the same spectrometer. The total number of predictions was 1380 for all 46 materials for each case. For the cross-unit-same-kit performance, the models built with all the data collected from one resin kit by one spectrometer were used to predict all the data collected from the same resin kit by a different spectrometer. The total number of predictions was 1380 for all 46 materials for each case. For the cross-unit-cross-kit performance, the models built with all the data collected from one resin kit by one spectrometer were used to predict all the data collected from a different resin kit by a different spectrometer. The total number of predictions was 1380 for all 46 materials for each case. 
Five different classification algorithms were used to build the models, which were PLS-DA, SIMCA, TreeBagger, SVM and hier-SVM.
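The prediction counts quoted above follow directly from the scanning design. As a quick check, the arithmetic can be reproduced as below; the variable names are illustrative and not taken from the study.

```python
# Counts stated in the text for the polymer study.
N_MATERIALS = 46      # one class per resin
N_LOCATIONS = 5       # scan positions per sample
N_ORIENTATIONS = 2    # orientations relative to the lamps
N_REPLICATES = 3      # replicate scans per position and orientation

scans_per_location = N_ORIENTATIONS * N_REPLICATES
scans_per_sample = N_LOCATIONS * scans_per_location

# Same-unit-same-kit: train on four locations, predict the held-out one.
held_out_predictions = N_MATERIALS * scans_per_location
# Other three levels: every scan of the test kit is predicted.
full_kit_predictions = N_MATERIALS * scans_per_sample

print(scans_per_sample, held_out_predictions, full_kit_predictions)
```

This gives 30 scans per sample, 276 predictions for the held-out-location case, and 1380 predictions when a whole kit is predicted.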
The prediction performance was evaluated in terms of prediction success rates and the number of missed predictions. Representative results are summarized in Table 1 and Table 2, respectively. The prediction success rates were calculated by dividing the number of correct predictions by the total number of predictions. The number of missed predictions is presented to make the differences clearer, since with a large number of total predictions a small difference in prediction success rate corresponds to a noticeable difference in the number of missed predictions. It should be noted that in a few cases the total number of predictions was not exactly 276 or 1380, because extra spectra were collected unintentionally during experiments and no spectra were excluded from analysis. To make the comparison consistent, all the models in these tables were developed using data from Kit 1 for the different spectrometers. The prediction data were collected using different resin kits and different spectrometers for the four levels of performance.
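A minimal sketch of the two metrics described above, assuming the true and predicted class labels are available as plain lists. The function name and the example labels are hypothetical; only the definitions (correct predictions divided by total predictions, and the count of misses) come from the text.

```python
def prediction_summary(true_labels, predicted_labels):
    """Return (success rate in percent, number of missed predictions)."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    # A prediction is "missed" when the predicted class differs from the true one.
    missed = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    total = len(true_labels)
    return 100.0 * (total - missed) / total, missed


# Hypothetical labels: 1380 predictions, 28 of them wrong.
truth = ["PP"] * 1380
predicted = ["PP"] * 1352 + ["PE"] * 28
rate, missed = prediction_summary(truth, predicted)
print(f"{rate:.2f}% success, {missed} missed")
```

With 28 misses out of 1380 predictions the success rate is still about 98%, which illustrates why the miss count is the more sensitive number to compare.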
The same-unit-same-kit cases were control cases and presented as the diagonal elements for each algorithm in the left three columns of the tables. As expected, 100% prediction success rates and 0 missed predictions were obtained for all algorithms except for one PLS-DA case (Unit 1 K1 for modeling and testing) where there was only 1 missed prediction. The same-unit-cross-kit cases showed the true prediction performance of the models for each spectrometer, since independent testing samples were used. The results are presented as the diagonal elements for each algorithm in the right three columns of the tables. All the models showed very good same-unit-cross-kit predictions. Although SIMCA showed the best performance, the differences in performance were very small between algorithms. It should be noted that samples made of the same type of material but with different properties are included in the resin kits, indicating that the MicroNIR™ spectrometers have the resolution to resolve minor differences between these polymer materials. For the cross-kit cases, Kit 2 was used for Unit 1 and Unit 2, while Kit 3 was used for Unit 3, because at the time of data collection using Unit 3, Kit 2 was no longer available. Nonetheless, conclusions about the cross-kit performance were not impacted by this.
The direct model transferability was first demonstrated by the cross-unit-same-kit results, which are presented by the non-diagonal elements for each algorithm in the left three columns of the tables. Except for the PLS-DA algorithm, all the algorithms showed good performance. In general, the order of performance was hier-SVM > SVM > SIMCA > TreeBagger >> PLS-DA. When the hier-SVM algorithm was used, the worst case had only 28 missed predictions out of 1380 predictions, and one third of the cases showed perfect predictions.
The direct model transferability was further demonstrated by the most stringent cross-unit-cross-kit cases, which are often the real-world cases. The results are presented by the non-diagonal elements for each algorithm in the right three columns of the tables. Except for the PLS-DA algorithm, all the algorithms showed good performance, although it was slightly worse than in the cross-unit-same-kit results, with some exceptions. In general, the order of performance was hier-SVM > SVM > TreeBagger ≈ SIMCA >> PLS-DA.
Besides the representative results shown in these tables, all possible combinations of datasets were analyzed, including 6 same-unit-same-kit cases, 6 same-unit-cross-kit cases, 8 cross-unit-same-kit cases, and 16 cross-unit-cross-kit cases in total for each algorithm. The conclusions were similar to those presented above. For the most stringent cross-unit-cross-kit cases, the mean prediction success rates of all the cases were 98.15%, 97.00%, 96.74%, 95.83%, and 80.19% for hier-SVM, SVM, TreeBagger, SIMCA, and PLS-DA, respectively. The high prediction success rates for hier-SVM, SVM, TreeBagger and SIMCA indicate good direct model transferability for polymer classification with MicroNIR™ spectrometers. To achieve the best result, hier-SVM should be used. However, the conventional SIMCA algorithm, which is available to most NIR users, is also sufficient.

2.2. Quantification of Active Pharmaceutical Ingredients

Quantitative analysis of an active pharmaceutical ingredient is important in several steps of a pharmaceutical production process, and NIR spectroscopy has been shown to be a good alternative to other, more time-consuming means of analysis [32]. As one of the process analytical technology (PAT) tools adopted by the pharmaceutical industry, compact NIR spectrometers can be installed for real-time process monitoring, enabling the quality by design (QbD) approach that is now accepted by most pharmaceutical manufacturers to improve manufacturing efficiency and quality [33,34]. In this context, multiple NIR spectrometers will be needed for the same application. It is therefore important to understand the direct transferability of calibration models used to determine APIs quantitatively.
To investigate this, a five-component pharmaceutical powder formulation including three APIs, acetylsalicylic acid (ASA), ascorbic acid (ASC), and caffeine (CAF), as well as two excipients, cellulose and starch, was used. A set of 48 samples was prepared by milling varying amounts of the three APIs in the concentration range of 13.77–26.43% w/w with equal amounts (40% w/w) of a 1:3 (w/w) mixture of cellulose and starch [4]. The set of samples was measured by three randomly chosen MicroNIR™ 1700ES spectrometers (labeled as Unit 1, Unit 2 and Unit 3).

2.2.1. Spectra of the Pharmaceutical Samples

The spectra were first compared across the three instruments. Raw spectra of the two samples with the lowest and the highest ASA concentration, collected by all three instruments, are shown in Figure 2a. Only slight baseline shifts can be seen between spectra collected by different instruments. After preprocessing, the spectra collected by different instruments became almost identical, as shown in Figure 2b, whereas the spectral differences between the high concentration sample and the low concentration sample can be clearly seen. Similar observations were obtained for the other two APIs, ASC (Figure 2c,d) and CAF (Figure 2e,f). It should be noted that the preprocessed spectra were generated with the preprocessing steps optimized for each API individually.

2.2.2. Direct Model Transferability of the Quantitative Models

To develop the quantitative calibration models, 38 out of the 48 samples were selected as the calibration samples via the Kennard-Stone algorithm [35], based on the respective API concentration, which was determined by the amount of API added to the powder sample. The remaining 10 samples were used as the validation samples. Twenty spectra were collected from each sample with every spectrometer. Thus, 760 spectra from the 38 calibration samples were used to build every model and 200 spectra from the 10 validation samples were used to validate each model. For each API, an individual model was developed on each instrument by partial least squares (PLS) regression. Different preprocessing procedures with different settings were tested and the optimal one was determined based on the cross-validation statistics using the calibration set. The same optimal preprocessing procedure was selected on all three instruments for the same API. The API models were developed using the corresponding preprocessed spectra.
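The Kennard-Stone selection mentioned above can be sketched as follows. This is a generic max-min implementation written for illustration, not the authors' code: it seeds the selection with the two most distant samples and then repeatedly adds the sample farthest from everything already chosen. The function name and the example concentrations are hypothetical; in the study the selection was based on the API concentration, so each sample is represented here by a one-element tuple.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone max-min rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    # Step 1: seed with the pair of points that are farthest apart.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
    selected = [first, second]
    remaining = set(range(n)) - set(selected)
    # Step 2: repeatedly add the point whose nearest selected neighbor is farthest away.
    while len(selected) < n_select:
        nxt = max(
            sorted(remaining),
            key=lambda i: min(math.dist(points[i], points[s]) for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Hypothetical API concentrations (% w/w), one value per sample.
concentrations = [13.8, 15.0, 17.5, 20.1, 22.4, 26.4]
calibration_idx = kennard_stone([(c,) for c in concentrations], 4)
validation_idx = [i for i in range(len(concentrations)) if i not in calibration_idx]
print(calibration_idx, validation_idx)
```

Because the extremes are always picked first, the calibration set spans the full concentration range and the validation samples fall inside it.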
The model performance was first evaluated in terms of normalized root mean square error of prediction (NRMSEP), which is root mean square error of prediction (RMSEP) normalized to the mean reference value of the validation set. NRMSEP was used to provide an estimate of how big the error was relative to the value measured. Since the mean reference value was the same for all the validation sets, it is equivalent to comparing RMSEP. Two types of prediction performance were examined, the same-unit performance and the cross-unit performance. Using a calibration model developed on one instrument, the same-unit performance was determined by predicting the validation set obtained with the same instrument, and the cross-unit performance was determined by predicting the validation set obtained with a different instrument. The cross-unit performance is the indicator of direct model transferability. The results were reported under the No Correction section in Table 3, Table 4 and Table 5 for ASA, ASC and CAF, respectively. The unit number in the row title represents which of the instruments was used to develop the calibration model, and the unit number in the column title represents which instrument was used to collect the validation data. Therefore, the NRMSEP values on the diagonal indicate the same-unit performance, while the other values indicate the cross-unit performance. The data show that cross-unit performance was close to the same-unit performance, all below 5%.
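The NRMSEP definition above (RMSEP divided by the mean reference value of the validation set) can be written out as a short sketch. The function names and example numbers are illustrative only.

```python
import math


def rmsep(reference, predicted):
    """Root mean square error of prediction."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def nrmsep_percent(reference, predicted):
    """RMSEP normalized to the mean reference value, expressed in percent."""
    mean_reference = sum(reference) / len(reference)
    return 100.0 * rmsep(reference, predicted) / mean_reference


# Hypothetical validation data: reference and predicted concentrations (% w/w).
reference = [20.0, 20.0, 20.0, 20.0]
predicted = [21.0, 19.0, 21.0, 19.0]
print(rmsep(reference, predicted), nrmsep_percent(reference, predicted))
```

For the same-unit case, `predicted` would come from spectra measured on the instrument used for calibration; for the cross-unit case, the same model is applied to spectra from a different instrument.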
In another independent study, the same samples were measured by a benchtop Bruker Vector 22/N FT-NIR spectrometer. The reported mean absolute bias based on 3 validation samples was 0.28, 0.62 and 0.11 for ASA, ASC and CAF, respectively [36]. In the current study, the mean absolute bias of the three same-unit cases based on 10 validation samples was 0.21, 0.35 and 0.22 for ASA, ASC and CAF, respectively. The mean absolute bias of the six cross-unit cases based on 10 validation samples was 0.14, 0.30 and 0.25, respectively. These results indicate that both the same-unit and the cross-unit MicroNIR™ performance is comparable with the benchtop instrument performance. However, it should be noted in the current study 38 samples were used for calibration and 10 samples were used for validation, while in the other study 45 samples were used for calibration and 3 samples were used for validation.
The model performance was further examined by the predicted values of the validation set versus the reference values. Using calibration models developed on Unit 1, the same-unit predicted results and the cross-unit predicted results for ASA, ASC and CAF are shown in Figure 3. It can be seen that most of the predicted values stay close to the 45-degree lines, explaining the good model performance. Moreover, the cross-unit results (red circles) are very close to the same-unit results (blue circles), explaining the similar cross-unit performance to the same-unit performance.
The corresponding Bland-Altman plots were used to illustrate the agreement between the cross-unit prediction results and the same-unit prediction results in Figure 4. The Bland-Altman analysis is a well-accepted technique for method comparison in highly regulated clinical sciences [37] and shows good visual comparison between two instruments [11]. The x-axis shows the mean predicted value and the y-axis shows the difference between the cross-unit predicted value and the same-unit predicted value. The limits of agreement (LOA) were calculated by Equation (1):
$$\mathrm{LOA} = \bar{d} \pm 1.96 \times SD \qquad (1)$$
where $\bar{d}$ is the bias or the mean difference, and $SD$ is the standard deviation of the differences. It can be seen from Figure 4 that with only a few exceptions, all data points stayed within the LOA, indicating that at a 95% confidence level, the cross-unit prediction results agreed well with the same-unit prediction results. LOA relative to the mean of the mean predicted values (x-axis) was below 3% for all three APIs.
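Equation (1) can be computed as in the sketch below. It assumes paired cross-unit and same-unit predictions for the same validation spectra and uses the sample standard deviation (n − 1 denominator), which is an assumption since the text does not specify it. The function name and example values are hypothetical.

```python
import statistics


def limits_of_agreement(cross_unit, same_unit):
    """Return (lower LOA, bias, upper LOA) for paired predictions."""
    if len(cross_unit) != len(same_unit) or len(cross_unit) < 2:
        raise ValueError("need at least two paired predictions")
    # y-axis of the Bland-Altman plot: cross-unit minus same-unit prediction.
    differences = [c - s for c, s in zip(cross_unit, same_unit)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample SD; an assumption, see lead-in
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Hypothetical predicted concentrations (% w/w).
cross = [1.0, 2.0, 3.0]
same = [0.0, 2.0, 4.0]
lower, bias, upper = limits_of_agreement(cross, same)
print(lower, bias, upper)
```

The x-axis value for each point would be the mean of the paired predictions, `(c + s) / 2`.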
The corresponding reduced Hotelling’s T² and reduced Q residuals are shown in Figure 5. The reduced statistics were calculated by normalizing Hotelling’s T² and Q residuals to their respective 95% confidence limits. The black circles represent the calibration data, the blue circles represent the same-unit validation data, and the red circles represent the cross-unit validation data. It can be clearly seen that the cross-unit validation data stayed close to the same-unit validation data, which is consistent with the similar cross-unit and same-unit performance. It was noticed that 20 calibration data points (from the same physical sample) and 20 cross-unit validation data points (from another physical sample) are in the high reduced Hotelling’s T² and high reduced Q residuals quadrant for ASA (Figure 5a,b). This explains why the prediction results of one sample deviated significantly from the 45-degree lines in Figure 3a,b. However, to keep the analysis consistent with the other two APIs and with data available in the literature [4,36] for comparison, no sample was excluded from calibration or validation.
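The normalization described above amounts to dividing each statistic by its 95% confidence limit, so that a reduced value above 1 means the limit is exceeded. A minimal sketch follows; the function names are hypothetical, the confidence limits are assumed to be supplied by the modeling software, and treating 1 as the boundary of the "high" quadrant is an assumption based on that normalization.

```python
def reduced_statistic(value, limit_95):
    """Normalize a T-squared or Q value to its 95% confidence limit."""
    if limit_95 <= 0:
        raise ValueError("confidence limit must be positive")
    return value / limit_95


def in_high_quadrant(t2, q, t2_limit, q_limit):
    """True when both reduced statistics exceed 1 (assumed quadrant boundary)."""
    return reduced_statistic(t2, t2_limit) > 1.0 and reduced_statistic(q, q_limit) > 1.0


# Hypothetical values for one spectrum and hypothetical 95% limits.
print(in_high_quadrant(t2=6.0, q=0.5, t2_limit=3.0, q_limit=0.25))
```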

2.2.3. Calibration Transfer

To check how direct model transfer compared with calibration transfer, three types of calibration transfer methods were tested. The first method was bias correction by standardizing the predicted values, which is probably the simplest method. The second method was PDS by mapping spectral responses of the slave instrument to the master instrument, which is probably the most commonly used method. The third method was GLS by removing the differences between instruments from both instruments. To perform the calibration transfer, 8 transfer samples were selected from the calibration samples with the Kennard-Stone algorithm. The calibration transfer results using Unit 1 as the master instrument are summarized in Table 3, Table 4 and Table 5 for ASA, ASC and CAF, respectively. It should be noted that different settings for PDS and GLS were tested, and the results presented were obtained under the best settings based on RMSEP. Comparing these results with the corresponding same-unit and cross-unit results (Column 1 under No Correction) shows that no single method improved cross-unit results for all three APIs. Even when the best method was chosen for each individual API, only a slight improvement (a decrease of 0.3–0.9% in RMSEP%) in cross-unit performance was observed. Calibration transfer could sometimes degrade the performance when a certain method was applied to a certain API. In addition, for ASC and CAF, the cross-unit performance was already close to or slightly better than the same-unit performance. For ASA, although the same-unit performance was better than the cross-unit performance using the calibration model on Unit 1 (Column 1 under No Correction in Table 3), it was similar to the cross-unit performance using calibration models on Unit 2 and Unit 3 (Row 1 under No Correction in Table 3). All these observations indicate that the instrument-to-instrument difference was small. Therefore, calibration transfer may not be necessary for this application.
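The text does not give the exact bias correction procedure, so the following is only a sketch of one common form, under the assumption that the correction is a constant offset estimated on the transfer samples: the mean difference between the master model's predictions on slave-instrument spectra and the reference values is subtracted from later predictions. All names and numbers are hypothetical.

```python
from statistics import mean


def fit_bias(reference_values, slave_predictions):
    """Estimate the constant offset of slave predictions on the transfer samples."""
    if len(reference_values) != len(slave_predictions) or not reference_values:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean([p - r for r, p in zip(reference_values, slave_predictions)])


def apply_bias(predictions, bias):
    """Subtract the estimated offset from new slave-instrument predictions."""
    return [p - bias for p in predictions]


# Hypothetical transfer samples: reference concentrations and master-model
# predictions obtained from spectra measured on the slave instrument.
transfer_reference = [10.0, 20.0, 30.0]
transfer_predicted = [10.5, 20.5, 30.5]
bias = fit_bias(transfer_reference, transfer_predicted)
print(apply_bias([15.5], bias))
```

Unlike PDS or GLS, this correction acts on the predicted values only and leaves the spectra untouched, which is why it is the simplest of the three approaches.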

3. Discussion

The good direct model transferability demonstrated in this study was enabled by the minimal instrument-to-instrument differences owing to the robust design of the MicroNIR™ hardware. The MicroNIR™ spectrometer utilizes a wedged linear variable filter (LVF) as the dispersive element on top of an InGaAs array detector, which results in an extremely compact and rugged spectral engine with no moving parts [4]. The operation of the on-board illumination allows for a steady output of optical power and an extended lamp-life. Thus, a very stable performance can be achieved without the need for realignment of hardware over time. In addition to the hardware design, the performance of every MicroNIR™ spectrometer is evaluated and calibrated at the production level. The accuracy of the MicroNIR™ wavelength calibration enables precise spectral alignments from instrument to instrument. The repeatability of the photometric response ensures the consistency of signal amplitude from instrument to instrument. The unit-specific temperature calibration stabilizes the MicroNIR™ response over the entire operating temperature range. In the Supplementary Material, the wavelength reference plots and the photometric response plots are shown for the MicroNIR™ OnSite units used for the polymer classification example (Figure S1) and the MicroNIR™ ES units used for the API quantification example (Figure S2), respectively. Very small instrument-to-instrument differences were observed. It should be noted that findings from the handheld MicroNIR™ OnSite and ES units could be extended to the MicroNIR™ PAT units for process monitoring, since the spectral engine and the calibration protocol at the production level are the same.
In this study, both a classification example and a quantification example were investigated. For the quantification example, the good direct model transferability was demonstrated with the most commonly used regression method, PLS. For the classification example, the good direct model transferability was demonstrated with both the commonly used chemometric algorithm, SIMCA, and the machine learning algorithms, SVM, hier-SVM and TreeBagger. It should be noted that the PLS-DA performance could be improved to about a 90% prediction success rate by manually optimizing the number of PLS factors. The results presented in Table 1 and Table 2 were based on automatically selected PLS factors, and this automatic selection procedure sometimes causes overfitting. However, since all the other algorithms also used automatic model building, which may not always generate the best results, no manual intervention was introduced to PLS-DA for a fair comparison. In fact, even with the improved performance, PLS-DA still did not perform as well as the other algorithms for this specific application. Although the direct model transferability was good with conventional SIMCA, it can be further improved with the use of SVM algorithms. SVM has attracted increasing interest in chemometrics in recent years, since it is a sound methodology in which geometric intuition, elegant mathematics, theoretical guarantees, and practical algorithms meet [38]. Among SVM’s many appealing features, generalization ability, that is, the ability to accurately predict outcome values for previously unseen data, can help minimize cross-unit prediction errors. The basic principle of SVM is to construct the maximum margin hyperplanes to separate data points into different classes. Maximizing the margin reduces the complexity of the classification function, thus minimizing the possibility of overfitting. Therefore, better generalization can be achieved intrinsically with SVM [38].
When many classes are involved, as in the polymer classification example in this study, the hier-SVM algorithm was shown to be beneficial, because this multilevel classification scheme facilitates refined classification for chemically similar materials to achieve more accurate prediction [26]. In addition, the TreeBagger algorithm is based on random forest, which is one of the most powerful classifiers in machine learning [39]. However, in the current study, the cross-unit performance of TreeBagger was not as good as that of the SVM algorithms.
The combination of the hardware design and the implementation of advanced calibration techniques results in a repeatable and reproducible performance between different MicroNIR™ spectrometers, allowing effective direct model transferability. However, this is not meant to suggest that direct model transfer is the ultimate solution that eliminates all problems that necessitate calibration transfer. The scope of the current study was limited to model transferability involving only instrument-to-instrument differences, samples that are not very heterogeneous, and data collected with sound sampling and measurement protocols. For example, when different instruments are placed in different environments, the model may have to be corrected for environmental changes via calibration transfer. Very heterogeneous samples, such as biological samples, will be more difficult to handle in general; for such samples, even very small instrument-to-instrument differences could cause unsatisfactory cross-unit prediction results. A global model approach using data from samples with all expected sources of variance and/or measured with multiple instruments for calibration could significantly reduce prediction errors. Model updating techniques will also be very helpful [40]. Direct model transferability will be evaluated for very heterogeneous materials in our future studies. In addition, poor cross-unit model performance often results from unqualified calibration data that are not collected with a careful sampling plan and a proper measurement protocol. The success of a multi-instrument NIR project must start with reliable NIR data that are collected with best practices in sampling [41,42] and measurement [43,44].
The current study demonstrated the possibility of direct model transfer from instrument to instrument for both classification and quantification problems, which has laid a good foundation for the use of a large number of compact NIR instruments. More studies should be encouraged in wider applications and using all kinds of instruments from various manufacturers. Scalability of handheld and process NIR solutions can become more manageable when the number of times that calibration transfer has to be performed between instruments can be minimized.

4. Materials and Methods

4.1. Materials

For the polymer classification study, 46 injection-molded resins were obtained from The ResinKit™ (The Plastics Group of America, Woonsocket, RI, USA). The set of resins contains a variety of polymer materials, as well as various properties within the same type of material (for example, different densities or strengths). Each resin was treated as an individual class in this study. All the resins used in this study are listed in Table 6, and detailed properties of these materials are available upon request. To evaluate the cross-kit prediction performance, three resin kits were used.
For the API quantification study, 48 pharmaceutical powders consisting of different concentrations of three crystalline active ingredients and two amorphous excipients were provided by Prof. Heinz W. Siesler at the University of Duisburg-Essen, Germany [4]. The active ingredients used were acetylsalicylic acid (ASA, Sigma-Aldrich Chemie GmbH, Steinheim, Germany), ascorbic acid (ASC, Acros Organics, NJ, USA), and caffeine (CAF, Sigma-Aldrich Chemie GmbH, Steinheim, Germany), and the two excipients used were cellulose (CE, Fluka Chemie GmbH, Buchs, Switzerland) and starch (ST, Carl Roth GmbH, Karlsruhe, Germany). The concentration of the active ingredients ranged from 13.77% to 26.43% (w/w), and all samples contained 40% (w/w) of a 3:1 (w/w) mixture of cellulose and starch.
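The composition stated above can be checked with a few lines of arithmetic. This is only a worked sketch; it assumes that the excipient mixture and the three APIs together account for 100% (w/w) of each powder, and the variable names are illustrative.

```python
# Worked composition check for the pharmaceutical powders.
# Assumption: excipients plus the three APIs sum to 100% (w/w).
EXCIPIENT_TOTAL = 40.0        # % (w/w) of every sample
CELLULOSE_TO_STARCH = (3, 1)  # w/w ratio inside the excipient mixture

parts = sum(CELLULOSE_TO_STARCH)
cellulose = EXCIPIENT_TOTAL * CELLULOSE_TO_STARCH[0] / parts
starch = EXCIPIENT_TOTAL * CELLULOSE_TO_STARCH[1] / parts
api_total = 100.0 - EXCIPIENT_TOTAL  # shared by ASA, ASC and CAF

print(cellulose, starch, api_total)
```

Under that assumption each sample holds 30% cellulose and 10% starch, and the three API concentrations share the remaining 60%.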

4.2. Spectra Collection

4.2.1. Resin Samples

Three MicroNIR™ OnSite spectrometers (Viavi Solutions Inc., Santa Rosa, CA, USA) covering the range of 908–1676 nm were picked at random to collect the spectra of the resin samples. The spectral bandwidth is ~1.1% of a given wavelength. Three kits of samples were measured in the diffuse reflection mode. A MicroNIR™ windowless collar was used to interface with the samples, which optimized the sample placement relative to the spectrometer. Each sample was placed between the windowless collar of the MicroNIR™ spectrometer and a 99% diffuse reflection standard (Spectralon®, LabSphere, North Sutton, NH, USA). The Spectralon® standard behind each sample returned signal to the spectrometer, particularly for very transparent samples, and thereby improved the signal-to-noise ratio.
Each sample was scanned in five specified locations to account for the most variation in sample shape and thickness. In addition, at each position the sample was scanned in two orientations with respect to the MicroNIR™ lamps to account for any directionality in the structure of the molding. For each position and orientation, three replicate scans were acquired, totaling thirty scans per sample, per spectrometer. The MicroNIR™ spectrometer was re-baselined after every ten samples, using a 99% diffuse reflectance reference scan (Spectralon®), as well as a lamps-on dark scan, in which nothing was placed in front of the spectrometer. Each sample was measured by all three spectrometers following the same protocol.
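The scan counts implied by this protocol can be tallied as follows. This is a bookkeeping sketch only; the constant names are illustrative. The totals of 1380 and 276 agree with most of the denominators reported in Table 2.

```python
# Scan-count arithmetic for the resin measurement protocol.
LOCATIONS = 5        # specified positions per sample
ORIENTATIONS = 2     # orientations relative to the lamps
REPLICATES = 3       # replicate scans per position and orientation
RESINS_PER_KIT = 46  # resins (classes) in one kit

scans_per_sample = LOCATIONS * ORIENTATIONS * REPLICATES
scans_per_kit = scans_per_sample * RESINS_PER_KIT
scans_per_location = ORIENTATIONS * REPLICATES * RESINS_PER_KIT

print(scans_per_sample, scans_per_kit, scans_per_location)  # 30 1380 276
```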

4.2.2. Pharmaceutical Samples

Each of the 48 samples was placed in an individual glass vial, and its spectra were collected by three randomly picked MicroNIR™ 1700ES spectrometers in the range of 908–1676 nm using the MicroNIR™ vial-holder accessory. The spectral bandwidth is ~1.1% of a given wavelength. In this measurement setup, the samples were scanned from the bottom of the vial in the diffuse reflection mode.
Each sample was scanned twenty times using each MicroNIR™ spectrometer. The sample was rotated in the vial-holder between every scan to account for sample placement variation, as well as the non-uniform thickness of the vial. Before every new sample, the MicroNIR™ spectrometer was re-baselined by acquiring a 99% diffuse reflectance reference scan (Spectralon®) and a lamps-on dark scan, for which an empty vial was put in place of a sample. Each sample was measured by all three spectrometers following the same protocol.

4.3. Data Processing and Multivariate Analysis

4.3.1. Polymer Classification

All steps of spectral processing and chemometric analysis were performed using MATLAB (The MathWorks, Inc., Natick, MA, USA). All spectra collected were pretreated using a Savitzky-Golay first derivative followed by standard normal variate (SNV).
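The pretreatment chain can be sketched in plain Python as below. This is not the authors' MATLAB code. It assumes the 5-point window and 3rd-order polynomial given in the Figure 1 caption, uses the standard tabulated Savitzky-Golay first-derivative weights for that case, assumes unit spacing between spectral points, and simply drops the two points at each edge instead of fitting them. The function names are illustrative.

```python
from statistics import mean, stdev

# Tabulated Savitzky-Golay first-derivative weights for a 5-point window
# and a cubic polynomial, assuming unit spacing between points.
SG_D1_W5_P3 = (1 / 12, -8 / 12, 0.0, 8 / 12, -1 / 12)


def sg_first_derivative(spectrum):
    """Return the smoothed first derivative at interior points only."""
    half = len(SG_D1_W5_P3) // 2
    return [
        sum(w * spectrum[i + k - half] for k, w in enumerate(SG_D1_W5_P3))
        for i in range(half, len(spectrum) - half)
    ]


def snv(spectrum):
    """Standard normal variate: centre and scale a single spectrum."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1)
    return [(x - m) / s for x in spectrum]


def preprocess(spectrum):
    """Savitzky-Golay first derivative followed by SNV."""
    return snv(sg_first_derivative(spectrum))


# Toy check: the derivative of x**2 is 2x at the interior points.
toy = [float(i * i) for i in range(10)]
print(sg_first_derivative(toy))
```

Because SNV rescales every spectrum, the constant factor introduced by the true wavelength spacing cancels in the final result.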
PLS-DA, SIMCA, TreeBagger, SVM and hier-SVM were applied to the preprocessed datasets. Autoscaling was performed when running these algorithms. To implement PLS-DA, the number of PLS factors was chosen by training set cross validation and the same number was used for all classes. To implement SIMCA, the number of principal components (PC) was optimized for each class by training set cross validation. No optimization was performed for TreeBagger, SVM and hier-SVM, and the default settings were used. For TreeBagger, the number of decision trees in the ensemble was set to 50. Since TreeBagger randomly selects sample subsets and variables, the results differ slightly from run to run. To limit the impact of these differences, all the TreeBagger results were based on the mean of 10 runs. For the SVM algorithms, the linear kernel with parameter C of 1 was used.
For the same-unit-same-kit performance, the models built with data collected from four locations on each sample in one resin kit by one spectrometer were used to predict data collected from the other location on each sample in the same resin kit by the same spectrometer. For the same-unit-cross-kit performance, the models built with all the data collected from one resin kit by one spectrometer were used to predict all the data collected from a different resin kit by the same spectrometer. For the cross-unit-same-kit performance, the models built with all the data collected from one resin kit by one spectrometer were used to predict all the data collected from the same resin kit by a different spectrometer. For the cross-unit-cross-kit performance, the models built with all the data collected from one resin kit by one spectrometer were used to predict all the data collected from a different resin kit by a different spectrometer.
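The four evaluation cases can be expressed as a small labelling helper together with the location-based hold-out used for the same-unit-same-kit case. This is an illustrative sketch; the function names and the dictionary layout of a scan record are assumptions, not part of the study's code.

```python
def scenario(train, test):
    """Name the evaluation case from (unit, kit) pairs for training and testing."""
    train_unit, train_kit = train
    test_unit, test_kit = test
    unit_part = "same-unit" if train_unit == test_unit else "cross-unit"
    kit_part = "same-kit" if train_kit == test_kit else "cross-kit"
    return f"{unit_part}-{kit_part}"


def split_by_location(scans, held_out_location):
    """Same-unit-same-kit case: train on four locations, test on the fifth.

    Each scan is assumed to be a dict with a "location" key (1 to 5).
    """
    train = [s for s in scans if s["location"] != held_out_location]
    test = [s for s in scans if s["location"] == held_out_location]
    return train, test


print(scenario((1, "K1"), (2, "K2")))  # cross-unit-cross-kit
```

In the other three cases, all data from the modeling unit and kit are used for training and all data from the testing unit and kit are used for prediction, so no split is needed.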

4.3.2. API Quantification

All steps of spectral processing and chemometric analysis were performed using MATLAB. Some functions in PLS_Toolbox (Eigenvector Research, Manson, WA, USA) were called in the MATLAB code. To develop the calibration models, 38 out of the 48 samples were selected as the calibration samples via the Kennard-Stone algorithm based on the respective API concentration. The remaining 10 samples were used as the validation samples. The preprocessing procedure was optimized for each API separately based on the calibration set cross validation. The same preprocessing procedure was used on all three instruments for the same API. PLS models were developed using the corresponding preprocessed datasets for each API.
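A minimal sketch of Kennard-Stone selection on a single variable is given below. It assumes that the distance between two samples is the absolute difference of their API concentrations, which is one reading of "based on the respective API concentration". The function name and the toy values are illustrative and are not taken from the study.

```python
def kennard_stone(values, n_select):
    """Return indices of n_select samples chosen by Kennard-Stone.

    `values` holds one number per sample (here: an API concentration),
    so the distance between samples is the absolute difference.
    """
    if not 2 <= n_select <= len(values):
        raise ValueError("n_select must be between 2 and len(values)")

    def dist(i, j):
        return abs(values[i] - values[j])

    n = len(values)
    # Start with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(*pair),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]

    # Repeatedly add the sample farthest from its nearest selected neighbour.
    while len(selected) < n_select:
        best = max(remaining, key=lambda i: min(dist(i, j) for j in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example with five concentrations; pick three calibration samples.
print(kennard_stone([0.0, 1.0, 2.0, 10.0, 5.0], 3))  # [0, 3, 4]
```

In the study's setting the call would select 38 of the 48 samples, with the 10 unselected samples forming the validation set.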
To evaluate the same-unit performance, the model built on one instrument was used to predict the validation set collected by the same instrument. To evaluate the cross-unit performance without calibration transfer, the model built on one instrument was used to predict the validation set collected by the other instruments.
For calibration transfer demonstration, Unit 1 was used as the master instrument, and Unit 2 and Unit 3 were used as the slave instruments. Eight transfer samples were selected from the calibration samples with the Kennard-Stone algorithm. To perform bias correction, bias was determined using the transfer data collected by the slave instrument, and the bias was applied to the predicted values using the validation data collected by the slave instrument. To perform PDS, the window size was optimized based on RMSEP, and the corresponding lowest RMSEP was reported in this study. To perform GLS, parameter a was optimized based on RMSEP, and the corresponding lowest RMSEP was reported in this study.
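The bias correction step can be sketched as follows. The sign convention (bias taken as predicted minus reference, then subtracted from the validation predictions) is an assumption, as are the function names and the toy numbers. RMSEP is computed in its plain form; the normalization used for NRMSEP is not restated in this section and is therefore left out.

```python
from math import sqrt
from statistics import mean


def rmsep(predicted, reference):
    """Root mean square error of prediction."""
    return sqrt(mean([(p - r) ** 2 for p, r in zip(predicted, reference)]))


def estimate_bias(transfer_predicted, transfer_reference):
    """Mean offset of the predictions for transfer samples measured on the slave unit."""
    return mean([p - r for p, r in zip(transfer_predicted, transfer_reference)])


def apply_bias(validation_predicted, bias):
    """Subtract the estimated offset from the slave-unit validation predictions."""
    return [p - bias for p in validation_predicted]


# Toy numbers (not from the study): a constant +0.5 offset on the slave unit.
bias = estimate_bias([20.5, 18.5, 25.5], [20.0, 18.0, 25.0])
corrected = apply_bias([15.5, 22.5], bias)
print(bias, corrected, rmsep(corrected, [15.0, 22.0]))
```

A constant offset such as this one is removed completely by bias correction; wavelength-dependent differences between units are what PDS and GLS are intended to handle.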

5. Conclusions

In this study, direct model transferability was investigated when multiple MicroNIR™ spectrometers were used. As demonstrated by the polymer classification example, high prediction success rates can be achieved for the most stringent cross-unit-cross-kit cases with multiple algorithms including the widely used SIMCA method. Better performance was achieved with SVM algorithms, especially when a hierarchical approach was used (hier-SVM). As demonstrated by the API quantification example, low prediction errors were achieved for the cross-unit cases with PLS models. These results indicate that the direct use of a model developed on one MicroNIR™ spectrometer on the other MicroNIR™ spectrometers is possible. The successful direct model transfer is enabled by the robust design of the MicroNIR™ hardware and will make deployment of multiple spectrometers for various applications more manageable and economical.

Supplementary Materials

The supplementary materials on the reproducibility of MicroNIR™ products are available online at https://www.mdpi.com/1420-3049/24/10/1997/s1. Figure S1: MicroNIR™ OnSite manufacturing data demonstrating instrument-to-instrument reproducibility; Figure S2: MicroNIR™ 1700ES manufacturing data demonstrating instrument-to-instrument reproducibility.

Author Contributions

Conceptualization, L.S.; data curation, V.S.; formal analysis, L.S. and C.H.; investigation, L.S., C.H. and V.S.; methodology, L.S., C.H. and V.S.; software, L.S. and C.H.; validation, L.S. and C.H.; visualization, L.S.; writing—original draft, L.S. and V.S.; writing—review & editing, L.S., C.H. and V.S.

Funding

This research received no external funding.

Acknowledgments

The authors would like to thank Heinz W. Siesler at the University of Duisburg-Essen, Germany, for providing the pharmaceutical samples used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yan, H.; Siesler, H.W. Hand-held near-infrared spectrometers: State-of-the-art instrumentation and practical applications. NIR News 2018, 29, 8–12.
  2. Dos Santos, C.A.T.; Lopo, M.; Páscoa, R.N.M.J.; Lopes, J.A. A Review on the Applications of Portable Near-Infrared Spectrometers in the Agro-Food Industry. Appl. Spectrosc. 2013, 67, 1215–1233.
  3. Santos, P.M.; Pereira-Filho, E.R.; Rodriguez-Saona, L.E. Application of Hand-Held and Portable Infrared Spectrometers in Bovine Milk Analysis. J. Agric. Food Chem. 2013, 61, 1205–1211.
  4. Alcalà, M.; Blanco, M.; Moyano, D.; Broad, N.; O’Brien, N.; Friedrich, D.; Pfeifer, F.; Siesler, H. Qualitative and quantitative pharmaceutical analysis with a novel handheld miniature near-infrared spectrometer. J. Near Infrared Spectrosc. 2013, 21, 445.
  5. Paiva, E.M.; Rohwedder, J.J.R.; Pasquini, C.; Pimentel, M.F.; Pereira, C.F. Quantification of biodiesel and adulteration with vegetable oils in diesel/biodiesel blends using portable near-infrared spectrometer. Fuel 2015, 160, 57–63.
  6. Risoluti, R.; Gregori, A.; Schiavone, S.; Materazzi, S. “Click and Screen” Technology for the Detection of Explosives on Human Hands by a Portable MicroNIR–Chemometrics Platform. Anal. Chem. 2018, 90, 4288–4292.
  7. Pederson, C.G.; Friedrich, D.M.; Hsiung, C.; von Gunten, M.; O’Brien, N.A.; Ramaker, H.-J.; van Sprang, E.; Dreischor, M. Pocket-size near-infrared spectrometer for narcotic materials identification. In Proceedings Volume 9101, Proceedings of the Next-Generation Spectroscopic Technologies VII, SPIE Sensing Technology + Applications, Baltimore, MD, USA, 10 June 2014; Druy, M.A., Crocombe, R.A., Eds.; International Society for Optics and Photonics: Bellingham, WA, USA, 2014; pp. 91010O-1–91010O-11.
  8. Wu, S.; Panikar, S.S.; Singh, R.; Zhang, J.; Glasser, B.; Ramachandran, R. A systematic framework to monitor mulling processes using Near Infrared spectroscopy. Adv. Powder Technol. 2016, 27, 1115–1127.
  9. Galaverna, R.; Ribessi, R.L.; Rohwedder, J.J.R.; Pastre, J.C. Coupling Continuous Flow Microreactors to MicroNIR Spectroscopy: Ultracompact Device for Facile In-Line Reaction Monitoring. Org. Process Res. Dev. 2018, 22, 780–788.
  10. Feudale, R.N.; Woody, N.A.; Tan, H.; Myles, A.J.; Brown, S.D.; Ferré, J. Transfer of multivariate calibration models: A review. Chemom. Intell. Lab. Syst. 2002, 64, 181–192.
  11. Workman, J.J. A Review of Calibration Transfer Practices and Instrument Differences in Spectroscopy. Appl. Spectrosc. 2018, 72, 340–365.
  12. Wang, Y.; Veltkamp, D.J.; Kowalski, B.R. Multivariate instrument standardization. Anal. Chem. 1991, 63, 2750–2756.
  13. Wang, Y.; Lysaght, M.J.; Kowalski, B.R. Improvement of multivariate calibration through instrument standardization. Anal. Chem. 1992, 64, 562–564.
  14. Wang, Z.; Dean, T.; Kowalski, B.R. Additive Background Correction in Multivariate Instrument Standardization. Anal. Chem. 1995, 67, 2379–2385.
  15. Du, W.; Chen, Z.-P.; Zhong, L.-J.; Wang, S.-X.; Yu, R.-Q.; Nordon, A.; Littlejohn, D.; Holden, M. Maintaining the predictive abilities of multivariate calibration models by spectral space transformation. Anal. Chim. Acta 2011, 690, 64–70.
  16. Martens, H.; Høy, M.; Wise, B.M.; Bro, R.; Brockhoff, P.B. Pre-whitening of data by covariance-weighted pre-processing. J. Chemom. 2003, 17, 153–165.
  17. Cogdill, R.P.; Anderson, C.A.; Drennen, J.K. Process analytical technology case study, part III: Calibration monitoring and transfer. AAPS Pharm. Sci. Tech. 2005, 6, E284–E297.
  18. Shi, G.; Han, L.; Yang, Z.; Chen, L.; Liu, X. Near Infrared Spectroscopy Calibration Transfer for Quantitative Analysis of Fish Meal Mixed with Soybean Meal. J. Near Infrared Spectrosc. 2010, 18, 217–223.
  19. Salguero-Chaparro, L.; Palagos, B.; Peña-Rodríguez, F.; Roger, J.M. Calibration transfer of intact olive NIR spectra between a pre-dispersive instrument and a portable spectrometer. Comput. Electron. Agric. 2013, 96, 202–208.
  20. Krapf, L.C.; Nast, D.; Gronauer, A.; Schmidhalter, U.; Heuwinkel, H. Transfer of a near infrared spectroscopy laboratory application to an online process analyser for in situ monitoring of anaerobic digestion. Bioresour. Technol. 2013, 129, 39–50.
  21. Myles, A.J.; Zimmerman, T.A.; Brown, S.D. Transfer of Multivariate Classification Models between Laboratory and Process Near-Infrared Spectrometers for the Discrimination of Green Arabica and Robusta Coffee Beans. Appl. Spectrosc. 2006, 60, 1198–1203.
  22. Milanez, K.D.T.M.; Silva, A.C.; Paz, J.E.M.; Medeiros, E.P.; Pontes, M.J.C. Standardization of NIR data to identify adulteration in ethanol fuel. Microchem. J. 2016, 124, 121–126.
  23. Ni, W.; Brown, S.D.; Man, R. Stacked PLS for calibration transfer without standards. J. Chemom. 2011, 25, 130–137.
  24. Lin, Z.; Xu, B.; Li, Y.; Shi, X.; Qiao, Y. Application of orthogonal space regression to calibration transfer without standards. J. Chemom. 2013, 27, 406–413.
  25. Kramer, K.E.; Morris, R.E.; Rose-Pehrsson, S.L. Comparison of two multiplicative signal correction strategies for calibration transfer without standards. Chemom. Intell. Lab. Syst. 2008, 92, 33–43.
  26. Sun, L.; Hsiung, C.; Pederson, C.G.; Zou, P.; Smith, V.; von Gunten, M.; O’Brien, N.A. Pharmaceutical Raw Material Identification Using Miniature Near-Infrared (MicroNIR) Spectroscopy and Supervised Pattern Recognition Using Support Vector Machine. Appl. Spectrosc. 2016, 70, 816–825.
  27. Ståhle, L.; Wold, S. Partial least squares analysis with cross-validation for the two-class problem: A Monte Carlo study. J. Chemom. 1987, 1, 185–196.
  28. Wold, S. Pattern recognition by means of disjoint principal components models. Pattern Recognit. 1976, 8, 127–139.
  29. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  30. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
  31. Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory-COLT′92, Pittsburgh, PA, USA, 27 July 1992; pp. 144–152.
  32. Blanco, M.; Bautista, M.; Alcalà, M. API Determination by NIR Spectroscopy Across Pharmaceutical Production Process. AAPS Pharm. Sci. Tech. 2008, 9, 1130–1135.
  33. Swarbrick, B. The current state of near infrared spectroscopy application in the pharmaceutical industry. J. Near Infrared Spectrosc. 2014, 22, 153–156.
  34. Gouveia, F.F.; Rahbek, J.P.; Mortensen, A.R.; Pedersen, M.T.; Felizardo, P.M.; Bro, R.; Mealy, M.J. Using PAT to accelerate the transition to continuous API manufacturing. Anal. Bioanal. Chem. 2017, 409, 821–832.
  35. Kennard, R.W.; Stone, L.A. Computer Aided Design of Experiments. Technometrics 1969, 11, 137–148.
  36. Sorak, D.; Herberholz, L.; Iwascek, S.; Altinpinar, S.; Pfeifer, F.; Siesler, H.W. New Developments and Applications of Handheld Raman, Mid-Infrared, and Near-Infrared Spectrometers. Appl. Spectrosc. Rev. 2011, 47, 83–115.
  37. Bland, J.M.; Altman, D.G. Measuring agreement in method comparison studies. Stat. Methods Med. Res. 1999, 8, 135–160.
  38. Bennett, K.P.; Campbell, C. Support Vector Machines: Hype or Hallelujah? Sigkdd Explor. Newslett. 2000, 2, 1–13.
  39. Briand, B.; Ducharme, G.R.; Parache, V.; Mercat-Rommens, C. A similarity measure to assess the stability of classification trees. Comput. Stat. Data Anal. 2009, 53, 1208–1217.
  40. Wise, B.M.; Roginski, R.T. A Calibration Model Maintenance Roadmap. IFAC-PapersOnLine 2015, 48, 260–265.
  41. Petersen, L.; Esbensen, K.H. Representative process sampling for reliable data analysis—A tutorial. J. Chemom. 2005, 19, 625–647.
  42. Romañach, R.; Esbensen, K. Sampling in pharmaceutical manufacturing—Many opportunities to improve today’s practice through the Theory of Sampling (TOS). TOS Forum 2015, 4, 5–9.
  43. The Effects of Sample Presentation in Near-Infrared (NIR) Spectroscopy. Available online: https://www.viavisolutions.com/en-us/literature/effects-sample-presentation-near-infrared-nir-spectroscopy-application-notes-en.pdf (accessed on 12 March 2019).
  44. MicroNIR™ Sampling Distance. Available online: https://www.viavisolutions.com/en-us/literature/micronir-sampling-distance-application-notes-en.pdf (accessed on 12 March 2019).
Sample Availability: Not available.
Figure 1. Spectra of example polymer samples by three instruments: (a) raw spectra; (b) preprocessed spectra by Savitzky-Golay 1st derivative (5 smoothing points and 3rd polynomial order) and standard normal variate (SNV).
Figure 2. Spectra of samples with the highest and the lowest active pharmaceutical ingredient (API) concentrations measured by three instruments: (a) selected raw spectra based on the acetylsalicylic acid (ASA) concentration; (b) selected preprocessed spectra based on the ASA concentration by Savitzky-Golay 1st derivative (5 smoothing points and 2nd polynomial order) and SNV; (c) selected raw spectra based on the ascorbic acid (ASC) concentration; (d) selected preprocessed spectra based on the ASC concentration by Savitzky-Golay 2nd derivative (7 smoothing points and 3rd polynomial order) and SNV; (e) selected raw spectra based on the caffeine (CAF) concentration; (f) selected preprocessed spectra based on the CAF concentration by Savitzky-Golay 1st derivative (17 smoothing points and 3rd polynomial order) and SNV.
Figure 3. Predicted values versus reference values using models developed on Unit 1: (a) validation sets by Unit 1 and Unit 2 for ASA prediction; (b) validation sets by Unit 1 and Unit 3 for ASA prediction; (c) validation sets by Unit 1 and Unit 2 for ASC prediction; (d) validation sets by Unit 1 and Unit 3 for ASC prediction; (e) validation sets by Unit 1 and Unit 2 for CAF prediction; (f) validation sets by Unit 1 and Unit 3 for CAF prediction. The corresponding bias, R² for prediction, and root mean square error for prediction (RMSEP) are presented in each plot.
Figure 4. The Bland-Altman plots comparing the cross-unit prediction results and the same-unit prediction results using models developed on Unit 1: (a) validation sets by Unit 1 and Unit 2 for ASA prediction; (b) validation sets by Unit 1 and Unit 3 for ASA prediction; (c) validation sets by Unit 1 and Unit 2 for ASC prediction; (d) validation sets by Unit 1 and Unit 3 for ASC prediction; (e) validation sets by Unit 1 and Unit 2 for CAF prediction; (f) validation sets by Unit 1 and Unit 3 for CAF prediction.
Figure 5. Reduced Q residuals versus reduced Hotelling’s T² for models developed on Unit 1: (a) validation sets by Unit 1 and Unit 2 for ASA prediction; (b) validation sets by Unit 1 and Unit 3 for ASA prediction; (c) validation sets by Unit 1 and Unit 2 for ASC prediction; (d) validation sets by Unit 1 and Unit 3 for ASC prediction; (e) validation sets by Unit 1 and Unit 2 for CAF prediction; (f) validation sets by Unit 1 and Unit 3 for CAF prediction.
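Table 1. Prediction success rates (%) of polymer classification.

| Algorithm | Unit# Kit# for Modeling | Testing: Unit1 K1 | Testing: Unit2 K1 | Testing: Unit3 K1 | Testing: Unit1 K2 | Testing: Unit2 K2 | Testing: Unit3 K3 |
|---|---|---|---|---|---|---|---|
| PLS-DA | Unit 1 K1 | 99.64 | 89.68 | 83.99 | 95.87 | 88.91 | 82.39 |
| PLS-DA | Unit 2 K1 | 91.96 | 100 | 81.52 | 90.87 | 99.57 | 84.49 |
| PLS-DA | Unit 3 K1 | 76.74 | 75.32 | 100 | 75.07 | 73.12 | 99.20 |
| SIMCA | Unit 1 K1 | 100 | 99.42 | 96.45 | 99.35 | 97.32 | 96.81 |
| SIMCA | Unit 2 K1 | 98.77 | 100 | 95.43 | 97.68 | 99.93 | 95.80 |
| SIMCA | Unit 3 K1 | 96.30 | 93.29 | 100 | 96.09 | 92.17 | 100 |
| TreeBagger | Unit 1 K1 | 100 | 97.11 | 95.80 | 98.04 | 95.94 | 96.30 |
| TreeBagger | Unit 2 K1 | 97.83 | 100 | 93.55 | 94.49 | 98.26 | 96.16 |
| TreeBagger | Unit 3 K1 | 95.14 | 98.41 | 100 | 96.09 | 98.84 | 98.84 |
| SVM | Unit 1 K1 | 100 | 99.86 | 97.54 | 98.26 | 97.90 | 97.83 |
| SVM | Unit 2 K1 | 98.70 | 100 | 97.03 | 94.93 | 98.26 | 98.26 |
| SVM | Unit 3 K1 | 97.83 | 96.18 | 100 | 96.30 | 95.00 | 99.57 |
| Hier-SVM | Unit 1 K1 | 100 | 100 | 97.97 | 97.83 | 97.83 | 97.25 |
| Hier-SVM | Unit 2 K1 | 99.93 | 100 | 98.26 | 98.26 | 99.13 | 99.13 |
| Hier-SVM | Unit 3 K1 | 99.13 | 100 | 100 | 96.88 | 97.83 | 100 |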
Table 2. Number of missed predictions of polymer classification in the format of number of missed predictions/total number of predictions.

| Algorithm | Unit# Kit# for Modeling | Testing: Unit1 K1 | Testing: Unit2 K1 | Testing: Unit3 K1 | Testing: Unit1 K2 | Testing: Unit2 K2 | Testing: Unit3 K3 |
|---|---|---|---|---|---|---|---|
| PLS-DA | Unit 1 K1 | 1/276 | 143/1386 | 221/1380 | 57/1380 | 153/1380 | 243/1380 |
| PLS-DA | Unit 2 K1 | 111/1380 | 0/277 | 255/1380 | 126/1380 | 6/1380 | 214/1380 |
| PLS-DA | Unit 3 K1 | 321/1380 | 342/1386 | 0/276 | 344/1380 | 371/1380 | 11/1380 |
| SIMCA | Unit 1 K1 | 0/276 | 8/1386 | 49/1380 | 9/1380 | 37/1380 | 44/1380 |
| SIMCA | Unit 2 K1 | 17/1380 | 0/277 | 63/1380 | 32/1380 | 1/1380 | 58/1380 |
| SIMCA | Unit 3 K1 | 51/1380 | 93/1386 | 0/276 | 54/1380 | 108/1380 | 0/1380 |
| TreeBagger | Unit 1 K1 | 0/276 | 40/1386 | 58/1380 | 27/1380 | 56/1380 | 51/1380 |
| TreeBagger | Unit 2 K1 | 30/1380 | 0/277 | 89/1380 | 76/1380 | 24/1380 | 53/1380 |
| TreeBagger | Unit 3 K1 | 67/1380 | 22/1386 | 0/276 | 54/1380 | 16/1380 | 16/1380 |
| SVM | Unit 1 K1 | 0/276 | 2/1386 | 34/1380 | 24/1380 | 29/1380 | 30/1380 |
| SVM | Unit 2 K1 | 18/1380 | 0/277 | 41/1380 | 70/1380 | 24/1380 | 24/1380 |
| SVM | Unit 3 K1 | 30/1380 | 53/1386 | 0/276 | 51/1380 | 69/1380 | 6/1380 |
| Hier-SVM | Unit 1 K1 | 0/276 | 0/1386 | 28/1380 | 30/1380 | 30/1380 | 38/1380 |
| Hier-SVM | Unit 2 K1 | 1/1380 | 0/277 | 24/1380 | 24/1380 | 12/1380 | 12/1380 |
| Hier-SVM | Unit 3 K1 | 12/1380 | 0/1386 | 0/276 | 43/1380 | 30/1380 | 0/1380 |
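The success rates in Table 1 follow directly from the missed-prediction counts in Table 2. A short sketch of the conversion, with an illustrative function name, is:

```python
def success_rate(missed, total):
    """Prediction success rate in percent, rounded to two decimals."""
    return round(100.0 * (total - missed) / total, 2)


# PLS-DA model from Unit 1 K1, tested on Unit2 K1 and on Unit1 K1.
print(success_rate(143, 1386))  # 89.68
print(success_rate(1, 276))     # 99.64
```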
Table 3. The normalized root mean square error of prediction (NRMSEP, %) for ASA.

| Test Sets | No Correction (Unit 1 model) | No Correction (Unit 2 model) | No Correction (Unit 3 model) | Bias (Unit 1 model) | PDS (Unit 1 model) | GLS (Unit 1 model) |
|---|---|---|---|---|---|---|
| Unit 1 | 3.4 | 3.5 | 3.5 | - | - | - |
| Unit 2 | 4.0 | 4.2 | 3.9 | 3.7 | 3.3 | 3.6 |
| Unit 3 | 4.3 | 4.5 | 4.2 | 4.1 | 3.5 | 4.4 |
Table 4. The normalized root mean square error of prediction (NRMSEP, %) for ASC.

| Test Sets | No Correction (Unit 1 model) | No Correction (Unit 2 model) | No Correction (Unit 3 model) | Bias (Unit 1 model) | PDS (Unit 1 model) | GLS (Unit 1 model) |
|---|---|---|---|---|---|---|
| Unit 1 | 3.0 | 2.6 | 2.7 | - | - | - |
| Unit 2 | 2.7 | 2.7 | 2.6 | 2.3 | 3.5 | 2.6 |
| Unit 3 | 2.5 | 2.5 | 2.7 | 2.2 | 3.1 | 2.4 |
Table 5. The normalized root mean square error of prediction (NRMSEP, %) for CAF.

| Test Sets | No Correction (Unit 1 model) | No Correction (Unit 2 model) | No Correction (Unit 3 model) | Bias (Unit 1 model) | PDS (Unit 1 model) | GLS (Unit 1 model) |
|---|---|---|---|---|---|---|
| Unit 1 | 4.0 | 4.6 | 3.7 | - | - | - |
| Unit 2 | 4.1 | 4.7 | 4.2 | 4.2 | 4.3 | 3.2 |
| Unit 3 | 4.2 | 4.9 | 4.0 | 4.1 | 6.2 | 3.9 |
Table 6. Polymer materials used for the classification study.

| No. | Polymer Type | No. | Polymer Type |
|---|---|---|---|
| 1 | PolyStyrene-General Purpose | 24 | Polyethylene-High Density |
| 2 | PolyStyrene-High Impact | 25 | Polypropylene-Copolymer |
| 3 | Styrene-Acrylonitrile (SAN) | 26 | Polypropylene-Homopolymer |
| 4 | ABS-Transparent | 27 | Polyaryl-Ether |
| 5 | ABS-Medium Impact | 28 | Polyvinyl Chloride-Flexible |
| 6 | ABS-High Impact | 29 | Polyvinyl Chloride-Rigid |
| 7 | Styrene Butadiene | 30 | Acetal Resin-Homopolymer |
| 8 | Acrylic | 31 | Acetal Resin-Copolymer |
| 9 | Modified Acrylic | 32 | Polyphenylene Sulfide |
| 10 | Cellulose Acetate | 33 | Ethylene Vinyl Acetate |
| 11 | Cellulose Acetate Butyrate | 34 | Urethane Elastomer (Polyether) |
| 12 | Cellulose Acetate Propionate | 35 | Polypropylene-Flame Retardant |
| 13 | Nylon-Transparent | 36 | Polyester Elastomer |
| 14 | Nylon-Type 66 | 37 | ABS-Flame Retardant |
| 15 | Nylon-Type 6 (Homopolymer) | 38 | Polyallomer |
| 16 | Thermoplastic Polyester (PBT) | 39 | Styrenic Terpolymer |
| 17 | Thermoplastic Polyester (PETG) | 40 | Polymethyl Pentene |
| 18 | Phenylene Oxide | 41 | Talc-Reinforced Polypropylene |
| 19 | Polycarbonate | 42 | Calcium Carbonate-Reinforced Polypropylene |
| 20 | Polysulfone | 43 | Nylon (Type 66–33% Glass) |
| 21 | Polybutylene | 44 | Thermoplastic Rubber |
| 22 | Ionomer | 45 | Polyethylene (Medium Density) |
| 23 | Polyethylene-Low Density | 46 | ABS-Nylon Alloy |