Effective Recycling Solutions for the Production of High-Quality PET Flakes Based on Hyperspectral Imaging and Variable Selection

In this study, effective solutions for polyethylene terephthalate (PET) recycling based on hyperspectral imaging (HSI) coupled with a variable selection method were developed and optimized. Hyperspectral images of post-consumer plastic flakes, composed of PET and small quantities of other polymers considered as contaminants, were acquired in the short-wave infrared range (SWIR: 1000–2500 nm). Different combinations of preprocessing sets coupled with a variable selection method, called competitive adaptive reweighted sampling (CARS), were applied to reduce the number of spectral bands needed to detect the contaminants in the PET flow stream. Prediction models based on partial least squares-discriminant analysis (PLS-DA) were built for each preprocessing set combined with CARS, and were compared to evaluate their efficiency. The best performance was obtained by a PLS-DA model using the multiplicative scatter correction + derivative + mean center preprocessing set and selecting only 14 wavelengths out of 240. Sensitivity and specificity values in the calibration, cross-validation, and prediction phases ranged from 0.986 to 0.998. HSI combined with the CARS method can represent a valid tool for identifying plastic contaminants in a PET flake stream, increasing the processing speed as required by sensor-based sorting devices working at the industrial level.


Introduction
Plastics are among the most widely used materials in daily life, in a wide range of applications, due to their peculiar characteristics and low production costs [1]. As a consequence, the quantity of plastic waste has grown in an uncontrolled way [2], especially from packaging, and still creates a series of political, economic, social, and environmental challenges for industrialized countries [3]. To achieve the circular economy and recycling targets set by European and national legislation, and to prevent the environmental impacts of plastic packaging waste, it is essential to implement efficient plastic waste recovery strategies [3][4][5]. Several actions can be taken to improve plastic recycling processes, making it possible to bring high-quality recycled products to the market. In this context, the online sorting step of the mechanical recycling process plays a preeminent role in improving processing performance and increasing recycled plastic quality. Contaminants in the post-consumer stream of a specific recycled polymer, i.e., other materials and other types of polymers, can degrade the final properties of the secondary raw material [6][7][8]. Correct recognition and separation of materials in recycling plants is, thus, crucial.
Optical-based sorting of polymers is one of the key steps in producing high-quality plastics as secondary raw materials [9,10]. Many spectroscopy-based approaches
Plastic samples representative of a flow stream of PET flakes contaminated by other polymers were collected from a recycling plant. In this scenario, the contaminants derive from a limited and finite set of variability sources [46], which makes it possible to create a representative prediction model based on a defined set of wavelengths.
Plastic flakes of PET and other polymers were selected and divided into calibration and prediction datasets for the evaluation of the PLS-DA models (Figure 1). In detail, the calibration dataset (CAL) was created from a single image containing 36 samples: 18 PET and 18 contaminant flakes (Figure 1a). Principal component analysis (PCA) was used to define the classes and the calibration set. The calibration dataset was preprocessed and cross-validated (CV) to build a PLS-DA model detecting the presence of contaminants in the PET stream. The prediction image (PRED) was created from a set of plastic samples external to the model, consisting of 18 PET and 18 contaminant flakes arranged randomly (Figure 1b).

Data Acquisition and Analysis
Hyperspectral image acquisition was carried out at the Raw Materials Laboratory (RawMaLab) of the Department of Chemical Engineering, Materials and Environment of Sapienza University of Rome using the Sisuchema XL™ Chemical Imaging Workstation (Specim Ltd., Oulu, Finland) (Figure 2). The HSI platform is based on a push-broom acquisition architecture, with a camera operating from 1000 to 2500 nm (SWIR range). The selected device configuration covers a maximum field of view of 20 cm with a pixel resolution of 625 µm. The HSI platform is equipped with a diffuse line illumination unit consisting of quartz halogen lamps that produce dual linear light over a spectral range of 920 to 2514 nm, which optimizes the imaging of various surfaces [47]. The working distance between the spectrograph lens and the sample tray plane was 30 cm. The technical specifications of the device are summarized in Table 1. Hypercube reflectance was calibrated automatically using an internal standard reference target. A total of 240 wavelengths were collected and analyzed for each dataset. The calibration dataset comprised 3420 pixels for the PET class and 2089 pixels for the contaminant class, while the prediction dataset comprised 3594 pixels for the PET class and 2836 pixels for the contaminant class.
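Because the dataset sizes above are given in pixels, each hypercube is treated as a matrix with one reflectance spectrum per row before modeling. The NumPy sketch below shows that unfolding step. The function name `unfold_hypercube` and the integer label mask (0 = background, 1 = PET, 2 = contaminant) are illustrative assumptions, not part of the authors' software.

```python
import numpy as np


def unfold_hypercube(cube, label_map):
    """Unfold a (rows, cols, bands) hypercube into a pixel-by-band matrix.

    label_map is a hypothetical (rows, cols) integer mask:
    0 = background, 1 = PET, 2 = contaminant. Background pixels are dropped.
    """
    rows, cols, bands = cube.shape
    if label_map.shape != (rows, cols):
        raise ValueError("label_map must match the spatial size of the cube")
    X = cube.reshape(rows * cols, bands)  # one reflectance spectrum per row
    y = label_map.reshape(rows * cols)
    keep = y > 0                          # discard background pixels
    return X[keep], y[keep]


# Toy 4 x 5 image with 240 bands, mimicking the SWIR acquisition described above
rng = np.random.default_rng(0)
cube = rng.random((4, 5, 240))
labels = np.zeros((4, 5), dtype=int)
labels[0:2, 0:2] = 1  # four "PET" pixels
labels[2:4, 3:5] = 2  # four "contaminant" pixels
X, y = unfold_hypercube(cube, labels)
print(X.shape, np.bincount(y))  # prints (8, 240) [0 4 4]
```

The reshape is row-major, so each row of `X` corresponds to one spatial pixel. Predicted labels can therefore be folded back into an image with `reshape(rows, cols)` to produce prediction maps.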

Principal Component Analysis (PCA)
PCA is often applied for HSI data exploration, as it provides an overview of multivariate data and helps evaluate the selected preprocessing combinations [54]. PCA decomposes the preprocessed spectral data into linear combinations of the original variables, called principal components (PCs), which collect the spectral variation in a reduced set of factors. The first PCs were used to analyze the common characteristics of the samples and their grouping, since samples with similar spectral signatures tend to cluster together in the score plot of the first two or three components [54].
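A minimal sketch of PCA as described above, computed through the singular value decomposition of the mean-centered pixel matrix, is shown below. The function name `pca` and the synthetic two-class data are illustrative assumptions. The scores give the coordinates plotted in score plots (e.g., PC1 vs. PC2), and each loadings column plotted against wavelength gives a loadings plot.

```python
import numpy as np


def pca(X, n_components):
    """PCA via singular value decomposition of the mean-centered data.

    X: (n_pixels, n_wavelengths) matrix of (preprocessed) spectra.
    Returns scores (n_pixels, k), loadings (n_wavelengths, k) and the
    fraction of total variance explained by each of the k PCs.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # T = U S
    loadings = Vt[:n_components].T                   # P = V
    explained = s**2 / np.sum(s**2)                  # variance share of each PC
    return scores, loadings, explained[:n_components]


# Synthetic example: a tight "PET" group and a more variable "contaminant" group
rng = np.random.default_rng(0)
pet = rng.normal(0.0, 0.05, size=(100, 240))
contaminant = rng.normal(0.3, 0.15, size=(100, 240))
T, P, ev = pca(np.vstack([pet, contaminant]), n_components=3)
print(np.round(100 * ev, 2))  # percentage of variance captured by PC1-PC3
```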

Competitive Adaptive Reweighted Sampling (CARS)
CARS is an innovative wavelength selection approach [44] used in NIR spectroscopy to select variables (i.e., significant wavelengths) [45]. Combined with PLS regression, CARS can select an optimal combination of useful wavelengths from the full spectrum [54]. In the CARS method, the absolute values of the PLS regression coefficients (RC) are used to evaluate the weight of each wavelength. Based on the importance of each wavelength, CARS sequentially selects N subsets of wavelengths through N Monte Carlo sampling runs, in an iterative and competitive manner. First, in each sampling run, a fixed ratio of samples (e.g., 80%) is randomly selected to build a calibration model. Then, based on the RC, an exponentially decreasing function (EDF) and adaptive reweighted sampling (ARS) are applied to select the key wavelengths. Finally, the subset with the lowest root mean square error of cross-validation (RMSECV) is chosen.
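The EDF and ARS steps can be sketched in NumPy as follows. The sketch assumes the EDF form r_i = a·e^(−k·i), with a and k chosen so that all P wavelengths are kept at the first run and two remain at run N, as in the original CARS formulation. The function names, N = 50, and the random "coefficients" are illustrative assumptions. The full CARS loop would also refit a PLS model on about 80% of the samples at every run and record its RMSECV; that part is omitted here.

```python
import numpy as np


def edf_ratios(n_vars, n_runs):
    """EDF: fraction of the P wavelengths kept at Monte Carlo run i = 1..N.

    r_i = a * exp(-k * i), with a = (P/2)^(1/(N-1)) and k = ln(P/2)/(N-1),
    so that r_1 = 1 (all wavelengths) and r_N = 2/P (two wavelengths left).
    """
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = np.log(n_vars / 2) / (n_runs - 1)
    i = np.arange(1, n_runs + 1)
    return a * np.exp(-k * i)


def cars_reduce(coefs, kept, ratio, n_vars, rng):
    """One EDF + ARS reduction step.

    coefs: PLS regression coefficients for the currently kept wavelengths
    kept:  indices (into the full spectrum) of those wavelengths
    ratio: EDF ratio r_i for this run; n_vars: P, the full spectrum size
    """
    weights = np.abs(coefs) / np.sum(np.abs(coefs))
    n_keep = min(len(kept), max(2, int(round(ratio * n_vars))))
    # EDF: force-remove all but the n_keep wavelengths with the largest |RC|
    top = np.argsort(weights)[::-1][:n_keep]
    candidates, w = kept[top], weights[top]
    # ARS: draw n_keep times with replacement, probability proportional to
    # weight; only wavelengths drawn at least once survive to the next run
    drawn = rng.choice(candidates, size=n_keep, replace=True, p=w / w.sum())
    return np.unique(drawn)


P, N = 240, 50  # 240 bands as in the text; N = 50 runs is an assumption
r = edf_ratios(P, N)
print(round(r[0], 6), round(r[-1] * P, 6))  # prints 1.0 2.0

rng = np.random.default_rng(0)
kept = np.arange(P)
coefs = rng.normal(size=P)  # stand-in for coefficients of a PLS model at run 1
kept = cars_reduce(coefs, kept, r[0], P, rng)
```

The EDF removes wavelengths quickly in early runs and slowly in later ones. ARS adds a random competitive element, so wavelengths with larger |RC| are more likely to survive.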

Partial Least Square Discriminant Analysis (PLS-DA)
PLS-DA was used to identify predefined classes of materials (i.e., PET and contaminants) by forming discriminant functions from the input variables (i.e., wavelengths), producing a new set of transformed values that provides a more accurate discrimination than any single variable [55]. Venetian blinds (number of data splits = 10) was used as the cross-validation method to evaluate model complexity and select the appropriate number of latent variables (LVs) (Figure S1). The optimal number of LVs was also determined by the smallest difference between the root mean square error of calibration (RMSEC) and RMSECV [56][57][58].
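A minimal two-class PLS-DA can be sketched in NumPy under several assumptions:

- class membership is dummy-coded (0 = PET, 1 = contaminant);
- a PLS1 model is fitted with the NIPALS algorithm;
- pixels are assigned with a fixed 0.5 threshold.

Chemometrics software often derives the threshold differently, so this is an illustration, not the authors' implementation. The hypothetical `venetian_blinds` helper puts every 10th sample in the same test fold, which is what Venetian-blind splitting means. The usage lines compute RMSEC and RMSECV for one LV setting.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """PLS1 regression by NIPALS; returns coefficients and centering terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # mean-centered working copies
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)               # weight vector
        t = Xr @ w                           # scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        c = yr @ t / tt                      # y loading
        Xr = Xr - np.outer(t, p)             # deflate X and y
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)      # B = W (P'W)^-1 q
    return B, x_mean, y_mean


def pls1_predict(X, B, x_mean, y_mean):
    return (X - x_mean) @ B + y_mean


def plsda_classify(X, B, x_mean, y_mean, threshold=0.5):
    """1 = contaminant, 0 = PET, using a fixed threshold on the PLS output."""
    return (pls1_predict(X, B, x_mean, y_mean) > threshold).astype(int)


def venetian_blinds(n_samples, n_splits=10):
    """Yield (train, test) index arrays; fold k tests samples k, k+s, k+2s, ..."""
    idx = np.arange(n_samples)
    for k in range(n_splits):
        test = idx[k::n_splits]
        yield np.setdiff1d(idx, test), test


def cross_validated_output(X, y, n_lv, n_splits=10):
    """PLS output for every sample, predicted by a model that did not see it."""
    y_cv = np.empty(len(y), dtype=float)
    for train, test in venetian_blinds(len(y), n_splits):
        B, xm, ym = pls1_fit(X[train], y[train], n_lv)
        y_cv[test] = pls1_predict(X[test], B, xm, ym)
    return y_cv


# Synthetic spectra: contaminants differ from PET in the first 5 bands
rng = np.random.default_rng(0)
pet = rng.normal(0.0, 0.1, size=(40, 20))
contaminant = rng.normal(0.0, 0.1, size=(40, 20))
contaminant[:, :5] += 1.0
X = np.vstack([pet, contaminant])
y = np.r_[np.zeros(40), np.ones(40)]
B, xm, ym = pls1_fit(X, y, n_lv=2)
rmsec = np.sqrt(np.mean((pls1_predict(X, B, xm, ym) - y) ** 2))
rmsecv = np.sqrt(np.mean((cross_validated_output(X, y, n_lv=2) - y) ** 2))
```

Repeating the last three lines for increasing `n_lv` gives the RMSEC and RMSECV curves used to choose the model complexity.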

PLS-DA Performances
The classification performance of the PLS-DA models was evaluated in terms of three statistical parameters: sensitivity, specificity, and efficiency (Equations 1, 2, and 3).
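Pixel-wise computation of these parameters can be sketched as follows. Sensitivity is taken as TP/(TP + FN) and specificity as TN/(TN + FP). Efficiency is ASSUMED here to be the geometric mean of the two, a definition used in several HSI sorting studies; Equation 3 of this work may define it differently. The example uses the prediction-image pixel counts reported above, but the 30 misclassified pixels per class are hypothetical and are not the paper's results.

```python
import math


def class_metrics(y_true, y_pred, positive):
    """Pixel-wise sensitivity, specificity and efficiency for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # fraction of the class correctly recognized
    specificity = tn / (tn + fp)  # fraction of the other class correctly rejected
    # ASSUMPTION: efficiency taken as the geometric mean of the two
    efficiency = math.sqrt(sensitivity * specificity)
    return sensitivity, specificity, efficiency


# Prediction-image pixel counts from the text; the 30 errors per class are hypothetical
y_true = ["PET"] * 3594 + ["contaminant"] * 2836
y_pred = (["contaminant"] * 30 + ["PET"] * 3564
          + ["PET"] * 30 + ["contaminant"] * 2806)
print(class_metrics(y_true, y_pred, "PET"))
print(class_metrics(y_true, y_pred, "contaminant"))
```

In a two-class problem, the sensitivity of one class equals the specificity of the other, so the two printed tuples mirror each other.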

Average Raw Reflectance Spectra
The average raw reflectance spectra and the standard deviation of the two classes of polymers are shown in

Preprocessing Sets and Variable Selection
To emphasize the spectral differences between PET and contaminants, three preprocessing sets were used, including standard normal variate + mean center (SNV + MC, Set 2) and multiplicative scatter correction + derivative + mean center (MSC + Derivative + MC, Set 3). The average reflectance spectra of the PET and contaminant classes resulting from the application of these preprocessing sets are shown in Figure 4. Subsequently, the CARS method [44] was applied to each preprocessing set to reduce the number of wavelengths needed to discriminate between the spectral characteristics of PET and contaminants. The wavelengths selected for each preprocessing set are shown in Table 2.

PCA Results of Preprocessing Set 1
PCA score and loadings plots are shown in Figure 5. Most of the variance was captured by the first two PCs, as shown in the score plot (Figure 5a), where PC1 and PC2 explained about 74.09% and 21.07% of the variance, respectively. The PCA score plot showed two clouds corresponding to the two analyzed classes (i.e., PET and contaminant). Cluster separation was acceptable, with a low overlap of the clouds in the fourth quadrant. In more detail, owing to the low spectral variance and high uniformity of the PET samples, the PET scores were more tightly grouped than the contaminant scores. The variance of the contaminant class was greater than that of the PET class, as it was influenced by the spectral combination of different polymers. The loadings plot of PC1 and PC2 is shown in Figure 5b. The main PC1 variance was given by the wavelengths around 1240 and 1720 nm for positive values, while for negative values it was mainly given by the wavelengths around 1320 and 2100 nm.
PC2 was mostly influenced by wavelengths around 1730, 1910, and 2100 nm for positive values, whereas negative values were mainly driven by wavelengths around 1005 and 2450 nm.

PCA Results of Preprocessing Set 2 (SNV + MC)
PCA score and loadings plots are shown in Figure 6. The PCA model captured 95.07% of the variance with 3 PCs. The best separation between the PET and contaminant clusters was obtained in the PC1 vs. PC2 score plot (Figure 6a). Cluster separation was very clear, with fewer overlapping pixels than with the previous preprocessing set. The PET cluster was mainly located in the first and second quadrants, while the contaminant scores were mainly located in the third and fourth quadrants. Both clusters showed a similar variance distribution. Therefore, preprocessing Set 2 (SNV + MC) minimized the intra-class variance, emphasizing the differences between the PET and contaminant classes. The loadings plots of PC1 and PC2 are shown in Figure 6b. The PC1 variance was mainly given, for positive values, by the wavelengths around 1720, 2250, and 2480 nm, and, for negative values, by the wavelengths around 1020, 1130, and 1326 nm. PC2 was mostly marked, for positive values, by wavelengths around 1206, 1720, and 1920 nm, and, for negative values, by wavelengths around 1650 and 2255 nm.
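The Set 2 sequence (SNV followed by mean centering) can be sketched as below. The function names are hypothetical. SNV centers and scales each spectrum independently (here with the sample standard deviation), which removes multiplicative and additive offsets between pixels. Mean centering subtracts the per-wavelength means of the calibration data. Reusing the calibration means for new data is a common convention and an assumption here.

```python
import numpy as np


def snv(X):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd


def preprocess_set2(X_cal, X_new):
    """SNV followed by mean centering; column means come from calibration only."""
    S_cal, S_new = snv(X_cal), snv(X_new)
    col_mean = S_cal.mean(axis=0)
    return S_cal - col_mean, S_new - col_mean


# Synthetic spectra with pixel-to-pixel intensity scaling
rng = np.random.default_rng(0)
base = 0.5 + 0.1 * np.sin(np.linspace(0, 6, 240))
X_cal = rng.uniform(0.8, 1.2, size=(20, 1)) * base + rng.normal(0, 0.002, (20, 240))
X_pred = 1.1 * X_cal[:5] + 0.02  # same spectra, different gain and offset
Z_cal, Z_pred = preprocess_set2(X_cal, X_pred)
```

Because SNV cancels the gain and offset, `Z_pred` equals the first five rows of `Z_cal`. This kind of invariance helps reduce intra-class variance.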

PCA Results of Preprocessing Set 3 (MSC + Derivative + MC)
PCA score and loadings plots are shown in Figure 7. The PCA model captured 95.59% of the variance with 3 PCs. The best separation between PET and contaminants was obtained with PC1 vs. PC2. The PCA score plot showed two clusters related to the PET and contaminant classes, which were well separated, with a low overlap in the central zone (Figure 7a). The PET cluster was mainly located in the first quadrant, while the contaminant class was mainly located in the second quadrant. Therefore, preprocessing Set 3 (MSC + Derivative + MC) minimized the intra-class variance and preserved the spectral differences between the two classes. The loadings plot of PC1 and PC2 is shown in Figure 7b. The PC1 variance was mainly given, for positive values, by the wavelengths around 2274 nm, and, for negative values, by the wavelengths around 2300 nm. PC2 was mainly influenced, for positive values, by wavelengths around 1050 nm, and, for negative values, by wavelengths around 1060 nm.
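The Set 3 sequence (MSC, derivative, mean centering) can be sketched in NumPy as follows. MSC regresses each spectrum on a reference spectrum (here the calibration mean spectrum) and removes the fitted offset and slope. The derivative settings used in the paper (e.g., a Savitzky-Golay window and polynomial order) are not stated in this section, so the sketch substitutes a simple first derivative with `np.gradient`. That substitution and the function names are assumptions.

```python
import numpy as np


def msc(X, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is fitted as x ~ a + b * ref by least squares and
    corrected as (x - a) / b. The reference defaults to the mean spectrum.
    """
    ref = X.mean(axis=0) if reference is None else reference
    corrected = np.empty(X.shape, dtype=float)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)  # slope first, then intercept
        corrected[i] = (x - a) / b
    return corrected, ref


def preprocess_set3(X_cal, X_new):
    """MSC -> first derivative -> mean centering (calibration statistics reused)."""
    M_cal, ref = msc(X_cal)
    M_new, _ = msc(X_new, reference=ref)
    # ASSUMPTION: simple first derivative along the band axis, standing in
    # for the derivative filter actually used in the paper
    D_cal = np.gradient(M_cal, axis=1)
    D_new = np.gradient(M_new, axis=1)
    col_mean = D_cal.mean(axis=0)
    return D_cal - col_mean, D_new - col_mean


rng = np.random.default_rng(0)
base = 0.5 + 0.1 * np.sin(np.linspace(0, 6, 240))
X_cal = rng.uniform(0.8, 1.2, size=(20, 1)) * base + rng.normal(0, 0.002, (20, 240))
X_pred = 1.1 * X_cal[:5] + 0.02  # same spectra, different gain and offset
Z_cal, Z_pred = preprocess_set3(X_cal, X_pred)
```

The prediction spectra are corrected against the calibration reference rather than their own mean. As a result, `Z_pred` matches the first five rows of `Z_cal`.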

PLS-DA Models Constructed for a Limited Set of Spectral Variables
Starting from the characteristics detected by the PCA of each preprocessing set with the selected variables, a PLS-DA model was constructed. The number of LVs was chosen based on the smallest difference between the root mean square error of calibration (RMSEC) and of cross-validation (RMSECV) (Table 3). Figure 8) and border-effect in some flakes. The PLS-DA models based on preprocessing Set 2 (SNV + MC) and Set 3 (MSC + Derivative + MC) showed a similar prediction quality, with few misclassified pixels, mainly due to border effects. However, these few misassigned pixels did not significantly affect correct class recognition.
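The LV selection rule above reduces to simple arithmetic on the two error curves. In the sketch below, `choose_n_lv` returns the LV count with the smallest |RMSEC − RMSECV| and breaks ties toward fewer LVs. The error values are illustrative only and are not those of Table 3.

```python
import numpy as np


def rmse(y_true, y_hat):
    y_true, y_hat = np.asarray(y_true, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y_true - y_hat) ** 2)))


def choose_n_lv(rmsec, rmsecv):
    """Number of LVs (1-based) with the smallest |RMSEC - RMSECV|.

    rmsec[i] and rmsecv[i] belong to the model with i + 1 latent variables;
    np.argmin returns the first minimum, so ties favour fewer LVs.
    """
    gap = np.abs(np.asarray(rmsec, float) - np.asarray(rmsecv, float))
    return int(np.argmin(gap)) + 1


# Illustrative error curves only (not the values of Table 3)
rmsec = [0.30, 0.22, 0.18, 0.15, 0.14]
rmsecv = [0.31, 0.24, 0.185, 0.17, 0.165]
print(choose_n_lv(rmsec, rmsecv))  # prints 3
```

Applied on its own, this criterion can favor an underfitted model whose two errors are both high. This is why the Methods pair it with inspecting the cross-validation curve (Figure S1).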
The classification performances obtained with the different preprocessing sets, shown in Table 4, revealed sensitivity and specificity values in calibration, cross-validation, and prediction ranging from 0.957 to 0.999. Efficiency values in prediction ranged from 0.969
Table 3. Root mean square error of calibration (RMSEC) and cross-validation (RMSECV) for the three PLS-DA models constructed for preprocessing Sets 1, 2, and 3.
Finally, the performance of the full-spectrum PLS-DA model using preprocessing Set 3 (MSC + Derivative + MC) was compared with that obtained in variable selection mode with the same preprocessing set. In detail, the full-spectrum PLS-DA model captured 99.39% of the variance with 5 LVs. The number of LVs was chosen based on the smallest difference between the RMSEC and RMSECV values (Table 5). Full-spectrum PLS-DA prediction results are shown in Figure 9. In particular, the PET and contaminant classes were well predicted: sensitivity and specificity values in the calibration, cross-validation, and prediction phases, together with efficiency values (Table 6), ranged from 0.986 to 1.000 for both classes.
The prediction results of the PLS-DA Set 3 (MSC + Derivative + MC) model applied to the full-spectrum hypercubes (Figure 9) and to the 14 selected wavelengths (Figure 8c) were similar. The sensitivity, specificity, and efficiency values of the two models show a slight increase in misclassified pixels (spectra) for the PLS-DA model in variable selection mode. However, the misclassified pixels were mainly located along the sample boundaries and did not affect the correct class attribution.
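In practice, moving from the full spectrum to a selected set of wavelengths is just a column subset of the pixel matrix. The sketch below assumes evenly spaced band centers between 1000 and 2500 nm; a real camera provides its own band-center list. It uses a HYPOTHETICAL set of 14 wavelengths, because the paper's actual selection is given in Table 2 and is not reproduced here. The function names are illustrative.

```python
import numpy as np


def band_centres(first_nm=1000.0, last_nm=2500.0, n_bands=240):
    """ASSUMED evenly spaced band centers; a real camera supplies its own list."""
    return np.linspace(first_nm, last_nm, n_bands)


def select_bands(X, centres, wanted_nm):
    """Keep only the columns of X whose band centers are closest to wanted_nm."""
    idx = np.unique([int(np.argmin(np.abs(centres - w))) for w in wanted_nm])
    return X[:, idx], centres[idx]


centres = band_centres()
wanted = np.linspace(1100, 2400, 14)  # HYPOTHETICAL: not the paper's Table 2 set
rng = np.random.default_rng(0)
X_full = rng.random((6430, 240))      # 3594 + 2836 prediction pixels, random values
X_sel, nm_sel = select_bands(X_full, centres, wanted)
print(X_sel.shape, round(X_full.shape[1] / X_sel.shape[1], 1))
```

Keeping 14 of 240 bands cuts the data per pixel by a factor of about 17. This reduction is what makes a filter-based multispectral sensor attractive for online sorting.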

Economic and Environmental Impact
The systematic implementation of HSI-based detection and classification logic could have important effects at both the commercial-industrial and the economic-environmental level. The proposed approach can produce not only better separation efficiency but also a better-quality product. Achieving these two goals generates social, economic, and environmental benefits [59]. In terms of economic viability, a stronger and more widespread PET recycling sector generates employment and helps reduce the volume of municipal solid waste [60]. In addition, high-quality recycled PET helps reduce the consumption of energy and non-renewable raw materials [61]. This is in line with the Sustainable Development Goals (SDGs) of the UN 2030 Agenda, in particular SDG 12, and with the principles of the circular economy.

Conclusions
The application of HSI in the SWIR region was investigated to evaluate the feasibility of a rapid and non-destructive method for identifying plastic contaminants in a recycled PET flake stream, in order to produce a high-quality secondary raw material. CARS was tested as a variable selection method after the application of three different preprocessing sequences, to identify the best combination for recognizing contaminants in the PET stream with a limited number of wavelengths. The variable selection results obtained by CARS were evaluated with a PLS-DA model for each set of selected wavelengths. The best prediction results in calibration and cross-validation were provided by the combination of CARS and preprocessing Set 3 (MSC + Derivative + MC), which reduced the spectral dataset from 240 to 14 wavelengths. In addition, the performance of the full-spectrum PLS-DA model using preprocessing Set 3 (MSC + Derivative + MC) was compared with that obtained in variable selection mode with the same preprocessing set. The results showed that classification accuracy was similar. This further highlights the possibility of identifying plastic contaminants in the recycled PET flake stream using a limited number of key wavelengths, which is useful for online sorting applications.
The current study provides an effective procedure for variable selection from hyperspectral images, reducing data redundancy and achieving a prediction efficiency close to that of the full-spectrum PLS-DA model. The results open the way to building a multispectral detection system based on filters that analyze the selected spectral regions. Such a system would cost significantly less than a conventional full-spectrum hyperspectral camera while still ensuring a high-quality recycled PET stream.
Funding: This research received no external funding.