Article

Machine Learning-Assisted LIBS Identification of Epoxy Resins in CFRP for Recycling Processes

by
Dimitris Kanakis
1,
Zaira M. Berdiñas
2,
Konstantinos N. Sioutas
1,
Elena Santamarina
2,
Camilo Prieto
2 and
Elias P. Koumoulos
1,*
1
IRES—Innovation in Research & Engineering Solutions SNC, Silversquare Europe, Square de Meeûs 35, 1000 Brussels, Belgium
2
AIMEN Technology Centre, Polígono Industrial de Cataboi SUR-PPI-2 (Sector 2) Parcela 3, 36418 O Porriño, Pontevedra, Spain
*
Author to whom correspondence should be addressed.
Materials 2026, 19(4), 751; https://doi.org/10.3390/ma19040751
Submission received: 20 November 2025 / Revised: 18 December 2025 / Accepted: 15 January 2026 / Published: 14 February 2026
(This article belongs to the Special Issue Carbon Fiber-Reinforced Polymers (3rd Edition))

Abstract

Efficient sorting of resin-based CFRP composites is critical for optimizing composite recycling streams. In this work, a methodology integrating Laser-Induced Breakdown Spectroscopy (LIBS) with Machine Learning (ML)-enhanced classification models is presented to achieve accurate material discrimination. LIBS is employed to identify the chemical composition of individual compounds, producing spectrograms that are subsequently processed to group chemically similar materials based on epoxy resin (Bisphenol A). The grouped datasets, containing 4000 spectra and 665 features, were sampled to standardize feature dimensionality and cleaned to remove noise. A statistical analysis is then conducted to select the most informative features, followed by dimensionality reduction using Linear Discriminant Analysis (LDA). Finally, classification is performed using a Support Vector Classification (SVC) model, fine-tuned on the processed data to maximize accuracy. With 5-fold cross-validation (CV), the average nested accuracy score is 0.8317 ± 0.0212. This integrated approach demonstrates the potential for advancing automated sorting technologies in composite recycling applications.

Graphical Abstract

1. Introduction

Carbon fiber-reinforced polymer (CFRP) composites are increasingly used in high-performance applications in aerospace, automotive, marine, and renewable energy sectors owing to their excellent strength-to-weight ratio, fatigue resistance, and design versatility. However, as emphasized by Oliveux et al. [1] and Pickering [2], their end-of-life management remains one of the most critical sustainability challenges in advanced material engineering. The presence of thermosetting polymer matrices prevents reshaping or remelting, which limits conventional recycling and increases the need for innovative circular solutions.
Among existing strategies, chemical recycling has gained growing attention as it allows depolymerization of the thermoset matrix into reusable feedstocks, enabling fiber recovery with minimal performance degradation. Yet, the effectiveness of these processes strongly depends on accurate identification and sorting of waste feedstock. As shown in previous research, misclassification of resin types can alter chemical processing conditions and reduce recycling efficiency, underscoring the need for reliable, high-throughput identification systems [3,4].
Traditional analytical tools such as Fourier transform infrared (FTIR) spectroscopy, Raman spectroscopy, differential scanning calorimetry (DSC), and thermogravimetric analysis (TGA) offer valuable insight into polymer structure but are limited by slow data acquisition and sample preparation requirements. Several authors [5,6] presented a comparison of their main characteristics and highlighted the growing role of photonic and spectroscopic techniques in advancing material identification for industrial sorting applications. In this context, Laser-Induced Breakdown Spectroscopy (LIBS) has emerged as a promising tool due to its microsecond-scale response, minimal preparation, and non-contact nature. As discussed in [7,8], LIBS is particularly suitable for rapid multi-element analysis and process automation. LIBS operates by focusing a high-energy laser pulse on a material surface to create a microplasma whose emission lines reflect its elemental composition. LIBS offers unique advantages over IR and Raman spectroscopy for this purpose. While IR and Raman are powerful molecular probes, they often require contact or clean sample surfaces and may suffer from fluorescence, especially in the case of dark or heterogeneous samples. Despite the speed and flexibility of LIBS, the complex spectra produced, containing overlapping lines and background, require robust computational analysis for accurate classification.
Recent studies [9,10] have demonstrated that combining LIBS with Machine Learning (ML) can dramatically enhance spectral interpretation and discrimination performance across polymers and composites. These data-driven models extract subtle, multivariate signatures invisible to conventional spectral comparison, enabling automated and reproducible material sorting.
Tymoshchuk et al. [11] further showed that ML models such as Support Vector Machines (SVMs), Random Forests (RFs), and neural networks outperform traditional threshold-based analyses when processing high-dimensional LIBS spectra. Similarly, researchers reported that while most LIBS–ML research targets thermoplastic polymers, thermosetting matrices such as epoxy and phenolic systems remain underexplored [12]. This gap presents both a scientific and industrial opportunity to extend the methodology toward thermoset resin identification, where differences in plasma behavior, elemental composition, and emission stability pose unique analytical challenges.
Machine learning substantially enhances the analytical power of LIBS by converting complex, high-dimensional spectra into informative, low-dimensional representations suitable for automated classification. Feature selection and dimensionality reduction, such as Linear Discriminant Analysis (LDA), improve class separability, while Support Vector Classification (SVC) provides robust decision boundaries in the presence of spectral overlap and limited training data. These techniques capture the subtle chemical and structural differences necessary to distinguish resin systems that exhibit similar optical signatures.
Accordingly, this study, produced by the European recycling and circularity in large composite components (EuReComp) project, proposes an integrated LIBS–ML methodology to classify resin-based composite materials, emphasizing thermoset systems extensively used in aeronautic and wind turbine blade components, which are predominantly based on epoxy resins. The proposed workflow encompasses spectral preprocessing, feature selection, LDA-based dimensionality reduction, and optimized SVC classification. This framework is designed to demonstrate the feasibility of LIBS–ML approaches for high-accuracy material discrimination while contributing to a standardized and reproducible pipeline for composite recycling analysis.
The present work builds upon the foundation of prior LIBS–ML research while extending it into thermoset resin identification. By doing so, it aligns with recent efforts, such as those reported by Okafor et al. [13], toward data-driven sustainability in composite design and recycling. The results aim to expand the role of photonic spectroscopy and artificial intelligence in achieving circularity within advanced materials systems.

2. Materials and Methods

A structured and iterative workflow was established to integrate experimental LIBS spectroscopy with Machine Learning classification, enabling progressive refinement of both data acquisition and model performance (Figure 1). The process begins with the definition of experimental parameters and the acquisition of spectra, which are subsequently preprocessed through baseline correction, intensity normalization, and feature extraction. The resulting dataset is partitioned into a training subset, used to develop and tune the predictive model, and an independent test subset, reserved for final evaluation. During model development, variable selection is applied to identify the most informative spectral features and reduce dimensionality prior to classifier training. The iterative feedback loop enables adjustments in preprocessing routines, feature selection strategies, and even experimental conditions, allowing systematic improvement of data quality and classification accuracy. The final model is validated on the independent test set to ensure reliability.

2.1. Materials

Table 1 presents the reference materials used for the classification study. The samples consist of the same resin combined with a specific hardener to achieve a solid-state matrix. The selected resin, Bisphenol A, belongs to the epoxy family, the most common choice in high-end composite applications. The study included eight different resin-hardener systems, grouped into four matrix categories depending on the type of hardener. The classification approach is applied to identify both individual samples and grouped systems (matrix) according to their hardener family.

2.2. Experimental LIBS Setup

A LIBS workstation was set up for the execution of these project developments. This station consists of a high-power Nd:YAG laser that emits nanosecond pulses at 10 Hz at 1064 nm. The laser pulses are guided into a safety laser cabin with mirrors and lenses that focus the laser beam at the surface of the sample. The energy of the laser was set to 70 mJ using a focusing lens of 300 mm, which provides a spot diameter of 150 µm at the focal plane. A precise motorized XYZ stage is used to place and move the sample, allowing the acquisition of a large amount of data in a short period of time. The laser pulses have the necessary energy to break down the surface of the material and create plasma. The light emitted by the plasma is collected by a system of mirrors and an optical fiber, which guides the light to a high-resolution spectrometer covering 200 to 900 nm with a delay of 800 ns and an exposure time of 60 µs. LIBS spectra acquisition for large datasets is fully automated (Figure 2).

2.3. Data Acquisition and Spectra Processing

A spectral dataset consisting of 500 spectra per material was acquired at different points of each sample, resulting in a total of 4000 spectra. Prior to spectral acquisition, the sample surface was cleaned with two laser pulses to remove any possible superficial contamination. Each spectrum was collected by accumulating the emission from 10 laser pulses at each measurement point. All measurements were performed at ambient laboratory conditions.
Raw spectra were first corrected by subtracting the dark current, defined as the spectrum taken with no laser source, which accounts only for electronic noise and covers the same wavelength range as the raw spectra. A dark current spectrum was recorded for each experiment and subtracted from the corresponding resin dataset. Subsequently, spectra were normalized using the Standard Normal Variate (SNV) method by subtracting the mean of each spectrum and dividing by its standard deviation.
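The dark-current subtraction and SNV step described above can be sketched as follows (a minimal illustration assuming spectra are stored as NumPy arrays; the function and variable names are hypothetical, not the authors' code):

```python
import numpy as np

def preprocess_spectrum(raw, dark):
    """Subtract the dark-current spectrum, then apply SNV normalization."""
    corrected = raw - dark
    # SNV: center each spectrum on its own mean, scale by its own std
    return (corrected - corrected.mean()) / corrected.std()
```

After this step every spectrum has zero mean and unit standard deviation, which removes shot-to-shot intensity drift before feature extraction.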
It is well known that keeping redundant or unnecessary information when training models for classification reduces the efficiency of the data analysis by increasing, besides the computing time, the multidimensional space. This issue is commonly referred to in Machine Learning as the “curse of dimensionality”, which can reduce generalization of the models when new samples are tested. Therefore, the next step in the experimental data analysis focused on extracting meaningful descriptors from the spectra, instead of using the entire spectrum profile to train models. This approach consists of calculating intensity ratios between selected elemental and molecular emission lines that are used as input features for model training.
In this context, a peak analysis was conducted to identify the atomic emission lines relevant to the sample composition (Figure 3). Since both resins and hardeners are organic materials, primarily composed of C, H, N, and O, the emission lines of these elements were selected. Additionally, the CN and C2 bands provide complementary evidence of organic matter and offer further insight into the molecular structure of the sample. Peak intensities for each selected line were calculated by applying a parabolic fit around local maxima to reduce the errors given by wavelength resolution. Combining these emission lines allows the detection of small variations in chemical composition, and the use of ratios provides more robust indicators, as relative measurements help to mitigate experimental instabilities. Therefore, the ratios between pairs of these spectral line peaks are calculated, resulting in a total of 665 descriptors per spectrum, which constituted the descriptors matrix.
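The parabolic refinement of a peak maximum and the pairwise ratio construction can be sketched as below (a simplified illustration under stated assumptions, not the authors' implementation; the three-point fit window and the helper names are assumptions):

```python
import numpy as np
from itertools import combinations

def peak_intensity(wl, intensity, center_nm, window=3):
    """Refine a peak intensity with a parabolic fit around the local maximum."""
    i = np.argmin(np.abs(wl - center_nm))
    lo, hi = max(i - window, 1), min(i + window + 1, len(wl) - 1)
    j = lo + np.argmax(intensity[lo:hi])            # local maximum index
    x = wl[j - 1:j + 2] - wl[j]                     # center x for a stable fit
    a, b, c = np.polyfit(x, intensity[j - 1:j + 2], 2)  # parabola through 3 points
    return c - b**2 / (4 * a)                       # height at the vertex

def ratio_features(peaks):
    """All pairwise intensity ratios -> descriptor vector for one spectrum."""
    return np.array([peaks[i] / peaks[j]
                     for i, j in combinations(range(len(peaks)), 2)])
```

Because the ratios are relative measurements, a common multiplicative drift in laser energy cancels out, which is why they are more robust than raw intensities.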

2.4. Selective Feature Importance

Different descriptors contribute unequally to the resin identification task; therefore, several feature reduction techniques were applied to reduce the dimensionality of the descriptor matrix while preserving the relevant information prior to training the classification model. Although dimensionality reduction is crucial for performance, selectively eliminating features provides a more targeted approach. By retaining informative features and removing redundant ones that hinder inference, this strategy enables improved performance across multiple classification architectures.
Exploratory analysis revealed substantial discrepancies among the three acquisition dates. One of the days showed statistical characteristics that deviated markedly from the others: (i) a significantly reduced outlier density based on IQR analysis, indicating unusually low within-day variability, and (ii) abnormally high intra-label correlation, more than twice that observed in the remaining days. To quantify these discrepancies, multivariate distribution comparisons were performed, confirming a pronounced shift in the feature space indicative of a batch effect. Because such distributional drift violates the assumption of identically distributed samples and can lead the classifier to learn acquisition-day artifacts rather than class-dependent structure, this day was deemed non-representative and excluded prior to model training. This exclusion was based solely on objective data-quality criteria and applied uniformly across acquisition days.
Additionally, to address label imbalance and improve model generalization, a down-sampling strategy was employed. This approach reduced the dominance of overrepresented classes and contributed to more accurate classification performance by preventing the model from overfitting to majority label patterns. The process, as depicted in Figure 4, begins with data acquisition. Following this, the temporal and categorical aspects of the dataset are analyzed for outlier detection and correlation filtering. Next, labels representing similar physicochemical behavior are grouped, and down-sampling is applied. The data are subsequently normalized, followed by feature selection and dimensionality reduction, before being divided into training and testing sets using the train_test_split function [14] with a test size of 0.2 and stratification based on a 5-fold scheme (k = 5). The processed data are then passed to the Machine Learning model for prediction.
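The down-sampling and stratified split described above can be sketched as follows (a minimal sketch using scikit-learn's train_test_split [14]; the synthetic data and the helper name stand in for the LIBS descriptors and are assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

def downsample_balanced(X, y, seed=0):
    """Down-sample every class to the size of the smallest one."""
    rng = np.random.default_rng(seed)
    n_min = np.bincount(y).min()
    idx = np.concatenate([rng.choice(np.where(y == c)[0], n_min, replace=False)
                          for c in np.unique(y)])
    return X[idx], y[idx]

# Toy imbalanced data (synthetic, not the paper's dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.array([0] * 150 + [1] * 100 + [2] * 50)

X_bal, y_bal = downsample_balanced(X, y)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_bal, y_bal, test_size=0.2, stratify=y_bal, random_state=0)
```

Stratification preserves the (now equal) class proportions in both the training and the held-out test subset.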
Normalization is first applied to ensure comparability across features and to facilitate downstream statistical analyses: all input data were standardized using the z-score transformation (Equation (1)).
z = (X − μ) / σ
where
  • μ: mean value of each feature
  • σ: standard deviation of each feature
To further approximate a Gaussian distribution and address skewness in the data, the Yeo–Johnson transformation was subsequently applied [15]. This transformation is suitable for both positive and negative values and allows for a flexible normalization framework.
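Both steps can be carried out in one pass with scikit-learn's PowerTransformer, which applies the Yeo–Johnson transformation and then z-scores each feature (a sketch on synthetic skewed data; the toy data are an assumption):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Yeo-Johnson handles positive and negative values; standardize=True
# rescales each transformed feature to zero mean / unit variance (z-score)
pt = PowerTransformer(method="yeo-johnson", standardize=True)

X = np.random.default_rng(0).exponential(size=(100, 5))  # skewed toy data
X_norm = pt.fit_transform(X)
```

The fitted transformer stores one λ parameter per feature (`pt.lambdas_`), so the identical mapping can be reapplied to test data with `pt.transform`.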
After normalization, features exhibiting low variance were removed, based on the assumption that they contribute minimally to the model’s discriminative power. This step reduces noise and computational complexity. The dataset was then partitioned into training and testing subsets to enable evaluation under non-overlapping conditions and to ensure model generalizability. To curate the dataset and remove embedded biases, an outlier test was utilized. The Interquartile Range (IQR), alongside the 25% (Q1) and 75% (Q3) quantiles of each feature, was computed. The lower and upper bounds are given in Equations (2) and (3).
Lower = Q1 − 3 × IQR
Upper = Q3 + 3 × IQR
To identify the number of outliers, the data points residing outside the calculated bounds were counted for each label and each day. The outlier count was then filtered by a threshold, and data points with a population of fewer than 40 outliers were excluded. The result of this process is the outlier density for each day. The day with the lowest density is suspected of being clustered by correlated values.
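The IQR bounds and the per-day outlier density can be sketched as follows (a minimal illustration with the factor k = 3 from Equations (2) and (3); the function names and the per-day aggregation are assumptions):

```python
import numpy as np

def outlier_mask(X, k=3.0):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR], computed per feature."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    return (X < q1 - k * iqr) | (X > q3 + k * iqr)

def outlier_density(X, days):
    """Fraction of flagged values per acquisition day (days: day id per row)."""
    mask = outlier_mask(X)
    return {d: mask[days == d].mean() for d in np.unique(days)}
```

A day whose density is far below the others is the candidate batch-effect day flagged in the exploratory analysis.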
A Pearson’s correlation coefficient [16] is then calculated to assist the pruning efforts. Days that exhibit high correlation between the represented labels and low outlier density are candidates for exclusion. By acknowledging the known biases that the dataset possesses, the acquisition date with underrepresented labels, low outlier density, and high correlation between its labels was removed. The curation is then finalized by merging physicochemically similar labels and down-sampling each group to minimize the interference of correlated labels, improve generalization, and ensure that all groups have equal population sizes to maintain consistency and balance.
To evaluate feature relevance and reduce dimensionality, a multistage feature selection pipeline was employed. Initially, Analysis of Variance (ANOVA), a statistical hypothesis test used to determine whether there are statistically significant differences between the means of three or more groups, was performed to ensure that only the most relevant features are used in the analysis. ANOVA utilizes the F-statistic, a ratio that compares the variance between groups with the variance within groups. Feature selection is then assisted by the Mutual Information algorithm, which measures mutual dependence between groups.
The Mutual Information algorithm (MI) [17] was used to quantify the degree of dependency between each feature and the target variable. The MI criterion, computed via entropy-based estimations from k-nearest neighbor statistics, captures both linear and non-linear associations. Based on this measure, the top 200 most informative features were selected to form a compact yet representative feature subset.
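This two-stage selection can be sketched with scikit-learn's SelectKBest (a sketch on synthetic data; only the final k = 200 comes from the text, while the intermediate k = 400 and the toy data are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Synthetic stand-in for the 665-descriptor matrix (not the paper's data)
X, y = make_classification(n_samples=300, n_features=665, n_informative=30,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Stage 1: ANOVA F-test screens features by between/within-group variance ratio
X_anova = SelectKBest(f_classif, k=400).fit_transform(X, y)

# Stage 2: kNN-based mutual information (Kraskov estimator [17]) keeps the
# 200 most informative features, capturing non-linear dependencies too
X_sel = SelectKBest(mutual_info_classif, k=200).fit_transform(X_anova, y)
```

The cheap ANOVA pass prunes the bulk of the descriptors before the more expensive MI estimation is run.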
Dimensionality reduction was further refined using Linear Discriminant Analysis (LDA) [18], a supervised technique that projects the data onto a lower-dimensional space by maximizing the ratio of between-class variance to within-class variance. LDA thus enhances class discrimination by emphasizing directions in the feature space that are most relevant for classification. The selected features, products of the previous process, are examined to find linear combinations of features between classes. This process ensures that only features that retain 90% of variance are kept for classification.
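The LDA projection and the 90% variance-ratio cutoff can be sketched as follows (a sketch on synthetic data; note that LDA yields at most n_classes − 1 discriminant axes, and the cutoff logic shown is an assumed reading of the 90% criterion):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic 4-class stand-in for the selected-feature matrix
X, y = make_classification(n_samples=300, n_features=200, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

lda = LinearDiscriminantAnalysis()       # at most n_classes - 1 = 3 axes
X_lda = lda.fit_transform(X, y)

# Keep the leading discriminant axes that cover 90% of the explained
# between-class / within-class variance ratio
n_keep = int(np.searchsorted(np.cumsum(lda.explained_variance_ratio_), 0.90) + 1)
X_red = X_lda[:, :n_keep]
```

Because LDA is supervised, the retained axes are exactly those along which the resin classes are most separable.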
The final classification was performed using a Support Vector Machine (SVM) classifier [19,20], a robust method known for its capacity to handle high-dimensional feature spaces. To optimize model performance, a hyperparameter tuning procedure was conducted, typically involving grid search or cross-validated optimization, to identify the best-performing kernel and regularization parameters.
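The grid-searched SVC inside a 5-fold nested cross-validation loop, matching the evaluation scheme reported in the abstract, can be sketched as follows (a sketch on synthetic data; the specific parameter grid is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic 4-class stand-in for the LDA-reduced data
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)

# Inner loop: cross-validated grid search over kernel and regularization
grid = GridSearchCV(SVC(), {"kernel": ["rbf", "linear"],
                            "C": [0.1, 1, 10],
                            "gamma": ["scale", "auto"]}, cv=5)

# Outer loop: 5-fold nested CV gives an accuracy estimate that is not
# biased by the hyperparameter tuning itself
scores = cross_val_score(grid, X, y, cv=5)
```

The mean and standard deviation of `scores` correspond to the "average nested CV score ± std" quoted for the SVC-GD model.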
An overview of the building blocks of the developed classification pipeline is illustrated in Figure 5.

3. Results and Discussion

The overall approach to this issue was to identify problematic clusters of data and prune them. The nature of the experimental results served to mitigate the interference of label correlation by grouping “families” of labels. However, data acquisition on certain dates, and the absence of some labels on those dates, proved to be a significant obstacle in predictive clustering. Several models were employed to mitigate this issue: Random Forest (RF) [21] and Multi-Layer Perceptron (MLP) classifiers [22,23,24] were used, together with a comparison against SVC-Grouping and Dates (SVC-GD) and the filtered counterparts noted as MLP-Grouping and Dates (MLP-GD) and Random Forest-Grouping and Dates (Random Forest-GD). The results are summarized in Table 2.
The outlier filter is the reason SVC-GD stands out compared to other architectures: used in tandem with knowledge of the label correlations, it identifies problematic data to be excluded from the analysis.
When the outlier filter is applied in combination with other models, a substantial improvement in test accuracy is observed. Outlier removal consistently improves prediction performance, largely independent of model complexity. Among the evaluated approaches, the SVC-GD model achieves the highest performance (average nested CV score: 0.8317 ± 0.0212). This can be attributed to the kernel formulation, which is well suited for handling high-dimensional and correlated data by improving class separability without explicitly increasing model complexity.
In contrast, MLP models rely on multilayer, hierarchical representations that can become computationally expensive and less interpretable, while Random Forest classifiers produce piecewise, axis-aligned decision boundaries that may be too rigid or locally complex to capture the smoother structure of the data.
A visual comparison with the plain SVC model can be made using the SVC-GD model, shown in Figure 6 and Figure 7, where the matrices match the corresponding Material IDs listed in Table 1. The plain model, shown in Figure 8 and Figure 9, while retaining the spatial relations between the principal components, is more susceptible to mislabeling.
The high level of multicollinearity of the data is apparent even after data curation and can be seen in the following figures, which show the geometric relations between the components that capture 90% of the total variance and the confusion matrices for train and test data, respectively (Figure 6, Figure 7, Figure 8 and Figure 9). The component geometry graphs represent the spatial relationship between the two principal components, products of the Linear Discriminant Analysis (LDA), and their respective predicted labels. Alongside the component geometry, the confusion matrix is presented, which counts the correctly and incorrectly predicted labels for each class.

4. Limitations

The applicability and performance of the proposed filter are inherently linked to the characteristics of the dataset used in this study, which introduces several limitations that should be acknowledged. In particular, the effectiveness of the filter depends on the correlational structure of the feature space. In the present case, the data exhibit a high degree of intrinsic correlation, as all specimens share a common carbon fiber reinforcement and differ primarily in the resin matrix. This structured correlation makes outlier density a meaningful proxy for anomaly detection. For datasets lacking similar correlation patterns, however, the performance and reliability of the filter may be reduced, and different outcomes should be expected.
Additional limitations arise from practical aspects of data acquisition. The dataset is affected by non-uniform sampling and missing feature values, both of which can distort its statistical representation and influence the behavior of the filter. Although imputation strategies were employed to mitigate these effects, such imperfections introduce uncertainty and limit the direct generalization of the method to idealized or uniformly sampled datasets.
To manage the dimensionality and complexity of the tabular data, Linear Discriminant Analysis (LDA) was adopted as a preprocessing step. While this choice improves computational tractability and alleviates the curse of dimensionality, it inevitably results in partial information loss. By retaining 90% of the explained variance, a controlled trade-off is made between information preservation and dimensionality reduction. Consequently, the filter operates on a transformed representation of the original data, and this attenuation should be considered when interpreting the results.
Finally, alternative strategies for addressing feature redundancy, such as mutual-information-based feature selection, were considered but ultimately not pursued. In highly correlated datasets such as this one, these approaches may introduce artificial coupling between features, potentially reducing physical interpretability without ensuring a corresponding improvement in anomaly detection performance.

5. Conclusions

This study demonstrates that the integration of Laser-Induced Breakdown Spectroscopy (LIBS) with targeted data curation and Machine Learning classification enables effective discrimination of epoxy resin matrices in CFRP composites, addressing a key challenge in composite recycling. Using a Support Vector Classification model, an average nested cross-validation accuracy of 0.8317 ± 0.0212 was achieved, despite the intrinsic complexity and variability of resin-based spectral data.
The results highlight that classification performance is strongly influenced by dataset biases arising from resin chemistry, feature correlation, and experimental variability. Selective dataset curation—through outlier-based filtering, dimensionality reduction, and balanced sampling—was critical in mitigating these effects and consistently improved model performance across different classification architectures. These findings emphasize the statistically significant role of feature interdependencies in resin identification tasks.
Although fiber reinforcement can affect plasma formation, LIBS remains particularly effective for probing the polymer matrix, where its speed and robustness support high-throughput spectral discrimination. Overall, the proposed framework provides a quantitative and practical pathway toward automated resin sorting, with clear potential to enhance circularity in CFRP value chains. Future work will focus on expanding the range of resin chemistries to improve generality and industrial applicability.

Author Contributions

D.K.: Machine Learning analysis and experimentation, K.N.S.: Data analysis/preprocessing and baseline model development, E.S.: LIBS experiments and data processing, Z.M.B.: Methodology and data processing, C.P.: Conceptualization, writing—review and editing, E.P.K.: Overview, review, and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Commission under the CL42021-RESILIENCE-01 project European recycling and circularity in large composite components (EuReComp), call “A Digitized, Resource-Efficient and Resilient Industry 2021” HORIZON Research and Innovation Actions, GA Number 101058089. The views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or HADEA. Neither the European Union nor HADEA can be held responsible for them.

Data Availability Statement

The data presented in this study are openly available in Zenodo at https://zenodo.org/records/17750886.

Acknowledgments

Acknowledgements to the project partners that provided reference materials (TUD, ITA, INEGI and UPATRAS).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Oliveux, G.; Dandy, L.O.; Leeke, G.A. Current status of recycling of fibre reinforced polymers: Review of technologies, reuse and resulting properties. Prog. Mater. Sci. 2015, 72, 61–99. [Google Scholar] [CrossRef]
  2. Pickering, S.J. Recycling technologies for thermoset composite materials—Current status. Compos. Part A Appl. Sci. Manuf. 2006, 37, 1206–1215. [Google Scholar] [CrossRef]
  3. Butenegro, J.; Bahrami, M.; Abenojar, J.; Martínez Casanova, M. Recent Progress in Carbon Fiber Reinforced Polymers Recycling: A Review of Recycling Methods and Reuse of Carbon Fibers. Materials 2021, 14, 6401. [Google Scholar] [CrossRef] [PubMed]
  4. Chen, B.; Gong, J.; Huang, W.; Gao, N.; Deng, C.; Gao, X. Constructing a parallel aligned shish kebab structure of HDPE/BN composites: Toward improved two-way thermal conductivity and tensile strength. Compos. Part B Eng. 2023, 259, 110699. [Google Scholar] [CrossRef]
  5. Adarsh, U.K.; Kartha, V.B.; Santhosh, C.; Unnikrishnan, V.K. Spectroscopy: A promising tool for plastic waste management. TrAC Trends Anal. Chem. 2022, 149, 116534. [Google Scholar] [CrossRef]
  6. Araujo-Andrade, C.; Bugnicourt, E.; Philippet, L.; Rodriguez-Turienzo, L.; Nettleton, D.; Hoffmann, L.; Schlummer, M. Review on the photonic techniques suitable for automatic monitoring of the composition of multi-materials wastes in view of their posterior recycling. Waste Manag. Res. 2021, 39, 631–651. [Google Scholar] [CrossRef] [PubMed]
  7. Weiss, Z.; Concepcion-Mairey, F.; Pickering, J.C.; Smid, P. Emission spectroscopic study of an analytical glow discharge with plane and hollow cathodes: Titanium and iron in argon discharge. Spectrochim. Acta Part B At. Spectrosc. 2021, 180, 106208.
  8. Brunnbauer, L.; Gajarska, Z.; Lohninger, H.; Limbeck, A. A critical review of recent trends in sample classification using Laser-Induced Breakdown Spectroscopy (LIBS). TrAC Trends Anal. Chem. 2023, 159, 116859.
  9. Cho, K.; Bahn, H. A Lightweight File System Design for Unikernel. Appl. Sci. 2024, 14, 3342.
  10. Gajarska, Z.; Brunnbauer, L.; Lohninger, H.; Limbeck, A. Identification of 20 polymer types by means of laser-induced breakdown spectroscopy (LIBS) and chemometrics. Anal. Bioanal. Chem. 2021, 413, 6581–6594.
  11. Tymoshchuk, D.; Didych, I.; Maruschak, P.; Yasniy, O.; Mykytyshyn, A.; Mytnyk, M. Machine Learning Approaches for Classification of Composite Materials. Modelling 2025, 6, 118.
  12. Yılmaz, V.S.; Eseller, K.E.; Aslan, O.; Bayraktar, E. Classification of Different Recycled Rubber-Epoxy Composite Based on Their Hardness Using Laser-Induced Breakdown Spectroscopy (LIBS) with Comparison Machine Learning Algorithms. Inventions 2023, 8, 54.
  13. Okafor, C.E.; Iweriolor, S.; Ani, O.I.; Ahmad, S.; Mehfuz, S.; Ekwueme, G.O.; Chukwumuanya, O.E.; Abonyi, S.E.; Ekengwu, I.E.; Chikelu, O.P. Advances in machine learning-aided design of reinforced polymer composite and hybrid material systems. Hybrid Adv. 2023, 2, 100026.
  14. scikit-learn documentation: train_test_split. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html (accessed on 18 December 2025).
  15. Yeo, I.-K.; Johnson, R. A new family of power transformations to improve normality or symmetry. Biometrika 2000, 87, 954–959.
  16. Soper, H.E.; Young, A.W.; Cave, B.M.; Lee, A.; Pearson, K. On the Distribution of the Correlation Coefficient in Small Samples. Appendix II to the Papers of “Student” and R. A. Fisher. Biometrika 1917, 11, 328–413.
  17. Kraskov, A.; Stoegbauer, H.; Grassberger, P. Estimating Mutual Information. Phys. Rev. E 2004, 69, 066138.
  18. Lachenbruch, P.A. Discriminant Analysis. In Wiley StatsRef: Statistics Reference Online; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2014; ISBN 978-1-118-44511-2.
  19. Awad, M.; Khanna, R. Support Vector Machines for Classification. In Efficient Learning Machines; Apress: Berkeley, CA, USA, 2015.
  20. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  21. Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844.
  22. Hinton, G.E. Connectionist learning procedures. Artif. Intell. 1989, 40, 185–234.
  23. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
  24. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256.
Figure 1. Overview of the methodology for developing an ML classification model based on LIBS spectra.
Figure 2. Schematic of the experimental LIBS setup.
Figure 3. Representative LIBS spectra of a thermoset resin with the main elemental and molecular peaks identified.
Figure 4. The workflow of the Group and Dates filter. Black arrows indicate the primary workflow. Blue arrows denote a direct correlation between the data components (Values, Dates, and Labels). Yellow arrows indicate that Labels and Dates inform the application of the Outlier and Correlation Filters.
Figure 5. Methodology developed for data processing and classification algorithms.
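The processing chain summarized in Figure 5 (power transformation, statistical feature selection, LDA projection, SVC classification) can be sketched end to end with scikit-learn. This is a minimal illustration on synthetic data: the dataset shape, the number of selected features (`k=20`), and the SVC hyperparameters are placeholders, not the authors' actual configuration.

```python
# Illustrative pipeline mirroring the steps in Figure 5:
# Yeo-Johnson transform -> mutual-information feature selection
# -> LDA projection -> SVC. All data and parameters are synthetic stand-ins.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.svm import SVC

# Synthetic stand-in for the high-dimensional LIBS feature set
# with four matrix groups.
X, y = make_classification(n_samples=400, n_features=60, n_informative=12,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

pipe = Pipeline([
    ("power", PowerTransformer(method="yeo-johnson")),    # normalize feature distributions
    ("select", SelectKBest(mutual_info_classif, k=20)),   # keep the most informative features
    ("lda", LinearDiscriminantAnalysis(n_components=3)),  # at most n_classes - 1 components
    ("svc", SVC(kernel="rbf", C=10.0)),                   # final classifier
])
pipe.fit(X_train, y_train)
print(f"test accuracy: {pipe.score(X_test, y_test):.3f}")
```

Wrapping the steps in a single `Pipeline` ensures the transform parameters are fitted only on the training folds, which matters for the cross-validated scores reported later.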
Figure 6. Scatter plot of the training data projected onto the first two of three Linear Discriminant Analysis (LDA) components. These two components capture 90% of the total variance, providing effective dimensionality reduction while preserving class separability. The LDA model achieves a training classification accuracy of 88.5%, as reflected in the associated confusion matrix comparing predicted and true labels for four classes (matrix1–matrix4). The clear separation of classes along Components 1 and 2 highlights LDA’s ability to identify directions that maximize class discriminability.
Figure 7. Scatter plot of the testing data projected onto the first two of three Linear Discriminant Analysis (LDA) components. These components capture 90% of the total variance, enabling dimensionality reduction while maintaining class separation. The LDA model achieves a classification accuracy of 80% on the test set, as indicated by the associated confusion matrix.
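The LDA projections and confusion matrices of Figures 6 and 7 can be reproduced in outline as follows. The data here are synthetic blobs standing in for the grouped LIBS features, so the accuracies and explained-variance figures differ from those reported in the captions.

```python
# Sketch of a four-class LDA projection and confusion matrix,
# in the spirit of Figures 6 and 7. Data are synthetic placeholders.
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=400, centers=4, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

lda = LinearDiscriminantAnalysis(n_components=3).fit(X_tr, y_tr)
Z = lda.transform(X_te)                          # test data in discriminant space
var_first2 = lda.explained_variance_ratio_[:2].sum()  # variance in components 1-2
cm = confusion_matrix(y_te, lda.predict(X_te))   # predicted vs. true labels

print(f"variance captured by first two components: {var_first2:.2f}")
print(cm)
```

Plotting the first two columns of `Z` colored by class, alongside `cm`, yields the kind of scatter-plot-plus-confusion-matrix view shown in the figures.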
Figure 8. Scatter plot of the training data without the GD filter projected onto the first two of five Linear Discriminant Analysis (LDA) components. These components capture 90% of the total variance. The samples exhibit a high degree of correlation, which reduces class separability and negatively impacts classification performance, as reflected in the associated confusion matrix. The LDA model achieves a training classification accuracy of 79%.
Figure 9. Scatter plot of the testing data without the GD filter projected onto the first two of five Linear Discriminant Analysis (LDA) components. These components capture 90% of the total variance, providing dimensionality reduction while maintaining class separation. The LDA model achieves a classification accuracy of 79% on the test set, as shown in the associated confusion matrix.
Table 1. Material matrix groups used for classification.

Matrix Group | Resin               | Hardener                     | Materials ID
1            | Epoxy (Bisphenol A) | Modified aliphatic polyamine | A, B, E
2            | Epoxy (Bisphenol A) | Polyaminoamide/polyamine     | D, F
3            | Epoxy (Bisphenol A) | Dicarboxylic anhydride       | C, G
4            | Epoxy (Bisphenol A) | Modified aromatic polyamine  | H
Table 2. The accuracy score on test data for each model.

Model            | Score (%)
SVC              | 75.0
MLP              | 71.75
MLP-GD           | 78.75
Random Forest    | 71.0
Random Forest-GD | 77.5
SVC Grouping     | 72.0
SVC-GD           | 83.0
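The best-performing model in Table 2 (SVC-GD) was additionally assessed with nested 5-fold cross-validation, giving the 0.8317 ± 0.0212 accuracy quoted in the Abstract. The protocol can be sketched as follows; the data and the parameter grid are hypothetical placeholders, not the authors' actual search space.

```python
# Sketch of nested 5-fold cross-validation: the inner GridSearchCV tunes
# the SVC hyperparameters, the outer loop scores the tuned model on
# held-out folds, yielding a mean +/- std accuracy estimate.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic four-class stand-in for the grouped LIBS dataset.
X, y = make_classification(n_samples=300, n_features=30, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=2)

# Inner loop (cv=5): hyperparameter search; grid values are illustrative.
inner = GridSearchCV(SVC(), {"C": [1, 10, 100], "gamma": ["scale", 0.01]}, cv=5)

# Outer loop (cv=5): unbiased accuracy of the whole tuning procedure.
scores = cross_val_score(inner, X, y, cv=5)
print(f"nested CV accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```

Because the hyperparameter search is repeated inside every outer fold, the reported mean is not optimistically biased by tuning on the test folds.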