Thermographic Data Processing and Feature Extraction Approaches for Machine Learning-Based Defect Detection

Abstract: Infrared thermography is a non-destructive testing method used to detect defects in materials and structures. Machine learning algorithms have been applied to thermographic data to automate the defect detection process. Data preparation and feature extraction are crucial factors affecting ML model results, especially in thermographic data analysis. This study focuses on automating the detection of impact damage in carbon fiber-reinforced polymer materials using flash-pulse thermography and ML algorithms. Various machine learning models and data pre-processing techniques were evaluated for their effectiveness in detecting and locating impact damage. The results demonstrated that the combination of the K-nearest neighbors model with the differential absolute contrast data processing method achieved the highest balanced accuracy. Other combinations, such as the Gaussian support vector machine model with raw data and K-nearest neighbors with thermographic signal reconstruction derivative data, also exhibited promising performance.


Introduction
Composite materials are increasingly used in industries such as the aerospace, energy and automotive sectors because they are lightweight, strong and resistant to corrosion [1]. However, the presence of defects, such as impact damage, can compromise the integrity of composite components and lead to structural failures [2]. To ensure the quality and reliability of composite materials, it is crucial to develop and implement reliable nondestructive testing (NDT) methods.
Flash-pulse thermography (FPT) is a popular NDT method used for inspecting composites. It involves using short pulses of intense light to heat the material's surface and then measuring its thermal response with infrared cameras [3,4]. Defects such as cracks, delaminations and impact damage can affect the thermal response and can thus be detected.
One of the main challenges in infrared nondestructive testing (IRNDT) is interpreting the results and identifying defects, which is usually performed manually by operators. To address this challenge, various techniques have been developed to process thermographic data. These techniques, such as thermographic signal reconstruction (TSR) [5], pulse-phase thermography (PPT) [6] and principal component thermography (PCT) [7], aim to improve defect detection, enhance contrast and reduce the impact of uneven heating. In recent years, there has been a growing interest in automating the inspection process and using machine learning (ML) algorithms to analyze thermographic data [8–10]. ML algorithms have shown promise in nondestructive testing applications. However, when applied to thermography, there are challenges related to the generalizability, robustness and reliability of the models and the complexity of thermographic data.
Our research focuses on comparing different ML models and data pre-processing techniques for automatically detecting impact damage in carbon fiber-reinforced polymer (CFRP) materials using FPT. The goal was to determine the combination of ML model and data processing method that most accurately detects and locates impact damage in CFRP materials.

Methodology
Flash-pulse thermography is the primary technique used in this study. It involves applying controlled heat pulses to the material and capturing its temperature response with infrared cameras. However, real-world thermographic inspections face challenges like background noise and uneven heating, limiting the effectiveness of simple analytical methods. Contrast methods, such as the differential absolute contrast (DAC) method, address these challenges: DAC does not require the definition of a non-defect area and is unaffected by non-uniform heating [11].
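A minimal sketch of the DAC computation in Python with NumPy may help illustrate the idea. It assumes the commonly used one-dimensional model in which the temperature rise of a sound area decays as 1/sqrt(t), so that the sound reference at time t is estimated from each pixel's own value at an early reference time t' as sqrt(t'/t)·ΔT(t'). The function name, the array layout and the choice of reference frame are illustrative assumptions, not the implementation used in this study.

```python
import numpy as np

def dac(seq, times, ref_idx):
    """Differential absolute contrast for a thermographic sequence.

    seq     : (n_frames, height, width) temperature rise above the
              pre-flash level (background already subtracted).
    times   : (n_frames,) time after the flash in seconds, all > 0.
    ref_idx : index of the reference frame t', chosen (assumption)
              before any defect becomes visible on the surface.
    """
    seq = np.asarray(seq, dtype=float)
    times = np.asarray(times, dtype=float)
    # Per-frame scaling sqrt(t'/t) from the 1/sqrt(t) sound-area model.
    scale = np.sqrt(times[ref_idx] / times)
    # Sound-area estimate built from each pixel's own reference value,
    # so no separate non-defect region has to be selected.
    sound = scale[:, None, None] * seq[ref_idx]
    return seq - sound
```

Because the reference is taken pixel by pixel, a pixel that received more or less flash energy is compared only with its own scaled early-time value, which is why the contrast is insensitive to non-uniform heating under this model.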
TSR is another data processing method that enhances defect detection. It involves fitting the temperature data with a polynomial in the logarithmic domain, thus allowing for the reconstruction of the entire thermographic sequence and the calculation of its time derivatives. The PCT method utilizes principal component analysis (PCA) to extract valuable information from thermographic images, reducing dimensionality while retaining essential defect-related information [12].
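The two methods can be sketched as follows in Python with NumPy. The polynomial degree, the array layout (frames along the first axis, pixels along the second) and the function names are assumptions made for illustration; the PCA step treats each pixel as one sample and is computed with a plain singular value decomposition, which is one common way to obtain principal component scores and not necessarily the procedure used in this study.

```python
import numpy as np

def tsr_first_derivative(seq, times, degree=5):
    """First logarithmic derivative d ln(T) / d ln(t) per pixel.

    seq   : (n_frames, n_pixels) positive temperature rise.
    times : (n_frames,) time after the flash in seconds, all > 0.
    """
    log_t = np.log(np.asarray(times, dtype=float))
    log_T = np.log(np.asarray(seq, dtype=float))
    # Fit ln(T) as a polynomial in ln(t) for all pixels at once;
    # coeffs has shape (degree + 1, n_pixels), highest power first.
    coeffs = np.polyfit(log_t, log_T, degree)
    # Differentiate the polynomial analytically.
    powers = np.arange(degree, 0, -1)
    dcoeffs = coeffs[:-1] * powers[:, None]
    # Evaluate the derivative polynomial at every ln(t).
    vander = np.vander(log_t, degree)
    return vander @ dcoeffs

def pca_scores(features, n_components=10):
    """Principal component scores, one row per pixel.

    features : (n_pixels, n_values) matrix, e.g. one temperature
               sequence per row.
    """
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)  # center each column
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

For an ideal sound area whose temperature rise decays as 1/sqrt(t), the logarithmic derivative returned by `tsr_first_derivative` is close to -0.5 at all times, so departures from that value indicate a disturbed heat flow.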
Data processing based on statistical parameters utilizes higher-order statistic (HOS) parameters, such as mean, variance, skewness and kurtosis, to detect subsurface defects [13]. These parameters compress the entire thermographic sequence into a single image or a small number of images containing comprehensive defect information.
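For reference, the four parameters can be written for the temperature sequence T_1, ..., T_N of a single pixel as shown below. These are the standard population definitions; the normalization actually used (N or N − 1, kurtosis or excess kurtosis) is not stated here and is an assumption.

```latex
\begin{aligned}
\mu      &= \frac{1}{N}\sum_{k=1}^{N} T_k, &
\sigma^2 &= \frac{1}{N}\sum_{k=1}^{N} \left(T_k - \mu\right)^2, \\
\gamma_1 &= \frac{1}{N}\sum_{k=1}^{N} \left(\frac{T_k - \mu}{\sigma}\right)^{3}, &
\gamma_2 &= \frac{1}{N}\sum_{k=1}^{N} \left(\frac{T_k - \mu}{\sigma}\right)^{4}.
\end{aligned}
```

Evaluating each quantity at every pixel yields one image per parameter, which is how the sequence is compressed into a small number of images.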
Machine learning plays a significant role in thermographic data processing by automating defect detection and analysis. ML models, including support vector machines (SVMs), decision trees and neural networks (NNs), effectively learn patterns and features from large sets of thermographic sequences, enhancing the accuracy of defect identification and characterization. Preprocessing is crucial in ML: it removes noise and artifacts and extracts the features that matter for effective defect detection and analysis.

Measurement and Evaluation Procedure
Flash-pulse thermographic inspections were conducted on flat CFRP panels using a FLIR A6751 infrared camera and a Hensel EH Pro 6000 flash lamp with a maximum flash energy of 6 kJ. The experimental samples were 16 impact-damaged CFRP plates. The size of the samples was 100 × 150 mm and their thickness was about 0.7 mm or 2.8 mm. The samples were positioned on a thermally insulating foam sheet, with the flash lamp and camera placed vertically at distances of 350 mm and 400 mm from the samples, respectively. The infrared camera recorded the thermal evolution on the surface of the inspected CFRP samples after heating by the flash pulse, acquiring 225 frames at a frame rate of 50 Hz and a resolution of 640 × 512 pixels. The data were recorded using LabIR software, an internal tool developed at the University of West Bohemia.
Several combinations of data processing techniques and ML models were compared to evaluate their performance. Data processing methods included raw thermographic data (with only background subtraction applied), the logarithmic representation of the data, the DAC method, first-derivative data obtained using the TSR method, PCA, and the statistical parameters method.
For the raw data, every feature vector for the machine learning models represented the temperature evolution sequence obtained for one pixel. Logarithmic data represented 225 values of the logarithm of the temperature. For the TSR method, the feature vectors represented the sequence of 225 temperature derivatives. PCA data represented the first 10 principal components. Statistical parameters included 24 features obtained as the mean, variance, skewness and kurtosis of six time intervals: 0:0.1 s; 0.8:2 s; 0.16:0.5 s; 0.4:1 s; 0.8:2 s; and 1.6:5 s.
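The construction of the statistical feature vectors can be sketched as follows in Python with NumPy. The function names and array layout are assumptions for illustration, and the interval list is supplied by the caller. Intervals that extend past the last recorded frame are clipped to the available data (225 frames at 50 Hz cover 4.5 s); how such intervals were handled in this study is not stated.

```python
import numpy as np

def moments(x):
    """Mean, variance, skewness and kurtosis along the time axis.

    x : (n_frames, n_pixels) -> returns (4, n_pixels).
    Population (1/N) definitions are assumed.
    """
    mean = x.mean(axis=0)
    centered = x - mean
    var = (centered ** 2).mean(axis=0)
    std = np.sqrt(var)
    # Avoid division by zero for constant sequences.
    z = centered / np.where(std > 0, std, 1.0)
    skew = (z ** 3).mean(axis=0)
    kurt = (z ** 4).mean(axis=0)
    return np.stack([mean, var, skew, kurt])

def interval_features(seq, frame_rate, intervals):
    """Statistical features per pixel over several time intervals.

    seq        : (n_frames, n_pixels) temperature sequence.
    frame_rate : acquisition rate in Hz.
    intervals  : list of (start_s, stop_s) pairs in seconds.
    Returns (n_pixels, 4 * len(intervals)).
    """
    seq = np.asarray(seq, dtype=float)
    n_frames = seq.shape[0]
    blocks = []
    for start_s, stop_s in intervals:
        i0 = int(round(start_s * frame_rate))
        # Clip to the recorded frames.
        i1 = min(int(round(stop_s * frame_rate)), n_frames)
        if i1 <= i0:
            raise ValueError("interval contains no frames")
        blocks.append(moments(seq[i0:i1]))
    return np.concatenate(blocks, axis=0).T
```

With six intervals, each pixel is described by 6 × 4 = 24 values, matching the feature count given above.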
Defect and non-defect areas were chosen manually to create training and testing datasets. The training dataset was composed of 2085 feature vectors related to defect-area pixels and 3337 feature vectors related to non-defect-area pixels. It included the results of the inspection of 4 CFRP plates. The rest of the data, including 4243 defect-area pixels and 105,386 non-defect-area pixels, were used for testing.
The following machine learning models were selected: decision tree, logistic regression, naive Bayes, support vector machine (SVM), K-nearest neighbors (KNN), bagged trees ensemble and a 3-layer neural network.
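To make the pixel-wise classification concrete, the following is a minimal NumPy sketch of K-nearest-neighbors voting on feature vectors. The number of neighbors (k = 5), the Euclidean distance and the function name are assumptions; the settings used in this study are not given here, and this is not the implementation that produced the reported results.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=5):
    """Binary KNN classification by majority vote.

    train_x : (n_train, n_features) training feature vectors.
    train_y : (n_train,) labels, 1 = defect, 0 = non-defect.
    test_x  : (n_test, n_features) feature vectors to classify.
    """
    train_x = np.asarray(train_x, dtype=float)
    test_x = np.asarray(test_x, dtype=float)
    train_y = np.asarray(train_y)
    # Squared Euclidean distances, shape (n_test, n_train).
    # For a full test set this matrix is large; in practice the
    # test pixels would be processed in batches.
    d2 = (
        (test_x ** 2).sum(axis=1)[:, None]
        + (train_x ** 2).sum(axis=1)[None, :]
        - 2.0 * test_x @ train_x.T
    )
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]  # (n_test, k) labels of the neighbors
    # Majority vote; with an even k, ties fall to non-defect.
    return (votes.mean(axis=1) > 0.5).astype(int)
```

Each row of `test_x` corresponds to one pixel, so reshaping the predictions back to the image dimensions gives a defect map.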
The performance of the models was compared via balanced accuracy (BA), a summary metric for a binary classifier that takes into account both the true positive rate (TPR) and the true negative rate (TNR). It is calculated as the average of TPR and TNR, i.e., BA = (TPR + TNR)/2.
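The choice of BA matters here because the test set is strongly imbalanced (4243 defect pixels against 105,386 non-defect pixels). The short Python sketch below, with illustrative function names, computes BA and plain accuracy from label lists. A classifier that labels every pixel as non-defect would score about 96% plain accuracy on such a test set but only 50% balanced accuracy.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of true positive rate and true negative rate.

    Labels: 1 = defect (positive), 0 = non-defect (negative).
    """
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return 0.5 * (tpr + tnr)

def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```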

Results
Table 1 presents the balanced accuracy values for different combinations of machine learning models and data processing methods. The balanced accuracy metric provides an overall assessment of the model's performance by considering both sensitivity and specificity. Based on the table, the combination of the KNN model with the DAC data processing method achieves the highest balanced accuracy of 96.7%. Other combinations that exhibit relatively high balanced accuracy values include Gaussian SVM with raw data (95.1%), KNN with TSR derivative data (94.9%) and the neural network with statistical features (94.3%).
Naive Bayes demonstrated low balanced accuracy and poor defect identification with every data processing method. On the other hand, the neural network demonstrated good results (BA > 90%) with all processing techniques. When comparing the processing techniques, the DAC and TSR methods demonstrated the best results (BA > 90% in all models except naive Bayes and logistic regression).

Conclusions
This study investigated the application of machine learning algorithms for automatically detecting impact damage in CFRP samples using flash-pulse thermography. The results highlighted the effectiveness of different ML models and data pre-processing techniques in thermographic data analysis. The combination of the KNN model with the DAC data processing method showed the highest balanced accuracy of 96.7%, indicating its superior performance in accurately detecting and locating impact damage. Additionally, the Gaussian SVM with raw data, KNN with TSR derivative data and the neural network with statistical features also demonstrated satisfactory balanced accuracy values. The neural network demonstrated good results (BA > 90%) with all processing techniques. However, the naive Bayes model showed poor performance across all data processing methods. Comparing the processing techniques, the DAC and TSR methods provided the best results with most models (BA > 90% in all models except naive Bayes and logistic regression).
These findings emphasize the importance of carefully selecting an appropriate machine learning model and data processing method to achieve accurate and reliable thermographic data analysis. Further research should focus on refining and optimizing the ML models to improve their generalizability and robustness in real-world applications.

Table 1. Balanced accuracy of ML models with different preprocessing methods.