1. Introduction
Digital Image Correlation (DIC) and infrared thermography (IRT) are common non-destructive testing techniques used for experimental analysis of materials and structures. DIC provides full-field displacement and strain distributions with high spatial resolution and has become a standard tool for analyzing deformation and damage evaluation [
1,
2,
3]. In this work, DIC fields were computed and evaluated using NCorr, an open-source 2D DIC software widely adopted in the experimental mechanics community for accurate displacement and strain evaluation [
4]. Infrared thermography, on the other hand, measures surface temperature distributions and is sensitive to local processes such as plastic dissipation, friction heating, crack growth, or thermal softening [
5,
6,
7]. When these two methods are combined, researchers obtain a more complete picture of thermomechanical behaviour, especially during dynamic loading, fracture initiation, or fatigue testing.
However, IR thermography requires specialized instruments that are not easily justified for occasional use, and they also require calibration and are sensitive to reflections, ambient light, and environmental noise [
5,
8]. Additionally, they require a direct line of sight to the specimen, which is not always possible. A common example is high-energy impact testing or very-high-strain tests using a drop tower machine or split Hopkinson bar [
9]. In many laboratories, safety regulations require the specimen to be placed inside a protective enclosure made from acrylic, polycarbonate, or tempered glass. These transparent shields protect the operator from fragments during impact but are IR-opaque [
8]. DIC, however, can still function normally through such transparent windows, while infrared cameras cannot record correct temperature fields. As a result, only mechanical deformation data is available, and the thermal information is missing. In previous work, similarities of infrared (IR) image patterns and effective strain distributions obtained by DIC were described by Krstulović-Opara et al. [
10], suggesting that the spatial structure of thermal and mechanical fields is strongly correlated during dynamic deformation.
Because of these limitations, a method that can estimate temperature fields from strain measurements alone would be extremely useful. With such a method, a researcher could still obtain thermomechanical insights even when thermography cannot be used due to safety barriers, optical restrictions, or budget limitations.
Recent developments in deep learning show that convolutional neural networks (CNNs) can learn complex mappings between different image datasets. Encoder–decoder architectures such as U-Net have demonstrated strong performance in segmentation, denoising, and general pixel-wise prediction tasks [
7,
11,
12]. U-Net, originally introduced for biomedical imaging, combines multi-scale encoding with skip connections to preserve spatial detail [
7]. These architectures are well suited for learning relationships between physical fields, provided that a sufficiently large, paired dataset exists, of which this is the first. Although image-to-image translation is well explored (e.g., pix2pix [
13] and Cycle GAN [
14]), very few studies attempt to predict thermal fields directly from DIC strain data. Still, mechanical deformation and thermal response are coupled through plastic dissipation, viscoelastic heating and other mechanisms [
6,
15,
16]. Therefore, with enough paired examples, a neural network can learn this mapping and reconstruct temperature distributions even when no infrared camera is used—such as in the drop-tower case described earlier.
In this work, we present a U-Net-based deep regression model that predicts full-field thermal maps from DIC strain images. A custom dataset of paired strain–temperature fields was collected and processed using normalization and paired geometric augmentations. The U-Net architecture was modified to perform continuous regression, and several training strategies were designed to preserve the physical meaning of the data. Our results show that, when trained on a sufficiently representative dataset, the model can reconstruct thermal fields with high correlation and low error, demonstrating that thermography-free thermal estimation is feasible and practical.
2. Experimental Setup, Data Acquisition and Data Processing
Before the paired DIC–thermal dataset could be used for training, the experiment and the measurement equipment had to be carefully designed to ensure that both deformation and temperature fields were acquired/recorded with sufficient quality. The overall research workflow of the proposed framework is summarized in
Figure 1.
This section gives an overview of the testing configuration, the imaging systems (
Figure 2a) used in the study, and the procedure for calculating strain based on high-speed images. In addition, several processing steps were required to convert the recorded images into a consistent and usable form for the neural network. The following subsections describe the data acquisition process and the steps applied for processing and augmenting the dataset.
2.1. Experimental Setup and Data Acquisition
The goal of this research was to demonstrate that thermal fields can be reconstructed directly from strain measurements even during rapid and strongly dynamic loading conditions. To ensure clear thermomechanical activity, the three-point bending test was chosen because of its highly predictable deformation pattern [
10,
17,
18]. The test was performed on an Instron 8801 Servo hydraulic dynamic testing system at jaw separation speed of 287 mm/s. The infrared and optical cameras were mounted on two independent stands and positioned in a vertical, coaxial arrangement, with the infrared camera located below and the high-speed optical camera positioned directly above it. The housing of the optical camera was placed approximately 2 mm above the thermal camera housing to avoid thermal interference, while maintaining an unobstructed field of view of the same planar specimen surface. The optical camera was positioned at an approximate working distance of 1 m from the specimen, and both cameras remained fixed throughout the experiments. The optical axes of the two cameras were nearly parallel, with a small in-plane angular offset of approximately 1–2° measured via the fixture.
Fast loading promotes pronounced thermomechanical coupling, including thermoelastic effects and plastic dissipation, which become visible in infrared recordings [
6,
10]. In contrast, quasi-static tests at very low displacement rates (e.g., 0.1 mm/s) tend to equalize the surface temperature between frames due to heat dissipation, reducing observable gradients and making the thermal response less suitable for supervised learning. Thermal data were acquired using the cooled InSb Flir SC5000MW series infrared camera, operating at 320 × 256 pixels and 487 frames per second with emissivity coefficient 0.95. High-speed infrared imaging inherently involves trade-offs between frame rate and spatial resolution due to detector readout limits [
5,
15]. The selected frame rate provided adequate temporal resolution for dynamic thermography while preserving sufficient thermal detail. Simultaneously, mechanical deformation was captured using a Chronos 1.4 high-speed camera at 1280 × 1024 pixels and 487 frames per second, ensuring that both cameras remained synchronized. In addition, specimens were coated with black paint and sprayed with random speckled pattern for DIC [
1,
19]. Triggering of the two systems was coordinated using a custom-built trigger box that was manually activated prior to the start of the Instron test. Manual triggering was adopted due to differences between the trigger-out signal of the testing machine and the trigger-in requirements of the IR and high-speed cameras. A total of five specimens were tested under identical conditions undergoing a three-point bending test which ensured a predictable deformation region [
10,
17,
18]. Specimens consisted of square and circular aluminum tubes, with one specimen filled with Ethylene–Vinyl Acetate to introduce a small amount of structural support and, therefore, variety in the dataset (see
Figure 2b). The synchronized thermal and high-speed optical recordings formed the basis for constructing the paired DIC–thermal dataset used in this study.
2.2. Data Processing, Normalization, and Augmentation
The raw data obtained from the experiments consisted of full-frame DIC recordings from the high-speed optical system and temperature fields extracted from the infrared camera. Before the images could be used for network training, both datasets required several processing steps to obtain spatially aligned, normalized and sufficiently large datasets. The following subsections describe the procedures for DIC and thermal data separately, followed by the data preprocessing and augmentation strategy applied to both.
2.2.1. Digital Image Correlation Data Processing
The DIC-derived effective strain fields were treated as the geometric reference and were not modified, such that all geometric alignment operations were applied exclusively to the infrared data. The high-speed optical recordings were first processed in NCorr [
4] to obtain displacement and strain fields. The settings used to acquire DIC from each individual datasets are shown in
Table 1. The subset radius sets the pixel size of each correlation window and determines how much image texture is used for tracking, while the subset spacing controls the distance between neighbouring points and, therefore, the spatial resolution of the displacement grid. The number of threads specifies how many CPU cores are used during processing. Convergence of the correlation is governed by the difference norm, which measures the change between iterations, and by the maximum number of iterations allowed for each subset. For cases involving large deformation, the high-strain leapfrog number of steps defines how many intermediate tracking steps NCorr introduces to maintain correlation stability. The discontinuous analysis option enables the algorithm to handle cracks or separations by permitting decorrelation when continuity breaks down. Finally, the strain radius controls the neighbourhood size used for polynomial fitting when computing strain fields, influencing the smoothness of the resulting strain maps [
4].
NCorr calculates both the in-plane Green–Lagrange strain tensor, which does not move with the object [
20], and the Eulerian–Almansi strain tensor, which is evaluated in the deformed configuration [
20]. For our purposes, Eulerian–Almansi strain tensor components were chosen as the strain moves with the object during deformation [
20].
where
Exx is the normal strain in the x-direction,
Eyy is the normal strain in the y-direction, and
Exy and
Eyx represent the shear strain. They are used to describe in-plane deformation, as only one camera was used. From the normal and shear components, two principal strains were calculated,
E1 and
E2, which are eigenvalues of the matrix
E (see Equation (1)) [
20].
An effective (von Mises-type) strain was adopted as a scalar invariant measure of deformation [
20].
Owing to its definition in terms of the deviatoric strain tensor, the effective strain is directly related to plastic deformation and the associated energy dissipation. Therefore, effective strain is expected to show a strong correlation with temperature changes observed by infrared thermography [
16]. In addition, the use of a scalar strain measure reduces the sensitivity to local fluctuations of principal strain directions and provides improved numerical robustness compared to individual principal strain components, which proved advantageous for deep learning-based image-to-field translation. Several frames from separate datasets and loading stages can be seen in the
Figure 3.
2.2.2. Infrared Camera Data Processing
The infrared camera records data in the proprietary FLIR format, and each frame was subsequently exported as an ASCII matrix of temperature values. To ensure that only the specimen region was used for training, a bounding box was created around the specimen based on its visible outline. Everything outside this bounding box was excluded (see
Figure 4), which removed the laboratory background, fixtures, and reflections that would otherwise degrade learning. Preliminary training was also performed using infrared images that included the full background; however, this configuration led to inferior performance, since the network was required to learn a more complex mapping from specimen-only DIC fields to thermal images containing both the specimen and surrounding background.
During the experiment, the thermal frames were recorded at a resolution of 320 × 256 pixels, which is approximately square but still includes unavoidable peripheral regions outside the specimens. Prior to cropping, a small in-plane rotation was applied only to the infrared frames to compensate for the measured angular offset between the infrared and optical cameras. Each frame was therefore cropped to a 256 × 256 pixel region centred on the specimen, ensuring spatial consistency with the processed DIC frames.
2.2.3. Data Processing
Both DIC-based strain fields and thermal images contained numerical artifacts that needed to be removed before the dataset could be used for training. These artifacts appeared as isolated pixels or small clusters showing unrealistically high values that were not physically possible for the given loading conditions. The problem is common in full-field numerical techniques: in DIC, it usually originates from local decorrelation or subset tracking failure [
1,
2], while in infrared measurements it is often caused by sensor noise or transient reflections [
15].
To address these issues, a dedicated processing procedure was implemented in MATLAB 2022b and applied identically to all datasets of DIC strain and thermographic fields. For each frame, the following steps were executed:
- 1.
Outlier Detection.
-
Each image (DIC effective strain or thermal field) was scanned for large outlier values using a threshold-based rule. Thresholds were not universal; instead, they were chosen based on the expected physical of values possible in datasets.
- a.
For DIC effective strain, values above an upper bound, e.g., >15% of maximum strain per image were flagged because such magnitudes should occur in the tested specimens.
- b.
For the thermal fields, values outside the physically possible temperature range for the experiment were identified in the same way.
-
The algorithm produced a list of outlier pixel coordinates together with the frame index, enabling frame-by-frame inspection.
- 2.
Neighbourhood replacement.
-
Once an outlier pixel was detected, its value was replaced by the mean of a local neighbourhood window. In practice, a 3 × 3 or 5 × 5 kernel was used depending on the dataset resolution.
-
This local replacement preserves the smoothness of strain and temperature fields while removing singular spikes that would otherwise distort the regression process. This approach is similar to commonly used “local averaging” strategies for stabilizing DIC or thermographic strain–temperature data.
- 3.
Consistency across datasets.
-
Because the datasets must remain paired, filtered frames from the DIC and the corresponding thermal frame were always kept synchronized. Frames identified as significantly corrupted (e.g., those showing large regions of decorrelation in DIC or severe saturation in thermography) were removed entirely before the normalization step. This removal ensured that only physically meaningful strain–temperature pairs contributed to network training.
- 4.
Validation of filtered datasets.
-
After removal, the number of remaining frames in each dataset was verified programmatically to maintain equal lengths between strain and thermal sequences. Additional visual checks were carried out on all frames to confirm that the neighbourhood replacement procedure removed only numerical artifacts without affecting real features in the data.
By applying this processing pipeline, all datasets were transformed into stable and physically consistent sequences of effective strain and thermal images. This step was essential because even a small number of corrupted frames can introduce significant errors in data-driven regression tasks, particularly when training neural networks that are highly sensitive to anomalies in pixel intensity distributions.
2.2.4. Data Normalization
After completing the data processing procedure, all thermography datasets and the corresponding DIC-derived strain datasets were normalized [
12] independently using their respective minimum and maximum values. The minimum and maximum effective strain values, as well as the minimum and maximum temperatures, for each dataset are summarized in the
Table 2.
This per-dataset normalization was applied to ensure that the intensity ranges were consistent across specimens, which generally leads to more stable convergence during network training [
12,
21].
2.3. Data Augmentation
After normalization, the full dataset consisted of 189 paired strain–temperature frames. While this is sufficient for preliminary analysis, it is not large enough to train a deep convolutional network without risking overfitting. These data were first split into training (70%), validation (15%), and test (15%) subsets. Data augmentation was then applied independently to each subset, with each original frame augmented up to nine times using physically consistent transformations. To expand the variability of the input space in a controlled and physically meaningful way, a paired augmentation procedure was implemented. All augmentations were applied simultaneously to the DIC-derived effective strain images and the corresponding thermal frames.
This paired approach follows the practice established in early image-to-image translation research, where geometric augmentation must preserve pixel-wise correspondences between datasets [
13,
14]. For this reason, only lightweight geometric perturbations were used, as more aggressive transformations (e.g., elastic distortions [
22]) would not reflect realistic mechanical behaviour [
1,
3,
19] and could mislead the model.
The augmentation therefore consisted of small random rotations (±3°), translations up to ±5 pixels, and uniform scaling within ±3%. These mild affine transforms emulate realistic variations in specimen placement or camera alignment while preserving the mechanical meaning of the data. Similar small-magnitude augmentations have been used in DIC accuracy studies to evaluate robustness to slight geometric perturbations [
3,
23], and this technique is also widely adopted in imaging tasks where dataset size is limited [
24,
25,
26]. Each frame was augmented multiple times with random combinations of these transformations, expanding the dataset to from 189 to 1655 paired images. This increased variability helps reduce model overfitting and encourages the network to learn the true thermomechanical mapping instead of memorizing specimen-specific features. During dataset preparation, frames corresponding to the initial loading stages with negligible deformation were excluded, as both the strain and thermal fields in these frames contain very limited informative structure and could bias the learning process. This filtering step was applied prior to augmentation and was applied consistently across all subsets. After augmentation, the final dataset comprised 1655 paired images, with no overlap between training, validation, and test data. For consistency with the network input, both effective strain and temperature fields are shown in grayscale, exactly as they are provided to the deep learning model after normalization. While colour maps are commonly used for visualization, they were intentionally omitted here to emphasize the raw numerical field representation rather than qualitative colour perception. Paired effective strain and thermography images for two selected pairs are shown in
Figure 5a,b and
Figure 5c,d.
4. Results
The trained U-Net model demonstrated strong predictive performance on the test dataset. To quantitatively assess the generalization capability of the trained network, prediction accuracy was evaluated on an independent test dataset that was not used during training or validation.
Table 3 summarizes the mean performance metrics computed on the test set, including SSIM, MAE, RMSE, and the coefficient of determination (R
2).
4.1. Global Prediction Accuracy
A global pixel-level comparison between predicted and true thermal fields (see
Figure 7a) shows tight clustering around the identity line, indicating strong agreement between the predicted and measured temperatures.
The coefficient of determination of R
2 = 0.985 was calculated on 5000 randomly sampled pixels from predicted and true thermal datasets. Additionally, 50,000 and 500,000 datapoints out of a possible 16,973,824 were sampled, with R
2 = 0.966 and R
2 = 0.968 respectfully (see
Figure 8). This indicated that the learned mapping preserves the global distribution of thermal intensities with relatively high fidelity.
4.2. Spatial Distribution of Prediction Accuracy
To further examine spatial consistency, a pixelwise R
2 field was computed across the entire image domain (see
Figure 9).
The central region corresponding to the deforming specimen exhibits uniformly high agreement (R2 ≈ 0.8–1.0), while the background and masked regions show lower values, as expected, due to near-zero temperature gradients that amplify relative errors. The predicted thermal fields obtained from the trained U-Net remained largely within this physically meaningful range, with values spanning from −0.0615 to 0.9825. The small drift into negative values (≈6% below zero) is expected for convolutional regression networks trained without an explicit output constraint or activation function at the final layer. Such networks can produce unbounded continuous outputs, and slight under- or overshooting near sharp gradients or data-sparse regions is common. These deviations are minor relative to the dynamic range of the normalized data and do not introduce structural artifacts. In practice, negative predictions can be safely clamped to zero during inference without loss of information, while the upper range remains relatively aligned with the true values.
Error distributions over the test dataset were quantified using SSIM, MAE, and RMSE (
Figure 10). A clear and physically meaningful trend is observable: the prediction error is largest in the earliest test frames, when the specimen is still undeformed and the thermal field contains very low spatial variation. In this initial stage, the true thermal images are dominated by nearly uniform background with minimal temperature gradients.
4.3. Frame-Wise Error Evolution
Because the strain fields at this point also contain only noise-level deformation, the mapping between DIC strain and thermal response is weakly defined, and small pixel-level deviations lead to comparatively high normalized error (e.g., RMSE ≈ 0.24–0.26 in frames 1 to 3, see
Figure 11). The residual maps for these frames confirm that the network slightly overpredicts a uniform heating pattern along the specimen. As loading begins and deformation localizes, the thermal field becomes more characteristic and structured. Correspondingly, the network’s performance improves significantly. Between approximately frames 6 and 50 (see
Figure 10), the MAE stabilizes around 0.02–0.04, the RMSE falls to 0.01–0.03, and the SSIM reaches 0.6–0.8, indicating very good agreement. Representative residual maps (e.g., frames 12 and 41, see
Figure 11) show that discrepancies are confined to small areas around the forming “hinge”, where temperature gradients are steepest. As the test progresses into the large-rotation region (frames 140–162, see
Figure 11), prediction quality remains strong despite increasing geometric complexity RMSE ≈ 0.05 and SSIM ≈ 0.5.
Overall, the error curves demonstrate that the U-Net model is most accurate during the mechanically meaningful phases of the experiment, where the thermomechanical coupling between strain and temperature is strongest. Importantly, no systematic artifacts or hallucinations appear in any residual map, and all deviations from true data remain physically interpretable. The largest errors are observed in frames with little or no informative thermal structure (i.e., initial uniform states where deformations are practically noise) and a few post-peak frames. Generally, the model attains low error and high structural similarity during the main deformation phase, where thermomechanical interpretation is most critical. The final trained U-Net model is relatively compact by modern deep learning standards, with approximately 31.0 million learnable parameters and a disk size of ≈110 MB. This moderate model capacity was sufficient to capture the thermomechanical mapping while remaining computationally efficient for both training and inference.
5. Discussion
The results indicate that the learned effective strain-to-temperature mapping follows the general thermomechanical behaviour observed in previous experimental studies, where temperature localisation coincided with the onset of plastic deformation and strain concentration [
6,
10]. In the present work, this relationship is clearly reflected in the frame-wise error evaluation. The highest prediction errors occur in the initial part of the loading sequence, where both the strain field and the thermal response exhibit very limited spatial structure. In these nearly uniform states, even small absolute deviations can translate into comparatively large, normalized errors, and similarity measures such as SSIM remain low.
As deformation progresses, the model accuracy improves substantially. Once measurable strain localization develops, the network predicts temperature fields with low RMSE and markedly higher structural similarity. The residual maps confirm that, in this regime, most discrepancies remain confined to narrow regions where gradients change sharply or where local decorrelation effects in the DIC data introduce noise. This behaviour is consistent with the expectation that prediction quality increases when the thermomechanical signal becomes stronger and more informative.
These observations suggest that data-driven thermal reconstruction is most reliable during the mechanically relevant stages of loading, where temperature gradients are typically analyzed for damage initiation or energy dissipation. While the reduced fidelity observed in the early, nearly uniform frames is of limited practical significance, it should be taken into consideration in future retraining strategies. Overall, the model reconstructs the thermal field without introducing non-physical artefacts and preserves the essential spatial structure of the thermographic response. The comparable temperature and effective strain ranges observed across the datasets (
Table 2) further suggest that the trained model may be transferable to experiments characterized by similar thermomechanical operating regimes and materials with comparable properties.
The present study is intended as a proof of concept demonstrating that full-field temperature distributions can be reconstructed directly from strain measurements using a data-driven deep learning framework. To this end, specimens with relatively symmetric geometries subjected to three-point bending were deliberately selected, as this configuration provides a predictable deformation pattern and a well-defined thermomechanical response, enabling controlled validation of the proposed approach.
As a consequence, the current experimental dataset does not include strongly asymmetric geometries, complex boundary conditions, or a broader range of material classes. While the proposed framework itself is not inherently limited to symmetric parts or a specific material, its performance under such conditions has not yet been experimentally verified. Future work will therefore focus on extending the framework to different materials, asymmetric geometries, and alternative loading scenarios, as well as expanding the dataset to assess generalization capability and robustness under more complex thermomechanical conditions.
In addition to experimental extensions, further research directions include the investigation of shared-encoder architectures, in which a single feature extractor feeds separate mechanical and thermal decoders. Such designs may reduce redundancy and strengthen the coupling learned between the two fields. It would also be valuable to retrain image-to-image translation models such as pix2pix on the same dataset to directly compare adversarial and regression-based approaches, particularly in their ability to capture fine thermal details or handle low-information strain inputs. Finally, incorporating temporal models, such as recurrent or attention-based architectures, could exploit frame-to-frame continuity and further improve prediction stability during high-rate deformation.