Author Contributions
Conceptualization, S.W. and K.C.; methodology, K.C.; validation, K.C.; formal analysis, K.C.; investigation, K.C. and S.T.; resources, S.W.; data curation, K.C. and S.T.; writing—original draft preparation, K.C.; writing—review and editing, K.C. and S.W.; visualization, K.C.; supervision, S.W.; project administration, S.W.; funding acquisition, S.W. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Schematic illustration of the ceramic roller kiln structure and thermodynamic zoning. The roller kiln consists of three major zones: pre-heating zone (ambient–900 °C), sintering zone (900–1250 °C), and cooling zone (1250 °C–ambient). Ceramic bodies are transported on rotating rollers from the entrance to the exit. Combustion air, auxiliary air, and fuel are supplied to the sintering zone, while flue gases are discharged at both ends. The schematic highlights the spatial arrangement of heating devices, airflow direction, and thermal zone transitions relevant to temperature detection.
Figure 2.
Overall architecture of the proposed MST-FusionNet for multimodal temperature detection in ceramic roller kilns. The framework integrates combustion image sequences and wall-mounted thermocouple measurements. First, discrete thermocouple readings are transformed into multi-channel Gaussian pseudo-heatmaps and concatenated with chromaticity-transformed image frames for early spatial fusion. The fused features are extracted by a modified ResNet-18 backbone. In parallel, thermocouple vectors are encoded through MLP branches to provide baseline temperature estimation and global thermal context. Temporal dependencies are modeled using an LSTM followed by a multi-head self-attention (MHA) module to emphasize informative frames. Finally, a residual compensation mechanism combines baseline predictions and learned residuals to output temperatures at six roller positions.
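The final step of the caption, residual compensation, can be illustrated with a minimal sketch. It assumes the combination is a simple element-wise sum of the baseline and the learned residual; the caption does not state the exact operator. The function name and all numeric values are hypothetical.

```python
def residual_compensation(baseline, residual):
    """Combine a thermocouple-derived baseline with a learned correction.

    Assumption: the two terms are added element-wise, one value per roller
    position. The source only says they are "combined".
    """
    if len(baseline) != len(residual):
        raise ValueError("baseline and residual must have the same length")
    return [b + r for b, r in zip(baseline, residual)]


# Hypothetical values (°C) for six roller positions.
baseline = [1000.0, 1002.0, 1005.0, 1003.0, 1001.0, 998.0]
residual = [-1.5, 0.5, 2.0, -0.5, 0.0, 1.0]

final = residual_compensation(baseline, residual)
print(final)
```

Under this reading, the network only has to learn a small correction around a physically plausible starting point rather than the full temperature value.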
Figure 3.
Multi-channel Gaussian pseudo-heatmaps constructed from visible thermocouple measurements. Each subfigure corresponds to one visible thermocouple projected onto the image plane. The discrete temperature reading at the sensor location is spatially diffused using a two-dimensional Gaussian kernel, forming a localized thermal distribution centered at the sensor position. Different channels are constructed independently to preserve spatial separability before being concatenated with combustion images for early fusion. The blue bullets indicate the positions of different thermocouples in the images.
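The construction described in this caption can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the peak of each map is assumed to equal the (normalized) sensor reading, the kernel is assumed isotropic, and the image size, sensor coordinates, and `sigma` are hypothetical.

```python
import math


def gaussian_heatmap(height, width, center, reading, sigma):
    """One channel: a 2-D Gaussian centred at the sensor's pixel position.

    Assumption: the peak value equals the sensor reading, and the value
    decays with squared pixel distance from the sensor.
    """
    cx, cy = center
    return [
        [
            reading * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def build_channels(height, width, sensors, sigma):
    """One independent channel per visible thermocouple (x, y, reading)."""
    return [
        gaussian_heatmap(height, width, (x, y), reading, sigma)
        for (x, y, reading) in sensors
    ]


# Hypothetical sensors: pixel column, pixel row, normalized reading.
sensors = [(2, 3, 0.8), (5, 1, 0.6)]
channels = build_channels(height=6, width=8, sensors=sensors, sigma=1.5)
```

Keeping one channel per sensor, rather than summing all Gaussians into a single map, is what preserves the spatial separability mentioned in the caption; the channels can then be stacked with the image channels along the channel axis.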
Figure 4.
Experimental platform for data collection in the sintering zone of the roller kiln. The platform consists of a roller-hearth kiln system, oxygen supply system, combustion system, thermocouple array, and a color CCD camera equipped with a cooling mechanism. The thermocouples are distributed along the kiln wall for boundary temperature measurement, while the camera captures combustion image sequences. PTCR temperature measurement rings placed on rollers are used to obtain ground-truth ceramic body temperatures.
Figure 5.
Representative combustion images at different temperature levels in the sintering zone. The images correspond to approximate kiln temperatures of (a) 610 °C, (b) 690 °C, (c) 785 °C, (d) 950 °C, (e) 1010 °C, and (f) 1070 °C. As temperature increases, flame intensity and chromatic distribution gradually change, reflecting variations in combustion morphology and radiation characteristics.
Figure 7.
Scatter plots comparing predicted and measured temperatures on the validation set for different methods. Each subplot corresponds to one model. The horizontal axis denotes PTCR-measured ground-truth temperatures (°C), and the vertical axis denotes predicted temperatures (°C). The diagonal line represents perfect prediction (y = x). Compared with single-modal baselines, the proposed MST-FusionNet shows the strongest linear correlation and minimal dispersion across the full temperature range (600–1200 °C).
Figure 8.
Temperature tracking performance comparison on Roller 2 across validation samples. (a) Tracking curve of the vision-only baseline compared with ground truth; (b) Tracking curve of MST-FusionNet compared with ground truth; (c) Absolute error curves of different methods. The proposed method closely follows step-like temperature changes with minimal lag and reduced oscillation, demonstrating improved temporal stability under dynamic combustion conditions.
Figure 9.
Sensitivity analysis of the Gaussian diffusion scale on MAE. The kernel size is set to to cover approximately ±, ensuring negligible truncation error.
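The link between kernel extent and truncation error can be made concrete with a short sketch. It uses the standard result that a one-dimensional Gaussian holds a fraction erf(k/√2) of its mass within ±k standard deviations. The multiples of k and the value of `sigma` below are illustrative assumptions and are not taken from the paper's settings; both function names are hypothetical.

```python
import math


def mass_within(k):
    """Fraction of a 1-D Gaussian's mass inside +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


def kernel_size(sigma, k):
    """Smallest odd kernel width (pixels) spanning +/- k * sigma around the centre."""
    return 2 * math.ceil(k * sigma) + 1


# Illustrative multiples only; sigma is in pixels and is hypothetical.
for k in (1, 2, 3):
    truncated = 1.0 - mass_within(k)
    print(k, kernel_size(sigma=2.0, k=k), round(truncated, 4))
```

The truncated mass falls off quickly with k, which is why a kernel spanning a few standard deviations is usually enough for the truncation error to be negligible.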
Table 1.
Comparison of representative multimodal fusion approaches in high-temperature industrial applications. The table summarizes differences in industrial scenarios, data modalities, fusion strategies, spatial alignment mechanisms, residual modeling, and whether the methods target solid workpiece temperature estimation.
| Paper | Industrial Scenario | Data | Target Variable | Fusion Strategy | Spatial Alignment Mechanism | Residual Modeling | Focus on Solid Workpieces |
|---|---|---|---|---|---|---|---|
| [24] | Converter steelmaking | 23 heterogeneous process variables | Oxygen content at the end point | Deep learning (CNN) | ✘ | ✘ | ✘ |
| [11] | Blast furnace | Image + operational data | Hot metal temperature, Si content | Image gray features + process variables (feature-level fusion) | ✘ | ✘ | ✘ |
| [25] | Blast furnace | Tuyere images + sequential numerical data (blast parameters and gas compositions) | Hot metal temperature | Dual-channel CNN–GRU fusion | ✘ | ✘ | ✘ |
| [10] | Blast furnace | Thermocouple | Thermal state | Thermocouple spatial modeling + ensemble learning | Spatial encoding of thermocouples | ✘ | ✘ |
| [8] | Blast furnace | Infrared image + cross temperature-measurer + wall thermocouple + coke/ore ratio | Burden surface temperature field | Fusion based on Reliability Theory and Kalman Filter | ✘ | ✘ | ✘ |
| This paper | Ceramic roller kiln | Image + thermocouple | Temperature of ceramic bodies | Early fusion + temporal modeling + residual compensation | Gaussian pseudo-heatmap alignment | ✔ | ✔ |
Table 2.
Overall temperature detection performance on the validation set (°C). MAE and RMSE are computed after inverse normalization to the Celsius scale. Lower values indicate better detection accuracy.
| Method | Modalities | Temporal Modeling | MAE | RMSE |
|---|---|---|---|---|
| Baseline 1 | Thermo | None | 8.6563 | 11.7361 |
| Baseline 2 | Image | None | 17.2476 | 24.7732 |
| Baseline 3 | Image | LSTM | 6.1107 | 16.8946 |
| Baseline 4 | Image + Thermo | LSTM | 3.1314 | 8.4193 |
| Proposed | Image + Thermo | LSTM + MHA | 0.9164 | 1.2422 |
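How the two metrics in this table are obtained after inverse normalization can be sketched as follows. The sketch assumes min-max normalization with bounds of 600 °C and 1200 °C; the source states only that errors are computed after mapping back to the Celsius scale, so both the scheme and the bounds are assumptions, and the sample values are hypothetical.

```python
import math

# Assumed normalization bounds (°C); not stated in the source.
T_MIN, T_MAX = 600.0, 1200.0


def denormalize(values, t_min=T_MIN, t_max=T_MAX):
    """Map min-max normalized values in [0, 1] back to degrees Celsius."""
    return [v * (t_max - t_min) + t_min for v in values]


def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


# Hypothetical normalized predictions and ground truth.
pred_norm = [0.25, 0.50, 0.75]
true_norm = [0.26, 0.50, 0.73]

pred_c = denormalize(pred_norm)
true_c = denormalize(true_norm)
mae_c = mae(pred_c, true_c)
rmse_c = rmse(pred_c, true_c)
```

Because RMSE squares each error before averaging, it is never smaller than MAE, and a large gap between the two (as for Baseline 3 in the table) points to a few large outlier errors.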
Table 3.
Per-roller mean absolute error (MAE) on the validation set (°C). The results evaluate spatial detection consistency across six roller positions.
| Method | P1 | P2 | P3 | P4 | P5 | P6 |
|---|---|---|---|---|---|---|
| Baseline 1 | 10.0083 | 9.4110 | 8.7454 | 8.2106 | 7.8909 | 7.6717 |
| Baseline 2 | 17.8761 | 19.9082 | 19.8723 | 16.8757 | 15.2483 | 13.7048 |
| Baseline 3 | 6.8743 | 6.6424 | 6.2383 | 5.5502 | 5.3160 | 6.0426 |
| Baseline 4 | 3.5732 | 4.0030 | 2.9682 | 2.7100 | 2.5227 | 3.0112 |
| Proposed | 0.6575 | 1.4389 | 0.7719 | 0.8950 | 0.8051 | 0.9300 |
Table 4.
Ablation study evaluating the contribution of individual components. Each variant removes one key module (Residual Compensation, Pseudo-Heatmap Alignment, or MHA). MAE is reported in °C.
| Variant | Residual | Heatmap | MHA | MAE |
|---|---|---|---|---|
| w/o Residual | ✘ | ✔ | ✔ | 2.3283 |
| w/o Heatmap | ✔ | ✘ | ✔ | 1.9689 |
| w/o Attention | ✔ | ✔ | ✘ | 1.4584 |
| Proposed | ✔ | ✔ | ✔ | 0.9164 |
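The ablation results can be read as the MAE increase caused by removing each module. The short calculation below uses only the values in the table above; the variable names are illustrative.

```python
# MAE values (°C) taken from the ablation table.
full_mae = 0.9164
ablated_mae = {
    "Residual": 2.3283,  # w/o Residual
    "Heatmap": 1.9689,   # w/o Heatmap
    "MHA": 1.4584,       # w/o Attention
}

# Increase in MAE when each module is removed from the full model.
increase = {name: round(value - full_mae, 4) for name, value in ablated_mae.items()}

# Modules ordered from largest to smallest impact.
ranked = sorted(increase, key=increase.get, reverse=True)

for name in ranked:
    print(f"{name}: +{increase[name]:.4f} °C")
```

On these numbers, removing residual compensation costs the most (about 1.41 °C), followed by pseudo-heatmap alignment (about 1.05 °C) and MHA (about 0.54 °C).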
Table 5.
Progressive addition study starting from the naïve early-fusion baseline (Baseline 4). Each component is added independently to quantify its individual contribution to performance improvement, with MAE reported in °C.
| Model | MHA | Pseudo-Heatmap | Residual | MAE |
|---|---|---|---|---|
| Baseline 4 | ✘ | ✘ | ✘ | 3.1314 |
| +MHA | ✔ | ✘ | ✘ | 2.6058 |
| +Heatmap | ✘ | ✔ | ✘ | 2.4585 |
| +Residual | ✘ | ✘ | ✔ | 1.4571 |
| Proposed | ✔ | ✔ | ✔ | 0.9164 |
Table 6.
Cross-temperature generalization performance (°C). Validation data include temperature levels of 800 °C, 900 °C, and 1000 °C that are excluded from training. Results evaluate robustness under temperature distribution shifts.
| Method | Modalities | Temporal Modeling | MAE | RMSE |
|---|---|---|---|---|
| Baseline 1 | Thermo | None | 12.1494 | 15.3796 |
| Baseline 2 | Image | None | 18.8407 | 23.8339 |
| Baseline 3 | Image | LSTM | 30.8856 | 35.5087 |
| Baseline 4 | Image + Thermo | LSTM | 13.9010 | 17.7788 |
| Proposed | Image + Thermo | LSTM + MHA | 4.7253 | 6.4535 |
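A cross-temperature split of the kind described in this caption could be set up as in the sketch below. The held-out levels (800, 900, and 1000 °C) come from the caption; the sample structure, the `level_c` field, the other temperature levels, and the choice to place only held-out levels in the validation set are assumptions made for illustration.

```python
# Temperature levels (°C) excluded from training, as stated in the caption.
HELD_OUT_LEVELS_C = {800, 900, 1000}


def split_by_temperature_level(samples, held_out=HELD_OUT_LEVELS_C):
    """Send every sample at a held-out level to validation, the rest to training."""
    train, validation = [], []
    for sample in samples:
        target = validation if sample["level_c"] in held_out else train
        target.append(sample)
    return train, validation


# Hypothetical samples: one record per recorded sequence.
samples = [
    {"level_c": level, "sequence_id": i}
    for i, level in enumerate([700, 800, 850, 900, 950, 1000, 1100])
]

train_set, val_set = split_by_temperature_level(samples)
```

Splitting by temperature level rather than at random ensures that no validation sample shares a level with any training sample, so the reported errors reflect interpolation to unseen operating points.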
Table 7.
Comparison between the proposed method and representative temperature detection approaches reported in the literature. Reported errors are extracted from published experimental validations in high-temperature industrial contexts. Direct numerical comparison should be interpreted cautiously due to differences in datasets and application scenarios.
| Reference | Target Application | Modality & Method | Reported Lowest Error (°C) |
|---|---|---|---|
| [5] | Cement rotary kiln | Raw image processing & improved ratio pyrometry | 8.0 |
| [10] | Blast furnace hearth | Multimodal ensemble model (thermocouple + spatial temp) | 7.2 |
| [11] | Blast furnace hearth | Time series neural network with multi-information fusion (Tuyere images + online data) | 5.7 |
| [25] | Blast furnace | DCFANet (CNN for Tuyere images + GRU-Attention for numerical data) | 5.4 |
| Ours | Roller kiln | Physics-aware pseudo-heatmap alignment (Visual + Boundary data) | 0.91 |