1. Introduction
The growing global demand for sustainable and low-carbon energy has heightened interest in the co-combustion of biomass with coal as a transitional strategy for decarbonizing power generation. Co-combustion enables the partial replacement of coal with renewable biomass, thereby reducing net greenhouse gas emissions while utilizing existing coal-fired infrastructure. However, despite its environmental benefits, biomass–coal co-combustion presents a variety of complex technical and scientific challenges that have garnered significant research attention over the past few decades [1,2,3,4,5,6,7,8].
A major operational issue in power generation is ash deposition and boiler fouling, caused by unpredictable interactions between ash from coal and biomass feedstocks. Biomass ash components, such as alkali metals and chlorine, can lower melting temperatures, leading to increased slagging and corrosion on heat transfer surfaces, which can compromise plant efficiency and lifespan [1,4,7,9,10]. Another primary concern is combustion instability, influenced by significant variations in moisture content, volatile matter, and calorific value between coal and biomass [11,12]. These variations impact flame dynamics, ignition characteristics, and pollutant emissions, including nitrogen oxides (NOx), sulfur dioxide (SO2), and particulate matter [11,13,14,15,16,17,18]. Emission control is more complex under varying combustion conditions: temperature spikes or cold zones can increase NOx, SO2, and particulate matter emissions, and traditional models based on uniform combustion assumptions often struggle with co-combustion regimes [19,20].
Flame instability is particularly likely during the co-combustion of biomass and coal due to several interacting fuel-related factors [20]. Biomass usually has a higher moisture content, which absorbs combustion energy during evaporation and reduces local flame temperatures, thus delaying ignition [4,14]. It also typically has higher volatile matter and lower ignition temperatures, resulting in a rapid and sometimes unpredictable release of combustible gases, which can destabilize the flame structure and affect combustion intensity [21,22,23]. The heterogeneous size, shape, and composition of biomass particles also contribute to uneven burning and inconsistent flame propagation [10,11]. At the same time, variable ash chemistry introduces additional instabilities through its influence on deposition and flame temperature [7,11]. Moreover, the interactions between biomass and coal can be either synergistic or inhibiting, leading to nonlinear combustion behaviors that depend on the blending ratios used [5,24]. Finally, the system's sensitivity to fuel feed rate and air distribution increases significantly during co-combustion, where even slight fluctuations can result in local flame quenching or other combustion issues [11,12,25,26]. These factors collectively make co-combustion systems less predictable and harder to control than coal-only combustion [10,11,14].
Due to combustion instabilities and ash-related issues, effective real-time monitoring and control mechanisms are crucial [23,27,28,29]. Traditional combustion sensors often lack the precision and responsiveness needed for co-combustion environments, which are marked by rapid changes and fuel variability. As a result, an increasing amount of research is dedicated to developing advanced flame detection and diagnostic methods that leverage modern imaging techniques, deep learning, and the integration of multiple sensors [18,21,30,31,32,33].
Notably, deep convolutional neural networks (CNNs) such as VGG16 have demonstrated exceptional performance in classifying combustion states from high-resolution flame images, often surpassing conventional sensor-based approaches in accuracy and adaptability [21,29,30,31,32,33,34]. By utilizing adaptive image preprocessing techniques such as gamma correction and segmentation, these models effectively extract relevant features under the varying lighting conditions and dynamic backgrounds typical of biomass–coal co-combustion [17,27,30,33,35]. Additionally, real-time processing capabilities enable the classification of combustion states and the detection of instability within milliseconds, supporting prompt feedback and automated control [19,27,32].
In addition to image-based analysis, multi-sensor fusion strategies combine visual data with traditional measurements, such as temperature, gas concentrations, and chemiluminescence, to provide a more comprehensive understanding of combustion dynamics [21,36,37]. Predictive monitoring with deep neural networks trained on historical process data allows for early detection of instability risks and supports condition-based maintenance strategies, ultimately reducing operational downtime [21,32,38].
To improve robustness and reduce false alarms, an essential concern in highly variable optical environments, researchers have introduced improved flame segmentation algorithms, such as active contour models, and applied hyperparameter optimization techniques (e.g., random search, Bayesian tuning) to enhance detection accuracy under different combustion modes [21,27,31,39].
In response to these challenges, recent advances have exploited flame imaging as a non-intrusive, real-time diagnostic tool. High-resolution visual, thermal, or infrared images of the combustion flame provide detailed spatial and temporal information about the combustion regime. These images can be continuously analyzed using deep learning methods, particularly CNNs, which have emerged as a state-of-the-art approach for combustion state classification [40,41]. CNNs autonomously learn hierarchical visual features from raw flame images, enabling precise detection of operational states such as stable combustion, ignition failure, onset of slagging, or fuel imbalance. Unlike traditional rule-based image processing, CNNs can handle noisy, high-dimensional data and generalize to previously unseen process variations [23,30,41].
Additionally, autoencoders have been employed to detect anomalies by learning baseline flame behaviors and flagging deviations indicative of process disturbances [22,30]. Generative adversarial networks (GANs) are used for synthetic flame image generation to augment training datasets, especially when real-world failure data are scarce [22,30,40,42].
Modern combustion optimization frameworks now incorporate CNN outputs into closed-loop control algorithms. These systems adjust air–fuel ratios, burner tilts, or feedstock blending rates in real time, based on detected flame characteristics. This feedback mechanism effectively mitigates the issues caused by fuel stratification and enables more uniform, efficient, and cleaner combustion [40,43].
Conclusions drawn from the work of many researchers point to the superiority of convolutional networks over classical methods in classification applications. Studies such as Hasan et al. (2019) [44] and others show that CNNs outperform SVMs, random forests, and classical artificial neural networks in flame and other image classification tasks, both in accuracy and in robustness, especially under process variability and with large heterogeneous datasets [45]. Classical methods require the manual design of domain-specific features, whereas CNNs learn hierarchical features directly from the data. CNNs also require less extensive preprocessing and offer better scalability to larger datasets [44].
Recent studies demonstrate that such CNN-based solutions can outperform classical machine learning methods, offering real-time classification accuracy exceeding 90%, reduced unplanned downtime, and measurable reductions in pollutant emissions [23,41,43,46]. Practical deployments in industrial furnaces and fluidized-bed reactors validate the scalability of these methods [20,43]. Ongoing research now focuses on reducing data labeling costs through self-supervised and transfer learning [47]; improving model interpretability, essential for industrial trust and compliance; and implementing real-time embedded CNNs on edge devices for on-site diagnostics [48].
The presented work highlights how combining high-speed flame imaging with deep learning, specifically convolutional neural networks (CNNs), enables the rapid and accurate detection of potentially critical combustion states in biomass–coal co-combustion. The paper builds upon the research presented in [29]; however, rather than applying additional image operations such as gamma correction and ROI (region of interest) extraction, the whole frame served as the model's input with no preprocessing. The presented research involved a wider set of CNNs tested with a broader range of input image resolutions. It is also worth noting that the image dataset was substantially larger, which potentially led to better performance of the CNN models.
2. Materials and Methods
2.1. Co-Combustion Test Facility
The co-combustion tests were performed in a laboratory-scale test facility with a maximum thermal power of 0.5 MW. Its main component is a combustion chamber, 2.5 m long and approximately 0.7 m in inner diameter. The facility is equipped with a horizontally mounted low-emission swirl dust burner, an oil burner for auxiliary heating, and a gas igniter. The primary role of the oil burner is to quickly raise the combustion chamber temperature to a level sufficient for the release and ignition of the fuel's volatile components.
The combustion chamber features inspection openings on both sides, providing visual access to the flame. The imaging system was a water-cooled endoscope by Cesyco with a lateral view and a field of view of 40° in both the horizontal and vertical axes. The endoscope was inserted into the chamber at a 60-degree angle. Owing to the endoscope's mounting location and the relatively short distance from the burner, the monitored area included the zone near the burner outlet. Details of the endoscope mounting, showing the geometric relations, are presented in Figure 1.
The flame images were captured by a high-speed color camera (Mikrotron GmbH, Germany, model MC1311) attached to the endoscope and transmitted via a CameraLink interface to mass storage. The camera is equipped with an area-scan CMOS sensor capable of capturing image series at a resolution of 1280 × 1024 and a rate of 500 frames per second (fps). The sensor has a dynamic range of 59 dB, with peak sensitivity at 550 nm. The image sequences were stored in a lossless format.
The fuel blend was prepared in advance and stored in a solid fuel tank, from which it was delivered to the burner via a screw feeder and transporting air (called the primary air). The fuel delivery rate is directly proportional to the feeder’s rotational speed. The fuel flow was calculated as the difference in the weight of fuel stored in the tank over a specified time. The burner has a separate inlet for the secondary air, which is a factor determining the stoichiometry of the combustion process for a given fuel flow. The total airflow is the sum of the primary and secondary airflows.
The combined airflow and fuel flow set the stoichiometry of the combustion process. To minimize the emission of nitrogen oxides, the fuel is usually burned outside stoichiometric proportions, which is described by the excess air coefficient, λ (lambda). λ is defined as the ratio of the air mass actually supplied per unit mass of fuel to the air mass theoretically required for its complete combustion.
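As a worked illustration, a minimal Python sketch of this definition follows; the stoichiometric air demand of 8.5 kg of air per kg of fuel and the flow values are assumed figures for illustration, not measured properties of the fuel tested in this study:

```python
# Hypothetical sketch of the excess air coefficient; the stoichiometric air
# demand is an assumed value, not a property reported in this study.
def excess_air_coefficient(air_flow_kg_h: float,
                           fuel_flow_kg_h: float,
                           stoich_air_kg_per_kg_fuel: float) -> float:
    """lambda = air actually supplied per kg of fuel divided by the air
    required for complete combustion of 1 kg of fuel."""
    actual_air_per_kg_fuel = air_flow_kg_h / fuel_flow_kg_h
    return actual_air_per_kg_fuel / stoich_air_kg_per_kg_fuel

# Example with illustrative flows: 260 kg/h of air and 40 kg/h of fuel
print(excess_air_coefficient(260.0, 40.0, 8.5))  # ~0.76, i.e., sub-stoichiometric
```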
The fuel mixture comprised hard coal with a 10% proportion of straw by weight. Both fuel and air flows were adjusted to achieve the defined states, known as variants of the co-combustion process, characterized by fixed values of thermal power (Pth) and λ. The thermal power generated during the combustion process depends on the calorific value of the tested fuel blend.
2.2. Co-Combustion Test and Data Acquisition
Nine settings of the co-combustion process were defined, with three settings of Pth (250, 300, and 400 kW) and three settings of λ (0.65, 0.75, and 0.85). For the given fuel blend with known calorific value, these corresponded to exact values of fuel and air flows, as listed in Table 1.
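A hedged sketch of how such setpoints can be derived is given below; the calorific value and stoichiometric air demand are assumed illustrative figures, so the printed numbers will not reproduce the actual Table 1 values:

```python
# Illustrative derivation of fuel/air setpoints from Pth and lambda; LHV and
# stoichiometric air demand are assumptions, not the tested blend's properties.
LHV_MJ_PER_KG = 24.0   # assumed lower heating value of the coal-straw blend
STOICH_AIR = 8.5       # assumed kg of air per kg of fuel at lambda = 1

def setpoints(p_th_kw: float, lam: float) -> tuple[float, float]:
    fuel_kg_h = p_th_kw / (LHV_MJ_PER_KG * 1000.0) * 3600.0  # kW -> kg/h
    air_kg_h = lam * STOICH_AIR * fuel_kg_h                  # sub-stoichiometric air
    return fuel_kg_h, air_kg_h

for p_th in (250, 300, 400):
    for lam in (0.65, 0.75, 0.85):
        fuel, air = setpoints(p_th, lam)
        print(f"Pth={p_th} kW, lambda={lam}: fuel={fuel:.1f} kg/h, air={air:.1f} kg/h")
```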
The experiment began by preheating the combustion chamber to a temperature suitable for igniting the fuel mixture. Once the target temperature was reached, delivery of the fuel mixture to the burner began. The combustion test started with a thermal power setting of 400 kW, which was subsequently decreased to 300 kW and then to 250 kW. For each Pth setting, the fuel and air flows were adjusted to achieve the required values of λ, as shown in Table 1. The order of the combustion variants was as follows: #7, #9, #8, #5, #4, #6, #2, #1, #3.
Figure 2 presents the timeline of the co-combustion experiment, showing the actual readouts of total air flow and fuel flow, as well as the temperature inside the combustion chamber. The temperature was measured ~1.2 m from the burner's nozzle. The gaps in the fuel flow readouts in Figure 2 correspond to the moments when the fuel mixture was being added to the fuel tank; this also explains the abnormally high fuel flow readouts, which exceed 200 kg/h.
A single combustion variant was maintained for about 4 to 5 min, during which a series of flame images was recorded. Recordings for each variant took 80 s, except for variant #3, which took 99.2 s. Throughout the experiment, 24-bit RGB images were captured at a rate of 150 frames per second (fps) with an exposure time of 4.3 ms. All other camera settings, such as analog gain and white balance, remained unchanged.
The exact moments at which recordings for each combustion variant were made are depicted in Figure 2 as the mean values of the R, G, and B components of individual image frames. Additional flame images were captured as well. These data, marked as "other", were collected throughout the entire experiment in a total of 11 separate recordings of various lengths, made shortly after each of the defined co-combustion variants or during transitions between defined process states. All of them were merged into one set that does not represent any of the defined variants, the so-called "other" variant. Detailed information on their occurrence in time is presented in Figure 2.
The actual readouts of air and fuel blend flows differed from those defined for the combustion variants; moreover, they were not constant. The variability of the air and fuel flows, recorded simultaneously with the flame images, is shown as boxplots in Figure 3a,b, respectively. Boxplots visualize the main statistical parameters of a variable: the upper and lower quartiles, the interquartile range (IQR), the median, the smallest and largest values within 1.5 times the IQR, and outliers. It can be observed that the actual airflow readouts differ from the reference values, especially for variant #5, where the median deviates by slightly less than 20 Nm3/h, corresponding to a relative difference of approximately 10%.
Fuel flow values are close to the reference values for variants #1, #2, and #3. For the other variants, the actual fuel flow values are higher, indicating a higher thermal power than the desired settings (Table 1). The relative difference between the median and the reference value does not exceed about 20% (variant #6). The air and fuel flow values marked as "other" show a much greater spread, as these data were collected throughout the experiment; moreover, lower air and fuel flow values are more strongly represented than higher ones.
The data, comprising a total of 110,880 images with timestamps and co-combustion variant identifiers (variant #1–variant #9), were stored in a single file. Each variant was represented by 12,000 frames, except variant #3, for which 14,880 frames were collected. Images belonging to the "other" variant were collected in a separate file containing 139,632 images. Because this dataset has a significantly higher number of entries, 12,320 images were randomly selected from it and merged with the data containing images of the nine defined variants, so that the number of images per variant, including the "other" category, was similar. All the images were stored as 24-bit RGB with a resolution of 800 × 800 pixels.
The division into training and validation sets was performed using the train_test_split() function from the scikit-learn library, which splits the data into disjoint training and validation partitions. The "stratify" parameter was set to the vector of target class labels, which ensures that the class distributions are nearly identical in the training and test datasets. The "random_state" parameter was set (by convention) to 42; any integer is acceptable, and fixing it guarantees that the same indices go into the training and test sets on every execution. The train_test_split() function was applied to the dataset containing the images and class labels for all nine defined combustion variants. The same function was also used to extract the training and validation sets containing only flame images labeled as the "other" class. The two parts were then joined into a single training and a single validation set. Additional shuffling within the training and validation sets ensured a uniform distribution of all 10 classes within the corresponding datasets.
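A minimal sketch of the described procedure is given below, assuming the images and labels are held in NumPy arrays; the variable names are illustrative, and class id 9 denotes the "other" class:

```python
# Sketch of the dataset assembly and split described above; X_variants/y_variants
# and X_other are assumed in-memory NumPy arrays (names are hypothetical).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
idx = rng.choice(len(X_other), size=12320, replace=False)  # subsample "other"
X_oth = X_other[idx]
y_oth = np.full(12320, 9)                                  # class id 9 = "other"

# Stratified, reproducible 2/3 : 1/3 split of the nine defined variants
X_tr, X_val, y_tr, y_val = train_test_split(
    X_variants, y_variants, test_size=1/3,
    stratify=y_variants, random_state=42)
# Same split applied to the single-class "other" images
Xo_tr, Xo_val, yo_tr, yo_val = train_test_split(
    X_oth, y_oth, test_size=1/3, random_state=42)

# Join the two parts and shuffle so all 10 classes are uniformly distributed
X_train = np.concatenate([X_tr, Xo_tr]); y_train = np.concatenate([y_tr, yo_tr])
X_valid = np.concatenate([X_val, Xo_val]); y_valid = np.concatenate([y_val, yo_val])
perm = rng.permutation(len(X_train))
X_train, y_train = X_train[perm], y_train[perm]
```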
Figure 4 illustrates the variability of the RGB components in the flame images. In all cases, the dominance of the R component can be observed. Although lower values of all components are noticeable for variants #1, #2, and #3, this does not mean that an increase in thermal power is accompanied by an increase in their amplitudes. The values of all RGB components are far from saturation, i.e., far from their maximum value of 255.
Figure 5 shows a series of flame images captured every 6.67 s (every 1000th frame). It reveals how the flame's brightness, shape, and structure evolve, providing a qualitative comparison of the co-combustion process under each variant. Each row represents a single experimental variant (from variant #1 to variant #9), with the bottom row marked as "other".
Figure 4 and Figure 5 illustrate the variability of flame images for the 10 classes considered, but in different ways: Figure 4 shows basic descriptive statistics of RGB component brightness in the form of box plots, in which the median, the brightness spread as the interquartile range, and outliers are visible.
2.3. Tested Neural Network Architectures
The dataset was split into training and validation sets in proportions of 2/3 and 1/3, respectively, giving 82,543 training and 40,657 validation images. Before entering the neural network, the images were scaled down to the appropriate resolution, and the RGB values were mapped to the [0, 1] interval.
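This preprocessing can be expressed as a short TensorFlow input pipeline; the sketch below assumes the split arrays from Section 2.2 fit in memory, and the batch size is illustrative:

```python
# Minimal tf.data pipeline matching the described preprocessing: downscale the
# 800x800 frames to the target resolution and map RGB values to [0, 1].
import tensorflow as tf

TARGET = (224, 224)  # also tested: 75x75, 100x100, 150x150

def preprocess(image, label):
    image = tf.image.resize(image, TARGET)
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

train_ds = (tf.data.Dataset.from_tensor_slices((X_train, y_train))
            .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
            .shuffle(4096).batch(64).prefetch(tf.data.AUTOTUNE))
val_ds = (tf.data.Dataset.from_tensor_slices((X_valid, y_valid))
          .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
          .batch(64).prefetch(tf.data.AUTOTUNE))
```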
The convolutional neural network (CNN) architectures were chosen to achieve diversity in structure, complexity, and resource requirements, making them potentially suitable for memory- or compute-constrained systems. They range from well-known, relatively simple architectures such as VGG16/VGG19 to more sophisticated ones with residual connections (ResNet50/ResNet152) and with inception modules (InceptionV3, Xception). The remaining architectures were MobileNet, DenseNet-121, NASNet-Large, EfficientNet-B0, and ConvNeXt-Base. All the CNNs were tested with four input image resolutions: 75 × 75, 100 × 100, 150 × 150, and 224 × 224.
Each tested model combined the convolutional base of one of the mentioned CNN architectures with a separate classifier stacked on top of it, as shown in Figure 6. The classifier consists of two dense layers and one dropout layer. The first approach was the transfer learning (TL) technique, which consists of "freezing" the pre-trained convolutional base and training only the classifier. This significantly reduces the time needed for training, especially for complex architectures. The convolutional base was pre-trained on the ImageNet dataset, containing 1.4 million images of everyday objects (e.g., chairs, cars, people) grouped into 1000 categories [49]. The image dataset collected during the co-combustion experiment has completely different content from ImageNet, which was the reason for examining the second approach of training the CNNs from scratch [50].
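A Keras sketch of the transfer learning variant follows; the classifier layer sizes and dropout rate are assumptions, as the text specifies only the layer types (two dense layers and one dropout layer):

```python
# Hypothetical Keras sketch of the TL setup: frozen ImageNet convolutional base
# with the described classifier on top. Layer widths/rates are assumed.
from tensorflow import keras

base = keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # "freeze" the pre-trained convolutional base

model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(256, activation="relu"),    # assumed width
    keras.layers.Dropout(0.5),                     # assumed rate
    keras.layers.Dense(10, activation="softmax"),  # 9 variants + "other"
])
# Training from scratch differs only in weights=None and base.trainable = True.
```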
All the CNN architectures were implemented in Python with the TensorFlow library. The maximum number of training epochs was set to 150, with early stopping if the model showed no improvement for 20 epochs. The initial learning rate was arbitrarily set to 0.0005 and was halved if the loss did not decrease for four epochs. The models were trained with the Adam optimizer to minimize the categorical cross-entropy loss function, a commonly applied approach in multi-class classification problems [51]. After each training epoch, the model was saved in HDF5 format, provided that the loss computed on the validation dataset had improved.
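This training setup maps directly onto standard Keras callbacks; a sketch follows, assuming one-hot-encoded labels (e.g., via keras.utils.to_categorical) and the data pipelines from the earlier sketches, with an illustrative checkpoint file name:

```python
# Sketch of the described training loop: Adam at 5e-4, categorical cross-entropy,
# LR halved after 4 stagnant epochs, early stopping after 20 epochs, and HDF5
# checkpointing on validation-loss improvement. With integer labels, use
# sparse_categorical_crossentropy instead.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=5e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])

callbacks = [
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=4),
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=20),
    keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                    save_best_only=True),
]
history = model.fit(train_ds, validation_data=val_ds,
                    epochs=150, callbacks=callbacks)
```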
The models were trained on a machine equipped with A100 GPUs, running the Ubuntu 22.04 operating system, with Python 3.12.3, CUDA 12.8, TensorFlow 2.17, and Scikit-learn 1.5.2, and developed in Jupyter 4.3.5. In summary, eleven different CNN architectures were tested, each at four input image resolutions, with and without transfer learning.
3. Results
Figure 7 presents validation loss plots for the lowest (75 × 75) and the highest (224 × 224) tested input image resolutions. The figures reveal the effects of applying the transfer learning paradigm for all the CNNs tested.
When transfer learning is applied, the validation loss declines smoothly, as shown in Figure 7a,b, typically stabilizing after approximately 20–40 epochs for both resolutions. For an input resolution of 75 × 75, the validation loss drops below 0.3 for three networks (MobileNet, DenseNet121, Xception), with even better results at higher resolutions, reaching below 0.2 at 224 × 224. In contrast, higher validation losses are observed for ConvNeXtBase and, in particular, EfficientNetB0.
When learning from scratch, most models initially exhibit very high validation losses, as shown in Figure 7c,d, exceeding 10 and even 12 at the higher resolutions. After ~15–20 epochs, the losses plummet and stabilize at low values for many architectures, falling below 0.1 for Xception and ResNet50 at a resolution of 75 × 75. As the resolution increases, the validation loss drops below 0.05 for EfficientNetB0, Xception, InceptionV3, and MobileNet.
Multi-class classification models can be evaluated with many different metrics, enabling a comparison of their performance. The numbers of items belonging to the 10 classes (nine variants plus the additional "other" class) are not equal; the largest count difference between classes is ~24%. Therefore, accuracy and the weighted averages of precision, recall, and F1-score were used for quantitative assessment of the tested CNN models.
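These metrics can be computed with scikit-learn as in the sketch below, assuming the validation pipeline and integer labels from the earlier sketches:

```python
# Sketch of the weighted metrics used for model assessment; y_valid and val_ds
# are assumed from the earlier sketches.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_pred = np.argmax(model.predict(val_ds), axis=1)
acc = accuracy_score(y_valid, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_valid, y_pred, average="weighted")  # weighted: accounts for the ~24% imbalance
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```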
Table 2 presents the learning scores for the best models, i.e., those with a weighted average precision of no worse than 0.95. Many models, particularly EfficientNetB0, InceptionV3, MobileNet, ResNet152, Xception, VGG16, and VGG19, achieve perfect or near-perfect scores for all the tested resolutions. It is worth noting that the transfer learning paradigm yields worse scores, accompanied by highly unsteady learning curves.
Excellent classification metrics always warrant scrutiny. In this case, the relatively short time over which the flame images were recorded, combined with the high frame rate, may have limited the variability of the image data. It should be emphasized that such good classification results were obtained under laboratory conditions. In industrial settings, where a visual flame recognition system operates continuously, dust settling on the optical elements of the borescope and slagging pose serious challenges. According to the author's experience in applying optical methods for diagnosing combustion processes in power boilers, these factors cause visible blurring in the captured flame images and often limit the field of view. Nevertheless, the results obtained allow for the selection of potentially the best architectures for the presented application.
More detailed information on model performance can be obtained by analyzing the confusion matrix. Figure 8 shows the confusion matrices for two example models that achieved the best results at the highest and lowest input resolutions, InceptionV3 and ResNet50 (Table 2). Both were trained from scratch.
Both confusion matrices indicate near-perfect classification across the nine defined co-combustion variants. Incorrect classifications can be observed for the "other" variant, which is most often confused with variants #2 and #5. The total percentage of incorrect indications for the "other" variant in the cases shown in Figure 8a,b is 3.5% and 9.5%, respectively, while the percentage of samples erroneously labeled as the "other" variant in the same cases equals 0.9% and 3.3%, respectively.
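The quoted percentages can be reproduced from the confusion matrix itself; a sketch follows, with class id 9 assumed for the "other" variant, consistent with the earlier sketches:

```python
# Sketch of the per-class error analysis behind Figure 8.
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_valid, y_pred)
other = 9  # assumed class id of the "other" variant

# Share of true "other" samples assigned to any defined variant
missed_other = 1.0 - cm[other, other] / cm[other].sum()
# Share of defined-variant samples erroneously labeled as "other"
false_other = (cm[:, other].sum() - cm[other, other]) / (cm.sum() - cm[other].sum())
print(f"'other' misclassified: {missed_other:.1%}; "
      f"wrongly labeled as 'other': {false_other:.1%}")
```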
The previously discussed CNN models with the corresponding parameters (InceptionV3, 224 × 224, no TL, and ResNet50, 75 × 75, no TL) were evaluated on all the data representing the "other" variant. As mentioned in Section 2.2, these image data were collected throughout the co-combustion experiment, shortly after each of the defined co-combustion variants or between them. The total number of images representing the "other" variant (139,632) was larger than the sum of the images belonging to all nine defined variants (110,880). Although only ~10% of the "other" images were used for model training, the entire dataset was utilized for inference; the results are presented in Table 3.
In both tested cases, the percentage of true positives exceeds 90%, even at an input resolution of 75 × 75. A larger number of false positives can be observed for variants #2 and #5.
The "other" class was constructed from transition states and post-variant images, representing a wide range of combustion parameters. Many of these overlap with the operational regions of variants #2 and #5 (as confirmed by the statistical spread in the boxplots of air and fuel flows). Although efforts were made to balance the number of images per class, the inherent variability and diversity within the "other" category may have led to underrepresentation of its full spectrum in the training data, increasing the likelihood of confusion with parameter-adjacent classes.
4. Discussion
The research revealed that modern CNNs, particularly those listed in Table 2, exhibit very good or near-perfect recognition of the nine defined combustion states. It follows that the examined classes of flame images corresponding to the combustion variants are easily separable, even though the captured flame shapes vary in time within each variant, as shown in Figure 5. The actual air and fuel flow readouts differed slightly from the desired values of the defined combustion states. Data quality is always crucial and largely depends on the setup of the imaging system, particularly how it is aimed at the flame; however, this factor could not be examined in the present study.
It is worth noting that the outstanding performance of some models is evident across all tested input image resolutions, including 75 × 75. This suggests that, for the classification task, the overall flame shape is more important than the structural details present in higher-resolution images, which would be essential for model deployment on memory-constrained devices.
Analyzing the metrics presented in Table 2, one can conclude that learning from scratch, although more time-consuming, leads to better performance. In typical machine learning scenarios, TL is expected to yield better, or at least comparable, results, especially with limited data. The learning curves obtained with transfer learning are generally smoother than those obtained when models were trained from scratch.
Many deep models pre-trained on ImageNet learn high-level object features (e.g., edges, shapes, parts) that do not translate well to the texture-based information in flame images. Classification under combustion conditions relies on subtle changes in spatial brightness, local patterns, and flame boundaries, areas where object-based filters may be less informative, as pointed out in [52,53]. This is even more relevant given the presence of dust inside the combustion chamber. On the other hand, structural features, especially in the case of turbulent flames, depend on the exposure time of the image sensor, which obviously affects the classification metrics. Since the image data were acquired at a fixed exposure time, it is not possible to determine its impact on the quality of flame image classification. Self-supervised and domain-adaptive approaches could provide improved generalization to new fuel blends or operating conditions and enhanced robustness in real-world deployment scenarios.
It is challenging to compare the results with those obtained by other researchers due to different experimental setups, fuel properties, and combustion conditions. Another factor is the parameters of the imaging systems, such as the field of view, spatial resolution, or the image sensor's acquisition time. Additionally, the use of diverse data preprocessing methods, model architectures, and evaluation metrics further complicates cross-study comparisons.
CNNs are established as the standard for high-fidelity classification of combustion states where annotated image data can be obtained, offering direct interpretability and transferability to real-time monitoring tasks [54]. Autoencoders, on the other hand, excel in anomaly detection and stability monitoring but are less effective than CNNs for fine-grained discrimination between similar process classes, owing to their reliance on reconstruction error rather than class discrimination [55,56]. Generative adversarial networks are primarily used to generate synthetic flame or combustion data, thereby enhancing data diversity and model generalization. They are not typically used alone for classification but can supplement training by providing balanced datasets for CNNs [42,57].
Unlike some literature sources, which rely on pre-filtering (gamma correction, segmentation, ROI extraction), this study demonstrates that CNNs can accurately classify combustion states from raw, unprocessed, full-frame flame images. It shows that deep networks autonomously identify discriminative spatiotemporal features, such as overall flame morphology and time-evolving structure, that correlate with the underlying process regimes, even under significant optical and process variability. The results reveal that combustion state separability relies more on global flame shape than on fine-grained texture or high-resolution detail. This insight enables viable deployment of CNN-based classifiers on memory- and compute-constrained embedded systems, which would be beneficial for real-time, edge-based combustion diagnostics in harsh industrial environments. The analysis attributing the underperformance of transfer learning to domain mismatch (object-based vs. texture-based features) in high-temperature process monitoring offers new explanatory insight and motivates future research into domain-adaptive or self-supervised pretraining tailored to non-traditional image domains such as combustion imaging. The study establishes the feasibility of using CNNs for real-time, simultaneous classification of multiple combustion regimes, not just binary or anomaly detection. The validated robustness across input resolutions and architectures supports scalable deployment in modern industrial automation pipelines.
5. Conclusions
The successful combination of deep learning, image processing, and multi-sensor data fusion demonstrates significant promise for overcoming diagnostic and control challenges in biomass–coal co-firing. These advanced technologies not only facilitate earlier and more accurate anomaly detection but also enable real-time system optimization, enhance plant safety, and boost combustion efficiency. The research thus addresses key scientific and operational barriers to the environmental and technical viability of biomass–coal co-combustion by leveraging state-of-the-art CNN-based image diagnostics.
The models described in this article were tested on four high-end A100 cards. We did not examine inference time because, in real-world and especially industrial conditions, edge devices are typically used, whose computing power and hardware resources are significantly lower than those of the machine used to train the models. Unfortunately, at the time of writing, it was not possible to implement the models on edge devices. Given their limitations, it can be assumed that such high recognition quality for the combustion process, or comparable inference times, would not be achievable. This is an essential aspect that the authors are aware of and intend to address in future work.
The presented results demonstrate the potential for effective diagnostics of the co-combustion of biomass and coal dust. It should be emphasized that they refer exclusively to a laboratory stand designed for the combustion of pulverized fuels, in which a single swirl burner is installed. Power boilers are multi-burner systems in a wall or corner configuration, where the combustion process in a single burner cannot be considered in isolation from the influence of the other burners. Therefore, the results obtained cannot be directly transferred to the co-combustion process carried out in full-scale facilities.