1. Introduction
The cement industry is one of the pillars of modern infrastructure development, but it is also among the most energy-intensive and polluting sectors globally. Cement production is responsible for a significant share of global carbon dioxide emissions, contributing to climate change and environmental degradation [
1]. In this context, the cement industry is seeking more sustainable practices, with a growing focus on the use of alternative fuels such as biomass or waste-derived fuels to mitigate environmental impacts and enhance energy efficiency [
2]. Among these, refuse-derived fuel (RDF) has emerged as a promising solution, offering a pathway to reduce reliance on traditional fossil fuels like petcoke and incorporate waste management principles into energy recovery [
3].
RDF is a fuel produced from various types of waste, including municipal solid waste (MSW), industrial waste, and commercial waste. The use of RDF in cement kilns has gained traction as a method to improve the sustainability of the cement industry.
The global generation of MSW continues to increase, leading to significant environmental concerns due to the large portions that end up in landfills or are incinerated [
4]. Converting MSW into refuse-derived fuel (RDF) offers a sustainable solution, especially for the cement industry, by providing an alternative fuel source that can help reduce the reliance on conventional fossil fuels like coal [
5,
6]. RDF is derived from non-recyclable components of MSW and has a calorific value similar to traditional fossil fuels, making it suitable for use in cement kilns for clinker production. Using RDF in cement kilns not only conserves natural resources but also helps lower greenhouse gas emissions. This is due to RDF’s biogenic content, which is considered carbon-neutral, and the overall reduction in emissions of pollutants like sulfur oxides (SOx), nitrogen oxides (NOx), and particulate matter compared to coal combustion [
7,
8,
9].
Recent studies have emphasized the potential of RDF in enhancing the energy efficiency and environmental sustainability of cement production by decreasing the reliance on traditional fossil fuels like coal, oil, and natural gas, helping to conserve these non-renewable resources [
3,
7,
10]. These investigations reveal that the adoption of RDF can lead to significant improvements in the carbon footprint of cement kilns, with the added benefit of addressing the escalating issue of waste disposal. Furthermore, the integration of RDF into cement manufacturing aligns with the principles of the circular economy, promoting the reuse of waste materials and the optimization of energy consumption [
11]. Some authors have calculated that, economically, integrating 15% RDF into cement kiln fuel can save more than 20,000 tons of petcoke annually, reduce CO2 emissions by more than 16,000 tons/year, and result in net savings of approximately USD 3 million per year due to decreased fuel and CO2 costs in the case of the cement industry in Jordan [
12]. However, the transition to RDF as a primary fuel source in cement kilns is not without challenges. Technical, economic, and regulatory hurdles must be overcome to facilitate the widespread adoption of this practice. The heterogeneity of RDF, along with the need for pre-treatment processes to meet industry standards, poses technical challenges that require innovative solutions for the efficient combustion of RDF in rotary kilns [
5]. The use of RDF as a substitute fuel in cement kilns poses potential risks due to the presence of heavy metals, particularly the more volatile ones, which can transfer to gaseous emissions; the suitability of RDF as a fuel is therefore contingent on its quality, and its use must be assessed on a case-by-case basis to ensure environmental safety [
13].
Regarding combustion stability, the maximum thermal substitution rate (TSR) achieved with RDF can reach 80–100% in the calciner, while it is limited to 50–60% in the kiln burner. Advances in pre-combustion technologies, multi-channel burners, and new satellite burners have facilitated high TSR. Extensive modeling of kiln burners and calciners has further enhanced TSR [
8]. Other authors have analyzed alternatives to the direct firing of RDF in rotary kilns, such as the gasification of the RDF [
8].
Rotary kilns, essential for producing lime and cement clinker, utilize direct flame heating to drive multiple chemical processes. Effective monitoring and control of this process are essential to ensure efficient fuel utilization, maintain product quality, and comply with environmental regulations. Such monitoring includes maintaining the appropriate temperature profile, which is crucial for complete RDF combustion and the prevention of pollutant formation, and is achieved through the use of sensors and thermocouples along the various zones of the kiln [
8]. Infrared pyrometers and optical sensors are employed to monitor flame temperature, ensuring it remains within the optimal range for efficient combustion [
14,
15]. Continuous emissions monitoring systems track CO and CO2 levels to ensure complete combustion and monitor process efficiency. Monitoring of NOx and SOx emissions is critical due to regulatory requirements, with some systems providing real-time data that facilitate process adjustments to minimize emissions [
12]. Due to the higher content of volatile heavy metals in RDF, it is essential to monitor their levels in gaseous emissions, and specialized filters and detectors measure concentrations of elements such as Sb, Hg, Cd, and Pb [
12]. Automated feed systems regulate the rate at which RDF is fed into the kiln, maintaining a consistent fuel supply and preventing fluctuations that could impact combustion efficiency [
14]. Other studies present advanced models for predicting the temperature inside the rotary kiln [
16,
17], predictive control techniques [
18,
19,
20], or the analysis of the features of the flame video to recognize patterns [
20].
These methodologies enable operators to make informed decisions regarding adjustments to the operation of the kiln. By analyzing data from these various sources, operators can optimize fuel consumption, enhance clinker quality, extend the lifespan of the kiln, and reduce the risk of unplanned downtime [
21].
The literature indicates the crucial relationship between flame visual representation and cooking zone temperature [
22,
23,
24]. Flame images and various internal components of the rotary kiln provide critical operational condition information. Thus, some methods utilizing computer vision techniques have been developed to achieve intelligent control of rotary kilns [
24,
25,
26,
27,
28]. Among these techniques are algorithms for segmenting the flame in the combustion zone of a rotary kiln based on texture granularity using the Gabor transform and Fuzzy C-Means clustering [
25]. Other approaches use heterogeneous features, such as color and global and local configuration characteristics extracted directly from pixel values without segmentation. Additionally, machine learning techniques are increasingly applied, including image recognition methods using neural networks for flame state identification, albeit with high processing times [
27,
29].
Other methods analyze flame images to extract texture features like energy, entropy, and inertia, employing singular value decomposition (SVD) [
30], support vector machines (SVMs) [
31], and K-means [
28] for feature extraction and classification of flame images to recognize rotary kiln working conditions. Recent studies have combined filters or image segmentation to highlight regions of interest before using neural networks for condition recognition, slightly improving recognition accuracy [
24,
32,
33].
On the other hand, the application of deep learning techniques in image recognition has made significant progress, with models like the convolutional neural network VGG-16 used for feature transfer, training, and testing flame images in different combustion states to achieve automatic combustion state identification [
34]. However, the application of deep learning in recognizing rotary kiln working conditions still holds substantial development potential for more complex analyses and recognition tasks.
The lack of labeled sample data complicates feature extraction in neural networks, and the training process is prone to overfitting. To reduce deep learning models’ dependence on training sample size, transfer learning can be applied to classification, detection, or segmentation tasks to accelerate training efficiency. Therefore, to overcome limitations due to the lack of massive data, it is proposed to start with state-of-the-art deep learning methods previously trained with massive datasets and apply transfer learning strategies to adapt them to the specific problem of instance segmentation within the rotary kiln [
35].
Consequently, the use of deep learning for predictive maintenance and process optimization can lead to significant energy savings. By accurately predicting when maintenance is needed, DL models help prevent unplanned downtime and reduce energy waste associated with inefficient operations. Additionally, optimizing the combustion process ensures that the kiln operates at peak efficiency, minimizing energy consumption and associated costs [
36,
37].
1.1. Instance Segmentation
Computer vision, a rapidly growing interdisciplinary field, prominently features object detection as one of its primary applications. Object detection involves two challenges: locating objects within an image and classifying them.
Advanced object detection methods are divided into two main categories: one-stage detectors and two-stage detectors. Two-stage detectors use a region proposal network (RPN) to generate regions of interest (ROIs), which are subsequently classified in a second step. In contrast, one-stage detectors integrate both tasks into a single process. Generally, two-stage detectors prioritize detection accuracy, while one-stage detectors focus on inference speed and are suitable for real-time applications [
38].
Object detection typically aims to approximate object location in an image, obtaining a bounding box, as shown in the example of
Figure 1. Image segmentation can be approached in various ways, ranging from assigning semantic labels to individual pixels (semantic segmentation) to partitioning individual objects (instance segmentation). Instance segmentation can distinguish isolated objects and separate them into different instances of the same class, providing detailed information about each individual entity in the scene.
The outcome of an instance segmentation model is a set of masks or contours delineating each object in the image, accompanied by class labels and confidence scores for each object. Instance segmentation is particularly useful when it is necessary to determine not only the location of objects within an image but also their exact shape.
Figure 1.
Image analysis techniques: classification, object detection, and segmentation.
1.2. Mask R-CNN
Mask R-CNN [
39], proposed in 2017, is an instance segmentation method that evolved from Faster R-CNN [
40], an advanced object detection model based on convolutional neural networks (CNNs). Mask R-CNN extends Faster R-CNN by adding instance segmentation, allowing precise identification and delineation of objects in images by generating specific masks for each detected object. Its two-stage architecture incorporates a robust mechanism for generating pixel-level instance masks [
39]. In Mask R-CNN, the image is first processed by a convolutional neural network that provides a convolutional feature map. Subsequently, another network (region proposal network, RPN) is used to predict the proposals for regions of interest (ROIs). These ROIs are then refined and classified while high-precision instance masks are simultaneously generated. A critical component of Mask R-CNN is the ROI Align technique, which ensures accurate alignment of the feature maps with the instance masks, significantly contributing to the quality of the resulting masks. This method achieves an inference speed of approximately five frames per second (FPS), which is a step towards real-time performance. Additionally, the impact of this method was significant, especially since the code and model were released after its publication. Furthermore, researchers have introduced variations and extensions, such as Cascade Mask R-CNN [
41] and Panoptic FPN [
42], which enhance its versatility and capabilities.
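As an illustration of the two-stage pipeline described above, the following sketch runs a Mask R-CNN pre-trained on MS COCO using the torchvision implementation. It is not the model used in this work; the input file name and the confidence threshold are illustrative.

```python
# Minimal sketch: instance segmentation with a pre-trained Mask R-CNN from torchvision.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a Mask R-CNN pre-trained on MS COCO (ResNet-50 + FPN backbone).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("kiln_frame.jpg").convert("RGB")  # hypothetical input frame
with torch.no_grad():
    outputs = model([to_tensor(image)])[0]

# Each detection comes with a box, a class label, a confidence score, and a
# pixel-level mask (soft values in [0, 1], usually thresholded at 0.5).
for box, label, score, mask in zip(outputs["boxes"], outputs["labels"],
                                   outputs["scores"], outputs["masks"]):
    if score > 0.5:
        binary_mask = mask[0] > 0.5
        print(label.item(), round(score.item(), 2), box.tolist(), int(binary_mask.sum()))
```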
1.3. YOLO (You Only Look Once)
Unlike some approaches that use two stages for detection and segmentation, YOLO (You Only Look Once) [
43] employs a single-stage architecture integrating both tasks in one step, significantly enhancing efficiency.
Figure 2 shows the evolution of the different versions of YOLO throughout the years. YOLO, launched in 2015, quickly gained popularity for its speed and accuracy, capable of inferring at 45 FPS. Subsequent versions YOLOv2 [
44] and YOLOv3 [
45] introduced significant improvements. A highly efficient model with outstanding performance in real-time instance segmentation, known as YOLACT [
46], was developed based on an encoder–decoder architecture.
In 2020, two new versions of YOLO were published: YOLOv4 by Alexey Bochkovskiy on Darknet [
47] and YOLOv5 by Glenn Jocher in a PyTorch implementation [
48]. Later, YOLOv6 [
49] was developed by Meituan in 2022, but until the release of YOLOv7 [
50], segmentation models and additional tasks, like pose estimation on the MS COCO keypoints dataset, were not added [
51]. YOLOv8 [
52] and YOLOv9 [
53] are the most recent and advanced versions of the real-time object detection and segmentation algorithm, building on the success of previous versions and introducing new features and improvements to enhance performance, flexibility, and effectiveness.
In this study, the YOLOv8 algorithm has been used for monitoring the combustion zone of a rotary kiln. This selection has been based on several factors, including performance, adaptability, technical integration, and application-specific benefits. YOLOv8, similar to its predecessors, is distinguished by its real-time performance capabilities, offering high frame rates essential for continuous monitoring in industrial settings. Previous studies have demonstrated that YOLO-based models can achieve inference speeds upwards of 45 FPS, making them highly suitable for real-time applications [
45,
47]. The need for prompt detection and segmentation in a dynamic environment such as a rotary kiln makes YOLOv8 particularly advantageous. Additionally, YOLOv8 benefits from transfer learning, leveraging pre-trained models on extensive datasets. This allows for effective fine-tuning with smaller, application-specific datasets, thereby reducing the overall training time while maintaining high accuracy [
54]. In comparison, other algorithms might require more extensive data and longer training periods to achieve comparable performance. This efficiency is crucial in industrial applications where rapid deployment and adaptation are necessary. The practical deployment of YOLOv8 in industrial environments is facilitated by its ease of implementation and resource efficiency. YOLOv8 is designed to run efficiently on standard hardware, making it feasible for real-time monitoring without requiring specialized computational resources [
49]. The dynamic and high-temperature environment of the combustion zone presents unique challenges, such as fluctuating light conditions and high levels of particulate matter. The robust single-stage architecture of YOLOv8 is well suited to handle these challenges, providing reliable segmentation and detection under varying conditions [
54,
55]. Additionally, the capability of YOLOv8 to handle real-time inference with high accuracy ensures that it can effectively monitor and control the combustion process, enhancing operational efficiency and safety.
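As a minimal sketch of the transfer-learning workflow outlined above, the following snippet fine-tunes a COCO-pre-trained YOLOv8 segmentation checkpoint on a small application-specific dataset with the Ultralytics API. The dataset file kiln_data.yaml, the medium model variant, and the hyperparameter values are assumptions for illustration, not the exact configuration used in this study.

```python
from ultralytics import YOLO

# Start from a segmentation checkpoint pre-trained on MS COCO (medium variant).
model = YOLO("yolov8m-seg.pt")

# Fine-tune on a small, application-specific dataset described by a YOLO-format YAML file.
model.train(data="kiln_data.yaml", epochs=30, imgsz=640, batch=16)

# Validate the fine-tuned model and run inference on a new frame.
metrics = model.val()
results = model("kiln_frame.jpg")
```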
1.4. Segment Anything Model (SAM)
Segment Anything (SAM) [
56] is a promptable object segmentation model built on a vision transformer image encoder. SAM is a segmentation system that can be guided through prompts and possesses a general understanding of objects (zero-shot generalization). This means it can segment any element in an image without necessarily having seen objects of the same class before.
In the SAM model, the image is first processed by an image encoder. A prompt encoder then encodes prompts provided by the user or generated automatically; these prompts can take various forms, such as points, boxes, masks, or text. Finally, a lightweight mask decoder combines the image features and the encoded prompts to produce pixel-level instance masks, together with a quality score for each mask.
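The following sketch illustrates prompt-based segmentation with the official segment_anything package; the checkpoint file corresponds to the publicly released ViT-B weights, while the input image and the box prompt coordinates are illustrative and unrelated to the kiln data.

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load the released ViT-B SAM checkpoint and wrap it in a predictor.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("kiln_frame.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt SAM with a rough bounding box (x1, y1, x2, y2) around a region of interest.
box = np.array([200, 150, 800, 600])
masks, scores, _ = predictor.predict(box=box, multimask_output=True)
print(masks.shape, scores)  # three candidate masks with their predicted quality scores
```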
This paper aims to extend the role of computer vision technologies in monitoring and optimizing the use of RDF in rotary cement kilns. By leveraging advanced imaging and data analysis techniques based on deep learning, it is possible to acquire images of the kiln interior and segment the most relevant regions of the flame. To address the machine learning problem of detecting and classifying key elements in rotary kiln operation images, independent and dependent variables have been defined. The independent variables are the image frames captured during kiln operation; each frame serves as an input to the deep learning model, providing the visual data required for analysis. The dependent variables are the classes that the model predicts for each frame. These classes include Plume, Flame, and Clinker as defined in
Figure 3. The class Plume corresponds to the mix of RDF and fossil fuel when it enters the rotary kiln from the burner, prior to its combustion. The class Flame refers to the air and fuel mix that is in the combustion phase and therefore at a high temperature. Finally, the class Clinker corresponds to the raw material that exits the rotary kiln at the lower part of the image.
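For reference, a YOLO-format dataset configuration for these three classes might look as follows; the file name kiln_data.yaml and the paths are hypothetical and only indicate how the class labels map to indices.

```python
# Write a hypothetical YOLO-format dataset configuration for the three classes.
kiln_data_yaml = """
path: datasets/kiln
train: images/train
val: images/val
test: images/test
names:
  0: Plume
  1: Flame
  2: Clinker
"""
with open("kiln_data.yaml", "w") as f:
    f.write(kiln_data_yaml)
```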
This work is the first step towards the development of control algorithms based on images and process data that allow operators to make the right decisions when using fuels like RDF with high variability in their calorific values.
3. Results and Discussion
To evaluate and select the final model for use in the monitoring tool in this work, two comparisons were conducted. The first comparison involves analyzing the overall performance of three different object segmentation models using the metrics previously described. The second comparison examines the effect of increasing the number of epochs used in training the same model.
3.1. Comparison between the Three Variants of YOLOv8
The object classes to be detected and segmented in this context include the categories of Flame, Clinker, and Plume. Performance evaluation is carried out using metrics such as B(P), B(R), or B(mAP50), obtained through the use of bounding boxes. Additionally, the metrics M(P), M(R), M(mAP50), and M(mAP50-95) are considered, which evaluate instance segmentation using masks. In this analysis, particular focus is placed on the metrics starting with M, associated with the masks used by YOLOv8 for instance segmentation.
Table 3,
Table 4 and
Table 5 show the results of the experiments E1, E2, and E3 carried out with YOLOv8-small, YOLOv8-medium, and YOLOv8-large models.
It is observed that all models exhibit high precision and recall for each class, indicating remarkable performance in instance segmentation for the three categories. However, both E3 (
Table 5) and E2 (
Table 4) present superior values in M(mAP50-95) compared to E1 (Table 3). Furthermore, the models achieve high values in M(mAP50) for all classes, suggesting an effective capability to distinguish instances when the overlap between detection and ground truth is at least 50%. Notably, the performance in the Flame class shows higher values compared to the other two classes. Despite this, a decrease in the values of M(mAP50-95) for all classes is evident as the IoU threshold increases, indicating a loss of precision. This phenomenon can be attributed to the difficulties of the models in capturing details and edges of instances, especially in the more irregularly shaped Clinker and Plume classes.
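For context, the box and mask metrics discussed here can be obtained programmatically from the Ultralytics validation routine, as in the following sketch; the weight path and dataset file are illustrative, and the attribute names may differ between library versions.

```python
from ultralytics import YOLO

model = YOLO("runs/segment/train/weights/best.pt")  # hypothetical trained weights
metrics = model.val(data="kiln_data.yaml")

# Box metrics: precision, recall, mAP@0.5, and mAP@0.5:0.95
print(metrics.box.mp, metrics.box.mr, metrics.box.map50, metrics.box.map)
# Mask metrics: the M(P), M(R), M(mAP50), and M(mAP50-95) values used in this section
print(metrics.seg.mp, metrics.seg.mr, metrics.seg.map50, metrics.seg.map)
```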
Another essential requirement in this project is the ability to work in real time, which involves achieving an optimal inference time to obtain the maximum number of frames per second (FPS). Therefore, the inference time for the three models and the number of FPS they can achieve were measured. According to the results presented in
Table 6, both E2 and E1 achieve a higher FPS value compared to E3. In conclusion, after evaluating the two fundamental requirements of the project and analyzing the results, we opted to select the model trained in E2. This model not only offers higher precision but also achieves a high frame rate, thus meeting the established requirements.
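A simple way to estimate the inference time and FPS figures compared in Table 6 is to time repeated predictions, as sketched below; the weight path, frame names, and number of frames are illustrative, and the resulting numbers depend entirely on the hardware used.

```python
import time
from ultralytics import YOLO

model = YOLO("runs/segment/train/weights/best.pt")
frames = ["frame_%03d.jpg" % i for i in range(100)]  # hypothetical test frames

# Warm-up run so model loading and GPU initialisation do not skew the measurement.
model(frames[0], verbose=False)

start = time.perf_counter()
for frame in frames:
    model(frame, verbose=False)
elapsed = time.perf_counter() - start

latency_ms = 1000 * elapsed / len(frames)
print(f"mean inference time: {latency_ms:.1f} ms -> {1000 / latency_ms:.1f} FPS")
```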
3.2. Impact of Training Epochs on Model Performance
Following the selection of the model to be used, and with the goal of improving precision, the same model was trained with different numbers of epochs, seeking a balance between enhancing performance and avoiding issues such as overfitting.
In
Figure 12, the charts are divided into two sets: those whose titles start with train show metrics calculated on the training dataset, while those starting with val show metrics calculated on the validation dataset. On the left side of the figure, the training losses decrease consistently over the epochs, and the validation losses decrease as well, suggesting that the model generalizes well and avoids overfitting to the training data. On the right side, the progression of model training is as expected, with rapid growth of the metrics in the early epochs and stabilization from epoch 40 onwards. Therefore, a new experiment (Experiment 4, or E4) was carried out with the same model and dataset but increasing the number of epochs to 50, since the gain in
mAP is practically insignificant beyond that point.
In
Table 7, it can be seen that E4 achieves a higher value of M(mAP50-95) compared to E2, indicating better model performance. This translates into an increase in average precision across all classes, with the Flame class again showing outstanding performance, while the Plume class presents greater challenges in segmentation.
In
Figure 13, the results of the model inference on various validation images are presented. In these representations, the assigned class is shown along with the corresponding confidence percentage, as well as the predicted segmentation mask.
3.3. Implementation of the System in Real Time
Once the model was selected, the final integration of all tools was carried out with the objective of obtaining the monitoring system. This system encompasses both the acquisition of images in real time and the trained model.
As illustrated in
Figure 14, the acquisition system is responsible for capturing images from inside the rotary kiln, which will serve as input to the selected model. The latter performs the inference of the images, generating a resulting image with the various segmented elements. Additionally, to assist the operator in decision-making regarding the control of the rotary kiln, a characterization of the different segmented elements is carried out. This characterization provides a quantitative value to the operator through the percentage of the area occupied by each instance within the image.
Following the implementation and startup of the system at the cement production facilities, it was confirmed that the model inference time is 23 ms, as previously indicated, without causing delays in image acquisition. This ensures that the system operates at 25 FPS, thus meeting the two essential requirements of this project.
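A minimal sketch of the characterization step, assuming the Ultralytics prediction API, is shown below: it computes the percentage of the frame area covered by each segmented instance, which is the quantitative value presented to the operator. The weight and image paths are illustrative.

```python
from ultralytics import YOLO

model = YOLO("runs/segment/train/weights/best.pt")
# retina_masks requests masks at the original frame resolution.
result = model("kiln_frame.jpg", retina_masks=True, verbose=False)[0]

h, w = result.orig_shape
for cls_id, mask in zip(result.boxes.cls, result.masks.data):
    # Fraction of the frame occupied by this instance (mask pixels / total pixels).
    pct = 100.0 * float(mask.sum()) / (h * w)
    print(f"{result.names[int(cls_id)]}: {pct:.1f}% of the frame")
```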
3.4. Adaptation to Different Boundary Conditions
Given the changing nature of the industry, it is crucial that the models are robust enough to adapt to modifications. In this context, some adjustments to the burner of the rotary kiln are made during the cement production process, causing changes in the boundary conditions. For this reason, it is necessary to validate whether the previously trained model maintains its performance or, if not, to evaluate the need for adjustments to adapt it to the new conditions. To this end, a new dataset (dataset 2) was collected under the updated conditions, following the previously described steps: a campaign to collect videos, subsequent cleaning, and finally the corresponding labeling. This new dataset encompasses a total of 106 images, of which 75 were assigned to training, 20 to validation, and 11 reserved for testing.
As detailed in
Section 3.2, the model initially trained exhibits very good performance metrics, with high scores in precision, recall, and mAP. Specifically, in E4, an average of 98.6% for M(mAP50) and 71.8% for M(mAP50-95) is achieved, as shown in Table 7. However, the results of the model validation against the new dataset 2, as evidenced in Table 8, are not optimal in this case, with an average of 86.5% for M(mAP50) and 47.2% for M(mAP50-95) across all classes. A clear deterioration in model performance is observed with this variation.
Therefore, it has been determined that it is crucial to make adjustments to the model to enhance its robustness in the face of new boundary conditions. Two different strategies have been evaluated for this purpose. The first involves retraining the model using both the original dataset 1 and the new dataset 2. The second strategy involves applying fine-tuning to the model using exclusively dataset 2.
Once the training of the new models with these respective strategies is completed, an evaluation of them will be conducted. For this, four additional experiments will be carried out. On one hand, Experiment 6 (E6) and Experiment 7 (E7), which implement the retraining strategy, evaluate the performance of the model on dataset 1 and dataset 2, respectively. On the other hand, Experiment 8 (E8) and Experiment 9 (E9), based on fine-tuning, will assess performance on dataset 1 and dataset 2, respectively.
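The two adaptation strategies can be expressed compactly with the Ultralytics API, as in the following sketch; the dataset files, epoch counts, and learning rate are illustrative and do not reproduce the exact settings of E6-E9.

```python
from ultralytics import YOLO

# Strategy 1: retrain from the pre-trained checkpoint on datasets 1 and 2 combined
# (the combined YAML simply lists both image folders under train/val).
model_retrained = YOLO("yolov8m-seg.pt")
model_retrained.train(data="kiln_data_combined.yaml", epochs=50, imgsz=640)

# Strategy 2: fine-tune the already-trained model using only dataset 2,
# typically with a lower learning rate and fewer epochs.
model_finetuned = YOLO("runs/segment/train/weights/best.pt")
model_finetuned.train(data="kiln_data2.yaml", epochs=20, imgsz=640, lr0=0.001)

# Cross-evaluation of both models on both datasets, mirroring E6-E9.
for m, name in [(model_retrained, "retrained"), (model_finetuned, "fine-tuned")]:
    for data in ["kiln_data.yaml", "kiln_data2.yaml"]:
        print(name, data, m.val(data=data).seg.map50)
```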
The results obtained after the adjustment process are highly noteworthy. The models experience significant improvements in precision, recall, and mAP. Notably, Experiment 9 (E9) achieves an average across all classes of 99.5% for M(mAP50) and 72.8% for M(mAP50-95), as detailed in Table 9. However, when the fine-tuned model is evaluated on dataset 1 (E8), its performance is the lowest compared to the other experiments, which could be due to possible overfitting of the model to the new dataset.
On the other hand, Experiment 7 (E7) demonstrates improved performance on the new dataset, in addition to maintaining virtually the same performance on the original dataset 1 as observed in Experiment 6 (E6), indicating a notable capacity for generalization. Ultimately, both strategies can be used successfully, and the choice between them will depend on specific requirements: if the objective is to achieve a model that generalizes across different datasets, the preferred choice is to retrain with all the data, whereas if the goal is to tailor the model to specific boundary conditions, fine-tuning emerges as the most suitable option. Both strategies ensure that the model maintains its effectiveness, robustness, and adaptability, becoming a fundamental asset for the rotary kiln monitoring system. As a final result,
Figure 15 presents an example of the final prediction of the model using dataset 2 compared with the ground truth obtained from the labeling of the dataset.
It is worth mentioning that optimizing hyperparameters such as the learning rate, batch size, and network architecture can lead to significant improvements in model performance. For example, learning rate schedules that reduce the learning rate as training progresses can help achieve better convergence. Moreover, experimenting with different batch sizes to find the optimal balance between memory usage and model performance, and trying different optimizers such as Adam or RMSprop, can help find the best fit for the specific application. Several post-processing techniques can also refine the outputs of the YOLOv8 model to improve accuracy and reduce false positives: Non-Maximum Suppression (NMS) eliminates multiple detections of the same object by keeping only the highest-confidence detection, and a confidence threshold can filter out low-confidence detections.
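The following sketch illustrates how these levers map onto the Ultralytics training and prediction interfaces; the specific values are illustrative rather than tuned for this application.

```python
from ultralytics import YOLO

model = YOLO("yolov8m-seg.pt")

# Training-time hyperparameters: learning-rate schedule, batch size, and optimizer.
model.train(
    data="kiln_data.yaml",
    epochs=50,
    batch=8,            # trade-off between GPU memory and gradient stability
    lr0=0.01,           # initial learning rate
    lrf=0.01,           # final learning-rate fraction, i.e. a decaying schedule
    cos_lr=True,        # cosine learning-rate schedule
    optimizer="AdamW",  # alternatives include "SGD", "Adam", and "RMSProp"
)

# Inference-time post-processing: confidence threshold and NMS IoU threshold.
results = model("kiln_frame.jpg", conf=0.5, iou=0.6)
```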
Finally, future adaptations of the results for practical implementation may need to address some challenges associated with a small data sample, the need for data labeling, and the limitations of transfer learning. Techniques like data augmentation and synthetic data generation can expand the training dataset, improving model robustness, while utilizing semi-supervised and active learning techniques can reduce dependency on labeled data and maximize labeling efficiency.
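As an example of how the training set can be expanded on the fly, the sketch below enables standard augmentation options exposed by the Ultralytics trainer; the values are illustrative and would need to be validated against the kiln imagery.

```python
from ultralytics import YOLO

model = YOLO("yolov8m-seg.pt")
model.train(
    data="kiln_data.yaml",
    epochs=50,
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,      # colour jitter for changing light conditions
    degrees=5.0, translate=0.1, scale=0.3,  # mild geometric transforms
    fliplr=0.5,                             # horizontal flips
    mosaic=1.0,                             # mosaic augmentation combines four frames into one
)
```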
4. Conclusions
Throughout this paper, the successful development and implementation of a monitoring system for rotary kilns in a real environment has been demonstrated using advanced computer vision and deep learning techniques. The operation of the developed system under real working conditions can be observed in
Supplementary Video S1. The implementation of this monitoring system can not only enhance the efficiency of the kiln operators but also enable comprehensive supervision and control of the cooking process. These improvements are expected to optimize fuel consumption and contribute to an increase in the quality of the final product while decreasing the consumption of fossil fuels and thus reducing pollutant emissions.
The selection of the YOLOv8 model has been supported by its demonstrated capacity to detect and segment instances of flame, clinker, and plume with high levels of precision in real time. This endorsement validates the suitability of the adopted approach and ensures the fulfillment of the project requirements. Additionally, two strategies have been proposed to adapt the model to changes in boundary conditions, substantially improving both its precision and segmentation capability.
Future developments can be pursued in two main directions: First, to enhance the precision of the model, it is suggested to explore the possibility of increasing the dataset size or balancing the classes to achieve more homogeneous results. In this context, training the model with a greater number of epochs could be considered, maintaining a cautious balance between improving precision and mitigating overfitting. Furthermore, to enhance the robustness of the models, it is necessary to evaluate them under different working conditions, different plants, and various compositions of RDF. This requires a deep interaction between expert operators and the model developers. This collaborative approach ensures that the models can adapt to real-world scenarios and handle the variability inherent in RDF compositions, leading to more reliable and efficient performance in practical applications.
Second, extracting features from each instance and correlating them with process data would allow the construction of predictive models from the images to anticipate events in the rotary kiln or to develop automatic process control, relating variables such as product quality, RDF composition, or pollutant emissions. In this way, techniques such as reinforcement learning could optimize the continuous process by learning from interactions with the environment, improving quality control, fuel optimization, and emissions reduction.