Article

An Improved Lithium-Ion Battery Fire and Smoke Detection Method Based on the YOLOv8 Algorithm

1 College of Civil Aviation Safety Engineering, Civil Aviation Flight University of China, Guanghan 618307, China
2 Sichuan Provincial Key Laboratory of Civil Aircraft Fire Science and Safety Engineering, Guanghan 618307, China
* Author to whom correspondence should be addressed.
Fire 2025, 8(6), 214; https://doi.org/10.3390/fire8060214
Submission received: 28 April 2025 / Revised: 21 May 2025 / Accepted: 26 May 2025 / Published: 27 May 2025

Abstract

This paper introduces a novel algorithm—YOLOv8 (You Only Look Once version 8) + FRMHead (a multi-branch feature refinement head) + Slimneck (a lightweight bottleneck module), abbreviated as YFSNet—for lithium-ion battery fire and smoke detection in complex backgrounds. By integrating advanced modules for richer feature extraction and streamlined architecture, YFSNet significantly enhances detection precision and real-time performance. A dataset of 2300 high-quality images was constructed for training and validation, and experimental results demonstrate that YFSNet boosts detection precision from 95.6% in the traditional YOLOv8n model to 99.6%, while the inference speed shows a marked improvement with FPS increasing from 49.75 to 116.28. Although the recall rate experienced a slight drop from 97.7% to 93.1%, the overall performance in terms of F1-score and detection accuracy remains robust, underscoring the method’s practical value for reliable and efficient battery fire detection in fire safety systems.

1. Research Background

Lithium batteries are popular due to their high energy density, long cycle life, and absence of a memory effect, and they play a key role in modern energy storage. They are widely used in consumer electronics such as smartphones and laptops, where compact design and efficiency matter, as well as in electric vehicles and energy storage systems that require high power and long-term reliability. The global lithium battery market has grown rapidly, from about $40 billion in 2015 to over $80 billion in 2020 [1], driven by increasing demand for green energy and new energy vehicles. Their small size and high performance also make them ideal for drones, portable sensors, and other lightweight devices. These trends continue to push forward advances in battery materials and electrochemistry, laying the foundation for future energy technologies [2,3].
As lithium batteries become more common, their benefits come with serious safety risks. Their high energy design can cause thermal runaway, which happens due to overcharging, internal short circuits, mechanical impacts, or extreme conditions like high temperatures and humidity. When thermal runaway occurs, rapid chemical reactions produce intense heat and flammable gases, leading to smoke, fire, battery swelling, explosions, and large fires. For example, in 2019, a major U.S. electric vehicle brand experienced a thermal runaway incident that caused significant damage, injuries, and economic losses [4]. Similar incidents with laptops and smartphones have raised concerns about safety and brand reputation [5]. Therefore, developing safety monitoring systems to detect early signs of failure is crucial to reduce accidents and protect lives and property.
Traditional systems based on smoke and temperature sensors respond slowly and produce frequent false alarms under environmental noise, making early detection of lithium battery fires difficult.
Recent advances in deep learning have enabled intelligent fire monitoring through image processing and object detection. YOLO algorithms, for example, can quickly identify features such as smoke, flames, and abnormal heat from real-time video, accurately detecting and locating fire hazards. Li and Wang [6] showed that a YOLO-based system can effectively identify early fire signals in complex environments, improving both detection speed and accuracy for lithium battery fire prevention.

2. Literature Review

2.1. Research Progress

Early vision-based fire detection relied on hand-crafted features. Bu and Gharajeh [7] compared three classical methods on standard fire datasets: color-thresholding reached 85.2% detection rate but 32.7% false alarms; edge-detection yielded 78.5% accuracy and 28.5% false alarms; morphological processing delivered 82.4% accuracy with 25.6% false positives. However, all three dropped below 70% accuracy under low-light or complex backgrounds.
Deep convolutional networks overcame many of these limitations. Mozaffari et al. [8] trained a CNN on flashover sequences and reported 91.4% precision and 89.8% recall—an improvement of ~12% over classical methods—in real-time forecasts of room fire spread. Feng and Sun [9] introduced a multi-scale feature-fusion module that raised mAP on their fire-smoke dataset from 0.812 to 0.874 and improved detection of 8–16 pixel flames by 15%. Jeon et al. [10] combined 20 hand-crafted texture features with deep features and achieved F1-scores of 0.85 in both indoor and outdoor tests, reducing false negatives by 18%.
Multimodal and industrial applications have driven further advances. Saponara et al. [11] optimized a CNN for antifire surveillance in factories, delivering 92.1% accuracy at 25 FPS. Sousa et al. [12] fused thermal-infrared and visible-light frames to reach 94.3% detection accuracy—and a 22% reduction in false alarms—under harsh conditions. Li et al. [13] integrated data from temperature sensors and RGB cameras, boosting recall from 80.5% to 88.2% at 20 FPS. Nguyen et al. [14] applied dynamic-background modeling to reduce false positives by 30% in video surveillance, achieving 90.8% accuracy. Jana and Shome [15] showed multimodal fusion with IoT sensor arrays can improve recall by 6% and maintain 85 FPS on edge devices.
In object-detection research, the YOLO series has been particularly influential. YOLOv1 [16] delivered 63.4% mAP on PASCAL VOC at 45 FPS yet struggled with small targets. YOLOv4 [17] combined CSPDarknet and PANet to achieve 43.5% COCO mAP at 65 FPS, nearly a 10-point mAP gain over YOLOv3. YOLOv8 [18] introduced a lightweight head and multi-scale losses, reaching 50.8% COCO mAP at 75 FPS and demonstrating robust detection under low light. Guo and Xu [19] applied feature recalibration to improve small-object recall by 7%, while the adaptive anchor-box strategy of Zhang et al. [20] raised small-object AP from 45.2% to 61.3% in traffic scenarios. The adaptive feature fusion of Sahoo and Nanda [21] reduced false positives by 20% in crowded scenes. More recent improvements include Tao et al. [22], whose attention-augmented receptive-field module increased mAP by 2.3%, and Cao et al. [23], whose pyramid-network fusion achieved a 1.1% mAP gain at 60 FPS on COCO.
Specifically for lithium-battery monitoring, Huang and Li [24] found that BMS temperature and pressure-differential indicators could detect only 30% of thermal runaway precursors, with average warning delays of 5 s. Pu et al. [25] fused infrared and RGB imagery with a deep network, achieving 85.6% detection accuracy at 30 FPS, surpassing traditional methods by 25.4%. The multimodal AI of Azzabi et al. [26] raised sensitivity to early smoke by 8%, and the multi-sensor fusion system of Hu et al. [27] cut response time from 5 s to 0.8 s. Su et al. [28] deployed a real-time detector on edge-computing hardware, obtaining 95.2% accuracy at 20 FPS. The hardware-accelerated, compressed model of Titu et al. [29] sustained 90.4% mAP at 30 FPS, and Murthy et al. [30] summarize that legacy vision methods seldom exceed 85% accuracy or 10 FPS. Finally, Hu et al. [31] explored data-driven anomaly detection, improving precision by 4% and recall by 3% for battery-fire early warning.

2.2. Issues and Research Motivation

Research worldwide shows steady progress in smoke and fire detection, yet challenges remain for lithium-battery monitoring:
1. High false alarm and miss rates. Traditional vision-based methods and current deep-learning models cope poorly with the small local targets, low-contrast scenes, and noisy backgrounds typical of lithium-battery monitoring, and conventional methods degrade further in low light. Relying on a single sensor also makes it hard to capture the weak early signals of smoke and fire, leading to false alarms or missed detections.
2. Insufficient real-time performance. Some systems adopt complex computational structures to improve detection accuracy, but these structures introduce processing delays; even with edge computing, such systems often do not respond quickly enough for practical use.
3. Inadequate feature extraction. Current detection methods often fail to extract fine-grained features in complex environments, and basic object detectors handle small targets poorly. Techniques such as feature recalibration and joint attention mechanisms help, but they add complexity and computational overhead.
To address these problems, this paper pursues four research directions:
1. Feature extraction optimization. Inspired by deep convolutional networks and multi-scale feature fusion, this study builds a more efficient convolutional architecture and uses a multi-scale fusion strategy to capture low-contrast and small local targets at the source.
2. Specialized dataset design. A high-quality training dataset is built specifically for lithium-battery fire and smoke scenarios, with data augmentation to expand its size and diversity, improving model generalization and adaptability to complex real-world environments.
3. Loss function and anchor box adjustment. Building on state-of-the-art object detection frameworks, the method uses adaptive anchor box generation with an optimized loss function, expected to improve detection accuracy for small targets and fine-grained details in complex backgrounds.
4. Enhancement of real-time performance. The system combines hardware acceleration, model compression, lightweight network design, edge computing, and efficient data augmentation to improve real-time response in complex backgrounds, keeping detection accuracy high while reducing processing delays.
Through these improvements, this research aims to combine high detection accuracy with fast real-time processing, yielding an efficient and precise early-warning system for lithium-battery fires. The improved system will enhance both safety and practicality for on-site monitoring and may also be deployed on edge devices to support real-world applications.

3. Algorithm Research

3.1. Single-Model Approach

YOLOv8 is a mature member of the YOLO series that combines real-time operation with modern design. Figure 1 illustrates its network architecture. The backbone uses the C2f module, which improves on traditional residual structures: it promotes information flow and feature reuse, deepens the network while limiting computational complexity, and captures richer semantic information from an image. The Spatial Pyramid Pooling-Fast (SPPF) module applies multi-scale pooling to fuse context from different scales at little extra cost, providing a global view that aids object detection.
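As a concrete reference, the SPPF pattern can be sketched in a few lines of PyTorch. This is a generic rendering of the well-known design (three chained 5 × 5 max-pools emulating 5-, 9-, and 13-pixel pooling windows), not code taken from the paper:

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Spatial Pyramid Pooling - Fast: chaining one k x k max-pool three
    times emulates parallel 5/9/13 pooling windows at a fraction of the cost."""
    def __init__(self, c_in: int, c_out: int, k: int = 5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_mid, 1)       # channel reduction
        self.cv2 = nn.Conv2d(c_mid * 4, c_out, 1)  # fuse pooled contexts
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cv1(x)
        y1 = self.pool(x)   # ~5x5 receptive field
        y2 = self.pool(y1)  # ~9x9
        y3 = self.pool(y2)  # ~13x13
        return self.cv2(torch.cat((x, y1, y2, y3), dim=1))
```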
YOLOv8’s neck follows a feature-pyramid and path-aggregation (FPN/PAN) design, fusing low-level details with high-level semantics to help detect small objects. Its head is deliberately lightweight, delivering accurate classification and regression while enabling real-time inference. However, in cluttered backgrounds or low-resolution images, tiny smoke and flame targets can be mistaken for noise. Although C2f and SPPF improve feature extraction and context fusion, they introduce extra non-linear layers, making parameter tuning harder, increasing computation, and risking overfitting when data are scarce or noisy.
The FRMHead module adds multi-branch convolutions and cross-layer fusion to refine detection. It takes the neck’s multi-scale features, splits them into parallel branches with different kernel sizes, then uses 1 × 1 convolutions to reduce channels and reorganize information. Cross-layer skip connections merge deep semantics with shallow details, improving recall and precision on faint or low-contrast targets. Compared to the original YOLOv8, FRMHead captures subtle signals lost in noise, though it raises non-linear complexity and tuning/computation demands.
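The paper does not publish FRMHead's source, but the multi-branch, 1 × 1 reorganization, and skip-connection pattern it describes can be sketched as follows; the class name and channel choices are illustrative assumptions, not the authors' exact module:

```python
import torch
import torch.nn as nn

class MultiBranchRefine(nn.Module):
    """Illustrative multi-branch refinement block: parallel convolutions with
    different kernel sizes capture multi-scale detail, a 1x1 convolution
    reduces and reorders channels, and a skip connection merges shallow
    texture back into the refined output."""
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in (1, 3, 5)
        )
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(multi_scale)  # cross-layer (residual) fusion
```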
Slimneck sits between the backbone and head to compress and reorganize features. It uses bottlenecks built from depthwise-separable and grouped convolutions to shrink feature dimensions, then 1 × 1 convolutions and activations to expand key representations. This removes redundancy, lowers computation and latency, and highlights target signals against background noise. As a result, Slimneck boosts anti-interference and generalization for smoke-and-flame detection, at the cost of extra non-linear layers that require careful hyper-parameter tuning.
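Likewise, the compress-then-expand bottleneck that Slimneck builds from depthwise-separable convolutions can be sketched as below; this is a hypothetical rendering of the idea, not the published module:

```python
import torch
import torch.nn as nn

class SlimBottleneck(nn.Module):
    """Hypothetical compress-then-expand bottleneck: a 1x1 conv shrinks
    channels, a depthwise conv mixes spatial information cheaply, and a
    final 1x1 conv re-expands the refined representation."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        c_mid = channels // reduction
        self.compress = nn.Conv2d(channels, c_mid, 1, bias=False)  # squeeze
        self.depthwise = nn.Conv2d(c_mid, c_mid, 3, padding=1,
                                   groups=c_mid, bias=False)       # cheap spatial mixing
        self.expand = nn.Conv2d(c_mid, channels, 1, bias=False)    # restore channels
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.act(self.compress(x))
        y = self.act(self.depthwise(y))
        return self.expand(y)
```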

3.2. Model Fusion

To boost early warning of lithium-battery smoke and flame, we integrate two complementary modules—FRMHead and Slimneck—into the original YOLOv8, overcoming the limits of single-module designs. FRMHead enhances local feature extraction by first taking the Neck’s multi-scale outputs and splitting them into parallel branches with different convolution kernels, so that subtle details at varied scales are captured. It then applies 1 × 1 convolutions to reduce channel dimensions and reorganize information, amplifying weak smoke/flame cues while suppressing background noise. Finally, cross-layer skip connections merge shallow textures with deep semantics, improving both localization and classification accuracy. This multi-branch, cross-layer fusion, however, raises non-linear complexity and computational demands and may require careful tuning to avoid overfitting or convergence issues when data are scarce or noisy.
Slimneck, placed between the backbone and the detection head, streamlines information flow through a lightweight bottleneck structure. Depthwise-separable and grouped convolutions first compress feature maps; then 1 × 1 convolutions with non-linear activations expand and refine the key representations. This design removes redundancy, lowers computation and latency, and boosts the signal-to-noise ratio—critical for isolating local smoke and flame patterns in cluttered, low-contrast environments.
By fusing FRMHead’s fine-grained, multi-scale refinement with Slimneck’s efficient feature compression, the enhanced YOLOv8 delivers higher recall and precision on early-stage smoke and flame detection while keeping inference costs within practical limits.
In summary, combining YOLOv8 with FRMHead enhances local feature detection. Yet, its multi-branch and cross-layer fusion increases non-linear complexity. This can reduce the model’s generalization and real-time performance. On the other hand, YOLOv8 with Slimneck improves feature reorganization and noise suppression. Its lightweight design may risk losing fine details in low-resolution or blurred images.
Consequently, based on the analysis of the strengths and weaknesses of the aforementioned individual model fusion approaches, this paper proposes a composite fusion model—YOLOv8 + FRMHead + Slimneck, abbreviated as YFSNet—with the structural framework illustrated in Figure 2.
First, Slimneck uses depthwise-separable and grouped convolutions as a bottleneck, plus 1 × 1 convolutions and nonlinear activations, to compress and then reorganize the backbone’s intermediate features. This removes redundancy, suppresses background noise, cuts computation and latency, and speeds up feature flow.
Second, FRMHead introduces a bidirectional P3→P4→P5 reconstruction path in the head: GSConv downsamples and fuses features at each scale; parallel branches with different kernels extract multi-scale details; 1 × 1 convolutions reorder channels; and cross-layer skip connections merge shallow textures with deep semantics. This significantly improves localization and classification of low-contrast, weak signals.
Such a dual-improvement fusion design not only compensates for the deficiencies of individual modules in feature extraction and data refinement but also enhances the network’s overall robustness and real-time response capability in complex environments, such as those encountered in lithium-battery smoke and flame detection scenarios.

4. Experimental Preparation

4.1. Data Collection

To construct the dataset for the proposed research model, four publicly available videos from the Internet were selected (https://b23.tv/nRcKdEZ, https://b23.tv/X7YUzIH, https://b23.tv/ANtQIvF, https://b23.tv/uYkZMBb, accessed on 25 May 2025). These videos capture the entire process of lithium battery thermal runaway—from the normal state to smoke emission, ignition, explosion combustion, and finally, the residual heat after burning—covering the full spectrum of its development. Two videos come from laboratory experiments (280 Ah and 310 Ah batteries), one shows an explosion combustion test in the lab, and one captures a real-world thermal runaway ignition process. This multi-source, multi-scene dataset provides abundant authentic feature information by comprehensively reflecting the evolution of lithium battery thermal runaway under various conditions.
In order to ensure a highly balanced temporal distribution and comprehensive coverage of each key stage, a frame extraction tool was utilized to extract one frame every 3 s. This frame extraction strategy balances the continuity of the data with the diversity of the samples, avoiding excessive redundancy among continuous frames while capturing every detail of the state changes.
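A minimal sketch of this timed extraction, assuming OpenCV and illustrative file paths:

```python
import cv2
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, interval_s: float = 3.0) -> int:
    """Save one frame every `interval_s` seconds from a video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata is missing
    step = max(1, round(fps * interval_s))   # frames to skip between saves
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved, idx = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```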
Using the timed frame extraction method, a large number of images was initially collected. Because the original videos contained ghosting, repetition, and blurring, all images were strictly screened to remove redundant and low-quality ones, and each remaining image was manually annotated with LabelImg. After this preprocessing, a final dataset of 2300 high-quality images was constructed. These images cover the complete process of a lithium battery, from a normal state through smoke emission, ignition, and explosion combustion to post-burning residual heat, and span varied lighting, viewing angles, and backgrounds, ensuring the dataset’s diversity and representativeness.
Figure 3 visualizes the distribution of the training set. It shows approximately 3000 smoke instances and 500 fire instances (as noted above, the dataset contains 2300 images, but some images contain multiple smoke or flame targets; 3000 and 500 are the total counts of annotated smoke and flame targets across the whole dataset, so the figures are consistent), indicating a higher abundance of smoke samples. The bounding-box size chart shows that box dimensions are concentrated around the average positions and sizes of the annotated objects, and the center-point chart shows most boxes clustered between 0.4 and 0.6 on both the x and y axes. The width–height chart reveals a proportional relationship, with most boxes ranging from 0.2 to 0.6 and width and height increasing together.
The rationality of the dataset selection in this study is reflected in the following aspects:
(1) Comprehensiveness: The dataset covers all key stages of the lithium battery thermal runaway process. From initial smoke emission to the end of the post-burning phase, the complete evolution of thermal runaway is recorded, providing sufficient data to distinguish subtle differences between the initial smoke, subsequent fire, and later combustion phases.
(2) Multi-scene coverage: The dataset draws on both controlled laboratory experiments and real-world scenarios, enhancing the model’s generalization ability and robustness during testing.
(3) Uniform temporal sampling: Extracting one frame every 3 s avoids redundancy among consecutive frames while ensuring balanced sampling of state changes over time, preventing the model from over-relying on features from any specific moment during training.
(4) Strict screening: After the initial extraction, redundant, ghosted, and low-quality images were removed, yielding 2300 high-quality images. These provide clean, effective training samples whose rich content and reasonable feature distribution lay a solid data foundation for the subsequent experiments.

4.2. Experimental Setup and Configuration

In this study, the dataset consists of 2300 images. A training-to-validation ratio of approximately 85:15 (1932:338) was chosen to balance training effectiveness against model generalization. Allocating the majority of the dataset to training (about 85%) gives the model access to sufficiently diverse samples, which is critical for learning representative features and robustly detecting lithium-ion battery smoke and flame.
At the same time, reserving around 15% of the data for validation provides an unbiased assessment of the model’s ability to generalize to unseen data. This split allows for effective monitoring of the model’s performance during training, enabling the detection of overfitting or underfitting issues and guiding hyperparameter tuning. Through this balanced division, the study aims to optimize the model’s accuracy and reliability, ensuring that it performs well not only on the training data but also in practical real-world scenarios where early detection of smoke and flame is critical.
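A simple way to realize such a split; the directory layout and random seed here are illustrative assumptions:

```python
import random
from pathlib import Path

random.seed(42)  # fixed seed so the split is reproducible (assumed value)
images = sorted(Path("dataset/images").glob("*.jpg"))  # hypothetical layout
random.shuffle(images)
n_train = round(len(images) * 0.85)  # ~85% of the images for training
train_set, val_set = images[:n_train], images[n_train:]
print(f"train: {len(train_set)}, val: {len(val_set)}")
```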
The number of training epochs was set to 500. Table 1 below shows the experimental environment configuration.
In the experiments, this study focused on data preprocessing, model initialization, optimization, and hyperparameter tuning to ensure effective lithium battery fire and smoke detection. The dataset split and training settings were based on data volume, sample diversity, and task difficulty. This approach ensured that the model fully utilized the data and allowed for a thorough evaluation of its generalization ability in real-world scenarios, crucial for verifying the YFSNet fusion model’s fine-grained detection under complex backgrounds.
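For orientation, training a YOLOv8-family model with the Ultralytics API follows the pattern below; the dataset and model YAML names are placeholders, since the authors' exact YFSNet configuration files are not given in the text:

```python
from ultralytics import YOLO

# "battery_fire.yaml" is a hypothetical dataset config standing in for the
# 2300-image smoke/flame dataset; a custom architecture YAML would replace
# "yolov8n.yaml" for the modified model.
model = YOLO("yolov8n.yaml")
results = model.train(
    data="battery_fire.yaml",  # paths to train/val images and class names
    epochs=500,                # training epochs used in this study
    imgsz=640,                 # default YOLOv8 input resolution
    device=0,                  # e.g., the RTX 4060 listed in Table 1
)
metrics = model.val()          # reports precision, recall, mAP50, mAP50-95
```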

4.3. Model Performance Evaluation Metrics

In object detection tasks, and for smoke and flame detection in particular, common evaluation metrics include Precision, Recall, Accuracy, F1-score, AP (Average Precision), and mAP (mean Average Precision). These metrics apply to binary classification, multi-class classification, and object detection tasks.
Precision is the ratio of true positives among the samples predicted as positive (i.e., detected as smoke or flame) and reflects the reliability of the model’s predictions:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

TP (True Positive) is the count of correctly detected smoke/flame targets, while FP (False Positive) is the count of other objects mislabeled as smoke or flame. High precision means fewer false alarms, which is crucial in fire detection.
Recall measures the fraction of actual targets the model successfully finds:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

FN (False Negative) is the count of missed smoke/flame targets (i.e., targets that exist but were not detected). A higher recall means fewer missed detections; in fire detection, a low miss rate (high recall) is crucial because of the safety risks of undetected targets.
Accuracy is the ratio of correct predictions to all predictions:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

TN is the count of correctly identified negatives. In imbalanced tasks, where smoke and flame samples are rare, overall accuracy can be misleading: a model that predicts only negatives still attains high accuracy.
F1-score is the harmonic mean of Precision and Recall; a higher F1 indicates a better balance between false positives and false negatives:

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
AP (Average Precision) is the area under the Precision–Recall curve for a class; a higher AP shows the model balances Precision and Recall consistently across confidence thresholds:

$$AP = \int_{0}^{1} P(R)\,\mathrm{d}R$$

where P(R) is Precision as a function of Recall. A higher AP indicates better detection performance; here AP is calculated for each class individually.
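Numerically, this integral is approximated from sampled precision–recall points; a minimal trapezoidal sketch (standard benchmarks such as COCO instead use 101-point interpolation):

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """Approximate the area under the PR curve by trapezoidal integration
    after sorting the sampled points by recall."""
    order = np.argsort(recall)
    return float(np.trapz(precision[order], recall[order]))
```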
mAP (mean Average Precision) averages the AP values over all classes, summarizing overall performance in multi-class detection:

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$

where N is the total number of classes and AP_i is the AP of the i-th class. For single-class detection, mAP is equivalent to AP.
FPS is the number of images processed per second:

$$\mathrm{FPS} = \frac{1}{\text{inference time (seconds)}}$$
FNR (False Negative Rate) is the proportion of positive instances the model fails to identify; a higher FNR means more positive samples are missed:

$$\mathrm{FNR} = 1 - \mathrm{Recall}$$

In fire detection, minimizing FNR is usually prioritized over the false positive rate, since FNR directly reflects missed smoke or fire targets rather than overall accuracy.
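The confusion-matrix metrics above reduce to a few lines of code once detections have been matched to ground truth; a self-contained sketch (the matching step, e.g., at IoU ≥ 0.5, is omitted):

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, Recall, F1, and FNR from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fnr": 1.0 - recall,  # FNR = 1 - Recall
    }

# Example: 93 of 100 true targets found, with 2 false alarms.
print(detection_metrics(tp=93, fp=2, fn=7))
```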
GFLOPs (giga floating-point operations) measures the computational load of one model inference, expressed in billions of floating-point operations. Lower GFLOPs help meet real-time detection needs, especially on embedded systems or drones; higher GFLOPs cause delays and reduce efficiency, particularly in large-scale monitoring.

5. Experimental Verification

This section presents both ablation and comparative experiments to validate the proposed algorithm’s overall performance and its suitability for early warning in lithium battery fire and smoke detection. Ablation experiments will assess the contribution of each component or improvement strategy, while comparative experiments will benchmark the algorithm against existing methods in terms of detection accuracy, robustness, and real-time performance.
This experimental design deepens the understanding of the algorithm’s internal mechanisms and demonstrates its effectiveness in real-world early warning scenarios.

5.1. Ablation Experiment

5.1.1. YOLOv8 + FRMHead Model vs. YOLOv8 + Slimneck Model

To verify the effectiveness of the algorithm fusion stage regarding the performance assumptions for the YOLOv8 + FRMHead and YOLOv8 + Slimneck models, ablation experiments were conducted on these models. Key performance indicators were recorded and organized. The evaluation metrics employed include Precision (Pr), Recall (R), mean Average Precision (mAP), Frames Per Second (FPS), and GFLOPs, with the experimental results presented in Table 2.
Experimental data show that the two fusion variants differ in precision, recall, computational complexity, and real-time performance, which clearly guides downstream model selection.
Regarding local-feature refinement versus overall detection, our pre-studies predicted that FRMHead would excel at fine-grained, local feature extraction. Indeed, adding FRMHead to YOLOv8 raises the precision from 0.956 (YOLOv8n) to 0.978, confirming its ability to refine subtle details. However, its recall drops from 0.977 (YOLOv8n) to 0.920, suggesting that the added complexity sometimes hinders the capture of all target appearances. In contrast, the YOLOv8 + Slimneck fusion achieves a very high precision of 0.989 and a solid recall of 0.936, demonstrating that its feature reorganization and noise-suppression strategy reduces both false positives and negatives, thus offering more robust detection for lithium-battery fire/smoke warnings.
In terms of computational complexity versus real-time performance, the FRMHead variant is considerably heavier, demanding 76.3 GFLOPs and achieving only 49.75 FPS compared to 8.2 GFLOPs and 91.74 FPS for YOLOv8n. On the other hand, Slimneck runs at just 7.3 GFLOPs while sustaining 92.59 FPS—slightly above the baseline—which confirms that its design effectively limits computational load without sacrificing speed.
Looking at overall performance consistency, YOLOv8 + FRMHead records an F1-score of 78.63. While this improvement in precision is partly offset by the recall drop, it still highlights the module’s strength for fine-detail detection despite its increased complexity and lower throughput. YOLOv8 + Slimneck, with an F1-score essentially on par with YOLOv8n (around 7.50), delivers competitive accuracy along with low GFLOPs and high FPS, aligning well with the industrial need for fast, reliable early-warning systems.
In general, higher precision indicates that a model filters out negatives more effectively, reducing false positives; higher recall means targets are detected more comprehensively, helping identify small or complex targets and reducing false negatives; and higher mAP reflects more consistent, fine-grained detection quality. For the FRMHead variant, however, these gains come at the cost of a roughly 46% drop in FPS (from 91.74 to 49.75) and a roughly 9.3-fold increase in GFLOPs (from 8.2 to 76.3), which raises hardware demands and deployment costs, a concern for scenarios with strict real-time constraints.
In summary, while the FRMHead module greatly enhances fine-detail precision, it brings higher computational demands and reduced recall and throughput. Conversely, Slimneck strikes a well-balanced trade-off, maintaining high accuracy with low computational overhead and excellent real-time performance—ideal for lithium-battery smoke/fire warning systems. Similarly, the enhanced model improves detection metrics considerably but at the expense of processing speed and computational efficiency, highlighting the need to balance accuracy improvements with real-time operation requirements.

5.1.2. YFSNet Model

Based on the analysis above, our fused model (“YFSNet”) is designed to combine the strengths of both modules: it leverages FRMHead’s high-precision in detail extraction to capture richer local feature information, while Slimneck performs data refinement and redundancy reduction to improve information flow, thereby enhancing overall detection robustness and real-time performance. A comparison of key ablation-experiment metrics for all models is shown in Figure 4.
After analyzing the experimental data, the results are largely consistent with our expectations, as demonstrated in the following aspects:
1. Precision and detail capture. YFSNet achieves a precision of 0.996, far exceeding all other models. This confirms that FRMHead excels at fine-grained smoke/fire feature extraction, even in low-contrast, noisy scenes, validating the dual-improvement design.
2. Recall and information balance. YFSNet’s recall of 0.931 is lower than YOLOv8n’s 0.977, a deliberate trade-off that emphasizes precision over raw recall. Individually, FRMHead scores 0.920 recall and Slimneck 0.936, so the fused model strikes a middle ground, balancing detail extraction with coverage.
3. Computational complexity vs. real-time speed. FRMHead alone requires 76.3 GFLOPs and runs at 49.75 FPS, too slow for many real-time needs. Slimneck alone needs just 7.3 GFLOPs at 92.59 FPS, near the baseline (YOLOv8n: 8.2 GFLOPs, 91.74 FPS). YFSNet requires 75.5 GFLOPs, close to FRMHead, yet reaches 116.28 FPS, a large speed gain over FRMHead that shows Slimneck’s integration optimizes computation flow without losing detail.
4. Overall consistency (F1-score). FRMHead scores an F1 of 78.63 and YFSNet 78.05. Both deliver similar discriminative power, but YFSNet uniquely combines ultra-high precision (0.996) with superior real-time performance (116.28 FPS).
Key takeaways: FRMHead sharpens local-detail learning, boosting precision, while Slimneck suppresses redundancy and maintains speed. Their fusion unites high precision with high FPS, yielding a robust, efficient system for real-time lithium-battery smoke/fire detection. These findings validate the design goals and provide a solid foundation for balancing accuracy with responsiveness in practical deployments.

5.2. Comparison Experiment

To further validate the advantages of the new model’s performance, this study compares the YFSNet model with other classical detection algorithms under identical experimental conditions. The specific evaluation metric comparison results are presented in Table 3, and corresponding trend charts—illustrating how these evaluation metrics vary with increasing training epochs—are shown in Figure 5.
Based on the charts and tables, the comprehensive advantages of YFSNet over other models can be summarized as follows:
1. Precision (Pr): YFSNet achieves a precision of 0.996, about 5.3%, 4.2%, and 3.8% higher than UnirepLKNet (0.945), YOLOv8n (0.956), and Unfog (0.960), respectively. This high precision means the algorithm greatly reduces false detections.
2. Recall (R): YFSNet records a recall of 0.931, slightly lower than YOLOv8n (0.977), Unfog (0.979), and ContextGuided (0.979). The design intentionally favors higher precision; this trade-off suppresses false alarms, which matters in safety-critical industrial applications.
3. mAP50 and mAP50-95: For mAP50 (IoU ≥ 0.5), YFSNet reaches 0.978, nearly equal to YOLOv8n (0.980) and ContextGuided (0.981). For mAP50-95, YFSNet scores 0.871, somewhat below YOLOv8n’s 0.906 but still among the leading values, confirming robustness under stricter IoU criteria.
4. Computational load (GFLOPs) and post-processing time: YFSNet uses 75.5 GFLOPs, similar to FRMHead (76.3), indicating a controlled computational load. Post-processing takes 1.0 ms per image, slightly longer than FRMHead (0.7 ms) and Unfog (0.8 ms), but overall latency remains low enough for real-time applications.
5. F1-score: YFSNet’s F1-score of 78.05 almost matches FRMHead’s 78.63 and far exceeds UnirepLKNet (16.84) and MobileNetV1 (16.07). Compared to MobileNetV1, YFSNet shows a nearly 388% improvement, underscoring its accuracy and robustness.
6. Inference speed (FPS): YFSNet operates at 116.28 FPS, a 133.6% improvement over FRMHead’s 49.75 FPS. Although ContextGuided reaches 222.22 FPS, its other metrics (precision, F1-score, and mAP50) are less competitive. YFSNet thus strikes a practical balance between speed and precision, making it well suited to real-time video processing and large-scale applications.
7. Comprehensive analysis:
  • The improved model (YFSNet) achieves approximately 4–5% higher precision than traditional networks, significantly reducing false detections.
  • With an F1-score similar to strong algorithms such as FRMHead and a 133% increase in inference speed, it offers a potent combination of accuracy and efficiency.
  • Stable mAP50, moderate GFLOPs, and minimal post-processing delay make it well suited to industrial applications where both high detection accuracy and operational efficiency are essential.
In summary, by combining YOLOv8 with the FRMHead and Slimneck modules, YFSNet improves precision by about 5%, maintains a high F1-score, and delivers over a 133% boost in FPS. These features make it especially suited to critical applications such as lithium-battery smoke and fire detection, demonstrating its advanced design and practical value.

5.3. Model Performance Comparison

Figure 6 shows the output images obtained from testing the dataset for smoke/fire detection using eight different algorithm models. In these images, the numbers indicate the confidence levels.
Below is the analysis of the eight models’ output images:
UnirepLKNet detects only a few smoke regions with low confidence scores (around 0.3 to 0.5), exhibiting weakness in identifying small or low-density smoke areas.
In contrast, YOLOv8n successfully identifies two smoke regions with high confidence (about 0.8), although it may produce occasional false positives in complex backgrounds.
Unfog detects multiple smoke regions at roughly 0.8 confidence, yet its overlapping bounding boxes indicate that the localization precision could be improved, potentially affecting discrimination accuracy.
Similarly, FRMHead yields detections for several regions with scores between 0.7 and 0.8—it performs stably for large smoke areas but still has room for enhancing fine-detail detection. Slimneck also records high-confidence detections (around 0.8), though many of its boxes are slightly misaligned, showing a bias toward larger smoke volumes and a diminished sensitivity to small or peripheral smoke.
Among these, ContextGuided stands out by detecting smoke regions at 0.9 confidence with very accurate box placements; thanks to its use of both background and contextual cues, it manages to capture detailed smoke features with high precision and stability.
MobileNetV1, while detecting multiple regions with strong confidence (approximately 0.8), struggles with precise localization when smoke density varies.
Notably, our model, YFSNet, which combines YOLOv8, FRMHead, and Slimneck, detects multiple smoke spots (with confidence levels around 0.3–0.4) while delivering highly accurate bounding boxes. The integrated modules enhance the detection of subtle, low-density smoke, reducing missed detections and improving overall performance. Although its confidence scores are somewhat lower, its localization accuracy in complex backgrounds is markedly improved, offering a more reliable solution for lithium-battery fire monitoring.
Overall, YFSNet combines YOLOv8’s richer feature extraction, FRMHead’s optimized detection head, and Slimneck’s streamlined architecture into a single smoke-detection ensemble. It offers distinctive advantages in detecting small-scale and low-density smoke with superior bounding-box accuracy compared with the other models, meeting the stringent demands of high-precision, real-time industrial safety monitoring.

6. Conclusions

6.1. Summary

This study introduces an accuracy-enhancement scheme for lithium-battery smoke and flame recognition based on the YOLO algorithm. It targets improved fire-warning capabilities under challenging conditions like complex backgrounds, low contrast, and small, localized targets. By integrating FRMHead and Slimneck modules into YOLOv8, we created a novel fusion model—YFSNet—that delivers high speed and significantly better detection accuracy and robustness.
Experimental results show that YFSNet leads the compared models in precision and inference speed. With a precision of 0.996, YFSNet markedly improves upon YOLOv8n and other classical detectors. Although its recall is slightly lower, it still achieves a good balance, demonstrating the effectiveness of fine-grained feature extraction and information refinement. Additionally, YFSNet’s substantially higher inference speed meets the real-time demands of lithium-battery fire detection.
Thanks to the dual optimizations provided by FRMHead and Slimneck, YFSNet accurately captures faint smoke and flame signals in cluttered or low-contrast environments. This reduces both false negatives and false positives, offering an efficient, reliable solution for industrial fire-monitoring applications.

6.2. Improvement Directions

Despite significant improvements, the YFSNet model still faces challenges and areas for future enhancement:
1. Extreme conditions: Even with FRMHead and Slimneck, the model may misidentify signals under very low illumination or heavy noise. Future work may benefit from multimodal data fusion, for example combining infrared and visible-light imagery, to improve robustness.
2. Data diversity: The current training relies on a single dataset that covers multiple lithium-battery scenarios but offers limited diversity. Expanding data collection and applying stronger data cleaning and preprocessing to reduce label noise would improve generalization.
3. Scalability and deployment: While real-time performance is enhanced, large-scale or multi-device deployment requires further computational efficiency. Future work could integrate hardware acceleration, edge computing, and model-compression techniques to reduce inference overhead.
4. Broader applications: As lithium-battery usage grows in electric vehicles, drones, and smart homes, fire-warning systems must evolve with it. Adapting and extending YFSNet to these domains can lead to broadly applicable, highly adaptive fire-alert solutions.
In summary, by enhancing YOLOv8 with FRMHead and Slimneck, this work presents a faster and more precise method for lithium-battery smoke and flame detection. Ongoing advances in deep-learning architectures and data acquisition promise further improvements in accuracy and responsiveness, strengthening the technological foundation for safe battery use and effective fire prevention.

Author Contributions

Conceptualization, methodology, validation, investigation, and data curation, L.D.; software, formal analysis, data curation, writing—original draft preparation, writing—review and editing, and visualization, D.K.; funding acquisition, supervision, and project administration, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the National Natural Science Foundation of China under the Joint Fund of Civil Aviation Research (grant number U2033206), the Key Laboratory Project of Sichuan Province (grant number MZ2022JB01), the Fundamental Research Funds for the Central Universities (grant number 25CAFUC01007), and the Aviation Science Fund (grant number ASFC-20200046117001). The APC was funded by Quanyi Liu.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the author.

Acknowledgments

During the preparation of this manuscript, the authors employed ChatGPT (version: o3-mini) to refine the language expression and sentence structures, thereby enhancing the overall fluency of the paper. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Bajolle, H.; Lagadic, M.; Louvet, N. The future of lithium-ion batteries: Exploring expert conceptions, market trends, and price scenarios. Energy Res. Soc. Sci. 2022, 93, 102850.
2. Tarascon, J.M.; Armand, M. Issues and challenges facing rechargeable lithium batteries. Nature 2001, 414, 359–367.
3. Dunn, B.; Kamath, H.; Tarascon, J.M. Electrical energy storage for the grid: A battery of choices. Science 2011, 334, 928–935.
4. Zalosh, R.; Gandhi, P.; Barowy, A. Lithium-ion energy storage battery explosion incidents. J. Loss Prev. Process Ind. 2021, 72, 104560.
5. Wang, Z.; Huang, G.; Chen, Z.; An, C. Accidents involving lithium-ion batteries in non-application stages: Incident characteristics, environmental impacts, and response strategies. BMC Chem. 2025, 19, 94.
6. Sathishkumar, V.E.; Cho, J.; Subramanian, M.; Naren, O.S. Forest fire and smoke detection using deep learning-based learning without forgetting. Fire Ecol. 2023, 19, 9.
7. Bu, F.; Gharajeh, M.S. Intelligent and vision-based fire detection systems: A survey. Image Vis. Comput. 2019, 91, 103803.
8. Mozaffari, M.H.; Li, Y.; Ko, Y. Real-time detection and forecast of flashovers by the visual room fire features using deep convolutional neural networks. J. Build. Eng. 2023, 64, 105674.
9. Feng, J.; Sun, Y. Multiscale network based on feature fusion for fire disaster detection in complex scenes. Expert Syst. Appl. 2024, 240, 122494.
10. Jeon, M.; Choi, H.S.; Lee, J.; Kang, M. Multi-scale prediction for fire detection using convolutional neural network. Fire Technol. 2021, 57, 2533–2551.
11. Saponara, S.; Elhanashi, A.; Gagliardi, A. Real-time video fire/smoke detection based on CNN in antifire surveillance systems. J. Real-Time Image Process. 2021, 18, 889–900.
12. Sousa, M.J.; Moutinho, A.; Almeida, M. Thermal infrared sensing for near real-time data-driven fire detection and monitoring systems. Sensors 2020, 20, 6803.
13. Li, L.; Ye, J.; Wang, C.; Ge, C.; Yu, Y.; Zhang, Q. A fire source localization algorithm based on temperature and smoke sensor data fusion. Fire Technol. 2023, 59, 663–690.
14. Nguyen, V.T.; Quach, C.H.; Pham, M.T. Video smoke detection for surveillance cameras based on deep learning in indoor environment. In Proceedings of the 2020 4th International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom), Hanoi, Vietnam, 28–29 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 82–86.
15. Jana, S.; Shome, S.K. Hybrid ensemble based machine learning for smart building fire detection using multi modal sensor data. Fire Technol. 2023, 59, 473–496.
16. Han, X.; Chang, J.; Wang, K. You only look once: Unified, real-time object detection. Procedia Comput. Sci. 2021, 183, 61–72.
17. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
18. Amin, A.; Mumtaz, R.; Bashir, M.J.; Zaidi, S.M.H. Next-generation license plate detection and recognition system using YOLOv8. In Proceedings of the 2023 IEEE 20th International Conference on Smart Communities: Improving Quality of Life using AI, Robotics and IoT (HONET), Boca Raton, FL, USA, 4–6 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 179–184.
19. Guo, T.; Xu, X. Salient object detection from low contrast images based on local contrast enhancing and non-local feature learning. Vis. Comput. 2021, 37, 2069–2081.
20. Zhang, S.; Sun, Y.; Su, J.; Gan, G.; Wen, Z. Adaptive training strategies for small object detection using anchor-based detectors. In International Conference on Artificial Neural Networks; Springer Nature: Cham, Switzerland, 2023; pp. 28–39.
21. Sahoo, S.; Nanda, P.K. Adaptive feature fusion and spatio-temporal background modeling in KDE framework for object detection and shadow removal. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1103–1118.
22. Tao, H.; Zheng, Y.; Wang, Y.; Qiu, J.; Stojanovic, V. Enhanced feature extraction YOLO industrial small object detection algorithm based on receptive-field attention and multi-scale features. Meas. Sci. Technol. 2024, 35, 105023.
23. Cao, J.; Chen, Q.; Guo, J.; Shi, R. Attention-guided context feature pyramid network for object detection. arXiv 2020, arXiv:2005.11475.
24. Huang, Y.; Li, J. Key challenges for grid-scale lithium-ion battery energy storage. Adv. Energy Mater. 2022, 12, 2202197.
25. Pu, Z.; Yang, M.; Jiao, M.; Zhao, D.; Huo, Y.; Wang, Z. Thermal runaway warning of lithium battery based on electronic nose and machine learning algorithms. Batteries 2024, 10, 390.
26. Azzabi, T.; Jeridi, M.H.; Mejri, I.; Ezzedine, T. Multi-modal AI for enhanced forest fire early detection: Scalar and image fusion. In Proceedings of the 2024 IEEE International Conference on Advanced Systems and Emergent Technologies (IC_ASET), Hammamet, Tunisia, 27–29 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
27. Hu, D.; Huang, S.; Wen, Z.; Gu, X.; Lu, J. A review on thermal runaway warning technology for lithium-ion batteries. Renew. Sustain. Energy Rev. 2024, 206, 114882.
28. Su, L.; Lee, Y.H.; Chen, Y.L.; Tseng, H.W.; Yang, C.F. Using edge computing technology in programmable logic controller to realize the intelligent system of industrial safety and fire protection. Sens. Mater. 2023, 35, 1731–1740.
29. Titu, M.F.S.; Pavel, M.A.; Michael, G.K.O.; Babar, H.; Aman, U.; Khan, R. Real-time fire detection: Integrating lightweight deep learning models on drones with edge computing. Drones 2024, 8, 483.
30. Murthy, C.B.; Hashmi, M.F.; Bokde, N.D.; Geem, Z.W. Investigations of object detection in images/videos using various deep learning techniques and embedded platforms—A comprehensive review. Appl. Sci. 2020, 10, 3280.
31. Hu, Z.; Chen, W.; Wang, H.; Tian, P.; Shen, D. Integrated data-driven framework for anomaly detection and early warning in water distribution system. J. Clean. Prod. 2022, 373, 133977.
Figure 1. The overall network structure of YOLOv8.
Figure 2. The overall network structure of YFSNet.
Figure 3. Data and visualization processing results. (a) Dataset sample quantity distribution, (b) bounding box size distribution, (c) target center point distribution, (d) target aspect ratio distribution.
Figure 4. Key ablation experiment metrics comparison. (a) Pr, R, mAP50, and mAP50-95 by model, (b) post-processing time per image by model, (c) F1-score by model, (d) FPS by model.
Figure 5. Trends of evaluation metrics in the comparative experiments. (a) precision–epoch, (b) recall–epoch, (c) mAP50–epoch, (d) mAP50-95–epoch, (e) train box_loss–epoch, (f) train cls_loss–epoch, (g) train dfl_loss–epoch, (h) val box_loss–epoch, (i) val cls_loss–epoch, (j) val dfl_loss–epoch.
Figure 6. Output diagrams for each model’s results.
Table 1. Experimental environment configuration.

Equipment | Parameters
Processor | 13th Gen Intel(R) Core(TM) i7-13620H 2.40 GHz
RAM | 16 GB
Operating system | Windows 11
GPU | NVIDIA GeForce RTX 4060
Programming tools | PyCharm
Programming languages | Python
Table 2. Model ablation results.

Model | Pr | R | mAP50 | mAP50-95 | GFLOPs | PPI (ms) | F1-score | FPS
YOLOv8n | 0.956 | 0.977 | 0.980 | 0.906 | 8.2 | 1.6 | 8.1109 | 91.7431
V8 + FRMHead | 0.978 | 0.920 | 0.976 | 0.880 | 76.3 | 0.7 | 78.6316 | 49.7512
V8 + Slimneck | 0.989 | 0.936 | 0.977 | 0.868 | 7.3 | 1.5 | 7.5009 | 92.5926

PPI = post-processing time per image (ms).
Table 3. Comparison table of evaluation metrics for comparative experiments.

Model | Pr | R | mAP50 | mAP50-95 | GFLOPs | PPI (ms) | F1-score | FPS
UnirepLKNet | 0.945 | 0.896 | 0.942 | 0.799 | 16.4 | 1.9 | 16.8365 | 44.8430
YOLOv8n | 0.956 | 0.977 | 0.980 | 0.906 | 8.2 | 1.6 | 8.1109 | 91.7431
Unfog | 0.960 | 0.979 | 0.979 | 0.859 | 9.6 | 0.8 | 9.5059 | 123.4568
FRMHead | 0.978 | 0.920 | 0.976 | 0.880 | 76.3 | 0.7 | 78.6316 | 49.7512
Slimneck | 0.989 | 0.936 | 0.977 | 0.868 | 7.3 | 1.5 | 7.5009 | 92.5926
ContextGuided | 0.990 | 0.979 | 0.981 | 0.877 | 7.7 | 2.4 | 7.7430 | 222.2222
MobileNetV1 | 0.993 | 0.947 | 0.979 | 0.896 | 15.7 | 2.3 | 16.0723 | 34.1297
YFSNet | 0.996 | 0.931 | 0.978 | 0.871 | 75.5 | 1.0 | 78.0467 | 116.2791

PPI = post-processing time per image (ms).