Article

LSOD-YOLOv8: Enhancing YOLOv8n with New Detection Head and Lightweight Module for Efficient Cigarette Detection

1. School of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211816, China
2. School of Mechanical and Power Engineering, Nanjing Tech University, Nanjing 211816, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3961; https://doi.org/10.3390/app15073961
Submission received: 16 February 2025 / Revised: 31 March 2025 / Accepted: 2 April 2025 / Published: 3 April 2025

Abstract

Cigarette detection is a crucial component of public safety management, but cigarettes are small objects with few distinctive feature points, which makes them difficult to detect. To enhance the accuracy of small target detection, we propose a novel small object detection model, LSOD-YOLOv8 (Lightweight Small Object Detection using YOLOv8). First, we introduce a lightweight adaptive weight downsampling module into the backbone of YOLOv8 (You Only Look Once version 8), which mitigates the information loss caused by conventional strided convolutions while reducing the overall parameter count of the model. Next, we add a high-resolution P2 feature pyramid level in the neck of YOLOv8 and combine shared convolutional weights with independent batch normalization to design the P2-LSCSBD (P2 Layer-Lightweight Shared Convolutional and Batch Normalization-based Small Object Detection) detection head. Finally, we propose a new loss function, WIMIoU (Weighted Intersection over Union with Inner, Multi-scale, and Proposal-aware Optimization), which combines the ideas of WiseIoU (Wise Intersection over Union), InnerIoU (Inner Intersection over Union), and MPDIoU (Minimum Point Distance Intersection over Union) and yields a significant accuracy improvement at no additional inference cost. Our experiments demonstrate that LSOD-YOLOv8 substantially improves detection accuracy for cigarette detection.

1. Introduction

Target detection is a crucial aspect of computer vision, focusing on identifying the location and classification of specific objects within videos or images. With the continuous advancements in computer hardware and various intelligent algorithms, target detection has found widespread applications in fields such as smart factories [1], facial recognition [2], agricultural monitoring [3], and medical analysis [4].
As urbanization accelerates, smoking in prohibited public spaces remains prevalent, posing significant risks to public health and safety. Effective cigarette detection is therefore both a practical necessity and a useful testbed for research on small target detection. Traditional manual inspection is often inadequate because of manpower and resource constraints, making automated smoking detection technology essential for managing public areas.
Currently, smoking-detection technologies can be categorized into four main types: smoke-feature-based detection [5], wearable-device-based detection [6], sensor-network-based detection [7], and computer vision detection [8]. Smoke-feature-based detection and sensor network methods rely on traditional sensors but face challenges such as environmental interference and false detections. Wearable devices assess smoking behavior by recognizing hand movements; however, they tend to be costly and inconvenient to use. In contrast, computer vision techniques offer greater potential by directly identifying cigarettes through image processing.
The evolution of the YOLO series [9,10,11,12,13,14,15,16,17,18] has significantly advanced object detection capabilities. YOLOv8, as one of the more refined versions, has improved detection accuracy and real-time performance through its lightweight design and optimized training strategies. Wang et al. [19] proposed an enhanced YOLOv8 neural network model based on deep learning, which incorporates an improved ShapeIoU (Shape Intersection over Union) and receptive field attention convolution for the efficient and precise detection of foreign objects in Pu'er sun-dried green tea. Guo et al. [20] proposed a YOLO-SGF lightweight network for detecting complex infrared images. This model employs GSConv and convolution to process deep and shallow features, respectively, while combining ShuffleNetV2-block1 with C2f for infrared feature extraction. Additionally, FIMPDIoU (Focal Mean Pairwise Distance Intersection over Union) is used to focus on objects that are easily overlooked in infrared imagery. Niu et al. [21] developed a YOLOv8-ECFS lightweight algorithm for weed species detection in soybean fields. This model integrates a coordinate attention module and EfficientNet (Efficient Convolutional Neural Network) and optimizes the regression accuracy of bounding boxes using Focal_SIoU, ensuring precise identification of various weed species. Yang et al. [22] introduced the LS-YOLOv8s model for detecting and grading the maturity of strawberries. This model, based on the YOLOv8s algorithm, incorporates the Lightweight Swin Transformer module and two new random variables to control the effects of data augmentation. Wang et al. [25] introduced an algorithm called CoT-YOLOv8 to enhance small target detection in aerial images. This model integrates the Convolutional Block Attention Module (CBAM) and the dynamic convolution module, utilizing a context transformer to assist in target detection. Cui et al. [24] expanded on YOLOv8 by incorporating the Attention Module convolution and the Context Information Enhancement Module (CIEM), achieving impressive results in detecting foreign objects on railway tracks. Zhu et al. [23] developed the CDD-YOLOv8 algorithm for identifying defects on cigarette packaging, which features a small target detection head and the convolutional attention module CBAM, significantly improving the model's capability to locate and identify small defects while minimizing both missed and false detections. However, cigarette detection still poses several challenges, such as interference in complex environments, the precision of small object detection, insufficient datasets, and hardware limitations.
Chen et al. [26] addressed the overlooked challenge of small object detection by creating a specialized benchmark dataset and enhancing the R-CNN algorithm with a context model and small region proposal generator, achieving a 29.8% improvement in mean average precision over the original R-CNN. Cao et al. [27] enhanced Faster R-CNN for small object detection by introducing an improved IoU-based loss function, bilinear-interpolated RoI pooling, multi-scale feature fusion, and refined NMS, achieving 90% recall and 87% accuracy on low-resolution traffic signs, significantly outperforming the original model. Lim et al. [28] proposed a context-aware object detection method with multi-scale feature fusion and attention mechanisms to address the challenges of detecting low-resolution small objects, achieving 78.1% mAP on the PASCAL VOC2007 dataset, outperforming traditional SSD models. Benjumea et al. [29] enhanced YOLOv5 through structural modifications to develop YOLO-Z models, achieving a 6.9% mAP improvement for small object detection at 50% IoU with only a 3 ms increase in inference time, demonstrating optimized performance for autonomous racing applications and informing future detector adaptations in autonomous systems. Liu et al. [30] proposed DNTR, a novel framework combining DeNoising Feature Pyramid Network (DN-FPN) with Transformer-based Trans R-CNN for tiny object detection in geoscience, employing contrastive learning for feature denoising and self-attention mechanisms, achieving 17.4% APvt improvement on AI-TOD and 9.6% AP gain on VisDrone datasets compared to baselines.
In this paper, we propose a lightweight small object detection algorithm based on an improved version of the YOLOv8 algorithm. Our method enhances the YOLOv8 architecture by introducing a self-created light-weighting module and a new loss function, which emphasizes model efficiency while effectively capturing information from small targets. We refer to our improved algorithm as LSOD-YOLOv8. Specifically, we implement a lightweight adaptive weight downsampling module inspired by the Focus network, which is applied to the backbone network. This module compensates for information loss that occurs during conventional convolution and downsampling operations, facilitating a lightweight architecture. We call this downsampling module LAWDS. Additionally, we introduce the concepts of the P2 layer and shared convolution in the detection head of YOLOv8, leading to the development of a lightweight small-target detection head, referred to as P2-LSCSBD. Furthermore, we combine the ideas of WiseIoU, InnerIoU, and MPDIoU to create a novel loss function, which we designate as WIMIoU. This new loss function enhances model accuracy significantly.
We evaluate our method on a homemade cigarette detection dataset, comparing it with the baseline YOLOv8 target detection model and conducting ablation experiments. Notably, LSOD-YOLOv8 demonstrates a 2.8-point improvement in mAP50 metrics over the original YOLOv8 while achieving a 20% reduction in the number of parameters. Our proposed method exhibits excellent performance in terms of both accuracy and lightweight design, highlighting its effectiveness for smoking detection. The contributions of this paper are as follows:
(1) The proposed model incorporates a lightweight adaptive weight downsampling module that addresses the issue of information loss commonly seen in conventional convolution and downsampling operations while simultaneously achieving a more lightweight architecture.
(2) The proposed model introduces a lightweight small-target detection head that allows the model to fuse multi-scale information while significantly reducing the number of parameters through shared convolutional information and independent batch normalization operations, thereby enhancing overall accuracy.
(3) The model introduces a novel loss function that substantially improves detection accuracy, specifically tailored to enhance the effectiveness of smoking detection.

2. Theory and Methods

In this paper, we propose a lightweight small target detection algorithm based on an enhanced version of the YOLOv8 algorithm. Our approach improves the YOLOv8 architecture by introducing a custom lightweight module, a new loss function, and the integration of multi-scale information. This design emphasizes model efficiency while effectively capturing data from small targets, ultimately leading to enhanced detection accuracy. We refer to our method as LSOD-YOLOv8.

2.1. Baseline

The YOLOv8 algorithm, developed by Ultralytics, represents a mature advancement within the YOLO series, significantly optimizing its predecessors to enhance detection accuracy, speed, and usability. In this paper, we use the YOLOv8 model as our baseline and introduce several improvements and innovations that ultimately enhance its overall performance. As illustrated in Figure 1a, the network architecture of YOLOv8 comprises three distinct components: the backbone, the neck, and the head. Additionally, Figure 1b provides details on some of the specific modules within the YOLOv8 network architecture.

2.2. Model Improvements

The existing YOLOv8 algorithm is effective for general target detection but still struggles with small targets. To enhance the detection of small target objects, this study improves the YOLOv8 algorithm, naming the result Lightweight Small Object Detection YOLOv8 (LSOD-YOLOv8). This model accurately extracts target information while maintaining a lightweight network structure. The architecture of the model is illustrated in Figure 1c, with specific algorithm details provided below.
In the LSOD-YOLOv8 model, we introduce a lightweight adaptive weight downsampling module (LAWDS) inspired by the Focus network. This module replaces the standard convolution in the original backbone, compensating for the information loss that occurs during conventional convolution and downsampling operations while achieving a more lightweight architecture. Additionally, we incorporate the concepts of a P2 layer and shared convolution in the head section, resulting in a lightweight small target detection head called P2-LSCSBD, which facilitates multi-scale information fusion while maintaining a lightweight design. Finally, recognizing the limitations of the original loss function, we combine the concepts of WiseIoU, InnerIoU, and MPDIoU to propose a new loss function, WIMIoU. This new loss function allows the entire model to achieve a significant improvement in accuracy without compromising performance.

2.2.1. LAWDS

The task of small object detection requires high sensitivity to the details of target information. The original backbone network uses conventional stride-2 convolutions for downsampling; however, this standard convolution operation can discard crucial information, ultimately degrading target detection accuracy. To address this issue, we introduce the lightweight adaptive weight downsampling module (LAWDS), which mitigates the information loss inherent in conventional convolution and downsampling operations while achieving a lightweight network architecture.
The LAWDS (Light Adaptive Weight DownSampling) module is an adaptive downsampling method based on an attentional mechanism, designed to enhance downsampling efficiency. It introduces an adaptive attention mechanism that calculates attentional weights for local regions of the feature map, allowing the module to preserve important features while reducing spatial resolution. This approach effectively improves the downsampling performance of the model while minimizing computational complexity. As shown in Figure 2, the structure of LAWDS consists of an average pooling layer and a 1 × 1 convolutional layer, which generate spatial attentional weights that are normalized using the Softmax function. The module down-samples the input using grouped convolution, expanding the number of channels to four times the original count, and aligns the result with the attention weights through a dimensional rearrangement operation. The down-sampled feature maps are then multiplied element-wise with the attention weights and summed across the spatial dimensions to yield the weighted down-sampled results. This mechanism allows the model to retain essential feature information while effectively reducing spatial resolution.
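To make the data flow concrete, the following is a minimal PyTorch sketch of a LAWDS-style block under the description above. It is a hypothetical reconstruction rather than the authors' released code; the class name, kernel sizes, and pooling configuration are our own illustrative assumptions.

```python
import torch
import torch.nn as nn

class LAWDS(nn.Module):
    """Illustrative sketch of a Light Adaptive Weight DownSampling block.

    Reconstructed from the paper's description, not the authors' code: an
    AvgPool + 1x1-conv branch predicts softmax-normalized weights over the
    four candidate sub-positions of each downsampling window; a grouped
    stride-2 conv expands channels 4x; a weighted sum over the four
    candidates yields the downsampled feature map.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Attention branch: pool to the output grid, then 4 weights per cell.
        self.attn = nn.Sequential(
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(channels, 4, kernel_size=1),
        )
        # Grouped downsampling: each input channel produces 4 candidates.
        self.down = nn.Conv2d(channels, channels * 4, kernel_size=3,
                              stride=2, padding=1, groups=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = torch.softmax(self.attn(x), dim=1)   # (B, 4, H/2, W/2)
        y = self.down(x)                          # (B, 4C, H/2, W/2)
        y = y.view(b, c, 4, *y.shape[2:])         # align candidates and weights
        return (y * w.unsqueeze(1)).sum(dim=2)    # weighted downsampled output


x = torch.randn(1, 64, 160, 160)
print(LAWDS(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```

Because such a block halves the spatial resolution exactly like a stride-2 convolution, it can replace the backbone's downsampling layers without touching the rest of the network.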

2.2.2. P2-LSCSBD

Small-target detection tasks require the integration of multi-scale information to effectively capture features across different scales, which is especially critical for accurately detecting and recognizing small targets. Multi-scale fusion significantly enhances the model's sensitivity to small targets while maintaining the detection capabilities for larger objects. In this paper, we introduce the P2 layer into the original neck network and combine it with shared convolution and independent BN (Batch Normalization) operations in the head layer to design a new lightweight detection head, P2-LSCSBD (P2 with Lightweight Shared Convolution and Separate BN Detection). This design achieves both multi-scale information fusion and lightweight model operation.
The P2-LSCSBD detection head leverages P2 layer information on top of the original YOLOv8 network, enhancing the model's ability to detect small targets through multi-scale information fusion. Traditional detection heads often use multiple convolutional layers to generate bounding box and category predictions, which can be computationally expensive and limit training and inference efficiency. To address these challenges, the proposed detection head processes feature maps with lightweight shared convolutional layers and an independent BN per scale, improving both detection speed and accuracy.
As shown in Figure 3, the network structure includes Conv_BN, a multi-layer convolutional module using 1 × 1 convolution kernels to adjust the number of input feature map channels. Conv_GN is the shared convolution layer, which applies 3 × 3 convolution kernels to extract local features. The BNAct module represents the independent batch normalization layer, ensuring that feature maps from different scales undergo separate normalization. The detection head further combines shared convolutional layers, separate batch normalization, and distribution focal loss (DFL) to optimize both bounding box and category predictions. The entire design minimizes computational resource usage while maintaining high detection performance.
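The parameter saving comes from sharing convolution weights across pyramid levels while keeping normalization statistics separate. The sketch below illustrates this idea in PyTorch; it is a hypothetical illustration, not the paper's exact head, and the channel widths, level count, and layer roles (Conv_BN, Conv_GN, BNAct analogues) are assumptions.

```python
import torch
import torch.nn as nn

class SharedConvSeparateBNHead(nn.Module):
    """Sketch of a shared-convolution, separate-BN detection head.

    One 3x3 conv (and the 1x1 prediction convs) is shared by all pyramid
    levels (P2-P5) to cut parameters, while each level keeps its own
    BatchNorm so per-scale statistics are normalized independently.
    """

    def __init__(self, in_channels=(64, 128, 256, 512), hidden=64,
                 num_classes=1, reg_max=16):
        super().__init__()
        # Per-level 1x1 projection to a common width (the Conv_BN role).
        self.proj = nn.ModuleList(
            nn.Conv2d(c, hidden, kernel_size=1) for c in in_channels)
        # Single 3x3 conv whose weights are shared across levels (Conv_GN role).
        self.shared = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1)
        # Independent BatchNorm per level (the BNAct role).
        self.bns = nn.ModuleList(nn.BatchNorm2d(hidden) for _ in in_channels)
        self.act = nn.SiLU()
        # Shared prediction convs: DFL box distribution and class scores.
        self.box = nn.Conv2d(hidden, 4 * reg_max, kernel_size=1)
        self.cls = nn.Conv2d(hidden, num_classes, kernel_size=1)

    def forward(self, feats):
        outs = []
        for proj, bn, f in zip(self.proj, self.bns, feats):
            h = self.act(bn(self.shared(proj(f))))  # shared weights, own BN
            outs.append((self.box(h), self.cls(h)))
        return outs
```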

2.2.3. WIMIoU

We combine the ideas of WiseIoU, InnerIoU, and MPDIoU to propose a new loss function, WIMIoU, which gives the improved model an accuracy gain at no additional inference cost. The derivation of this loss function is as follows:
MPDIoU is defined as
$$\mathrm{MPDIoU} = \frac{\left| B^{gt} \cap B^{prd} \right|}{\left| B^{gt} \cup B^{prd} \right|} - \frac{d_1^2}{w^2 + h^2} - \frac{d_2^2}{w^2 + h^2} \quad (1)$$
$$d_1^2 = \left( x_l^{gt} - x_l^{prd} \right)^2 + \left( y_t^{gt} - y_t^{prd} \right)^2, \qquad d_2^2 = \left( x_r^{gt} - x_r^{prd} \right)^2 + \left( y_b^{gt} - y_b^{prd} \right)^2 \quad (2)$$
where $B^{prd}$ and $B^{gt}$ denote the prediction box and target box, respectively; $x_l^{gt}$, $x_r^{gt}$, $y_t^{gt}$, $y_b^{gt}$ are the left, right, top, and bottom coordinates of the target box; $x_l^{prd}$, $x_r^{prd}$, $y_t^{prd}$, $y_b^{prd}$ are the corresponding coordinates of the prediction box; $d_1$ and $d_2$ are the distances between the top-left corners and between the bottom-right corners of the target and prediction boxes; and $w$ and $h$ denote the width and height of the input image. As illustrated in Equation (1), the first term corresponds to the standard IoU, which quantifies the overlap between bounding boxes, while the subsequent terms penalize corner misalignment by normalizing the positional deviations. This normalization harmonizes sensitivity across targets of varying scales. Given the MPDIoU, its corresponding loss is defined as follows:
$$L_{\mathrm{MPDIoU}} = 1 - \mathrm{MPDIoU} \quad (3)$$
Next, the center coordinates and the widths and heights of the target and prediction boxes are computed; these are used to construct the inner boxes:
$$x_c^{gt} = \frac{x_l^{gt} + x_r^{gt}}{2}, \quad y_c^{gt} = \frac{y_t^{gt} + y_b^{gt}}{2}, \quad x_c^{prd} = \frac{x_l^{prd} + x_r^{prd}}{2}, \quad y_c^{prd} = \frac{y_t^{prd} + y_b^{prd}}{2} \quad (4)$$
$$w^{gt} = x_r^{gt} - x_l^{gt}, \quad h^{gt} = y_t^{gt} - y_b^{gt}, \quad w^{prd} = x_r^{prd} - x_l^{prd}, \quad h^{prd} = y_t^{prd} - y_b^{prd} \quad (5)$$
where $x_c^{gt}$, $y_c^{gt}$ and $x_c^{prd}$, $y_c^{prd}$ are the center coordinates of the target box and the prediction box, and $w^{gt}$, $h^{gt}$ and $w^{prd}$, $h^{prd}$ are their widths and heights, respectively. The inner-box coordinates are obtained as follows:
$$b_l^{gt} = x_c^{gt} - \frac{w^{gt} \cdot \mathrm{ratio}}{2}, \quad b_r^{gt} = x_c^{gt} + \frac{w^{gt} \cdot \mathrm{ratio}}{2}, \quad b_t^{gt} = y_c^{gt} + \frac{h^{gt} \cdot \mathrm{ratio}}{2}, \quad b_b^{gt} = y_c^{gt} - \frac{h^{gt} \cdot \mathrm{ratio}}{2} \quad (6)$$
$$b_l^{prd} = x_c^{prd} - \frac{w^{prd} \cdot \mathrm{ratio}}{2}, \quad b_r^{prd} = x_c^{prd} + \frac{w^{prd} \cdot \mathrm{ratio}}{2}, \quad b_t^{prd} = y_c^{prd} + \frac{h^{prd} \cdot \mathrm{ratio}}{2}, \quad b_b^{prd} = y_c^{prd} - \frac{h^{prd} \cdot \mathrm{ratio}}{2} \quad (7)$$
where ratio is the scaling factor used to construct InnerIoU; $b_l^{gt}$, $b_r^{gt}$, $b_t^{gt}$, $b_b^{gt}$ and $b_l^{prd}$, $b_r^{prd}$, $b_t^{prd}$, $b_b^{prd}$ are the left, right, top, and bottom coordinates of the target box and the prediction box after scaling by the ratio factor. Shrinking the boxes in this way forces the model to focus on the target's central area and reduces interference from edge noise. InnerIoU and its loss function are then defined as follows:
$$\mathrm{InnerIoU} = \frac{\mathrm{inter}}{\mathrm{union}}, \qquad L_{\mathrm{Inner\text{-}IoU}} = 1 - \mathrm{InnerIoU} \quad (8)$$
where inter denotes the intersection area of the scaled target and prediction boxes, i.e., the numerator of InnerIoU; union denotes the area of their union, i.e., the denominator of InnerIoU; and $L_{\mathrm{Inner\text{-}IoU}}$ denotes the InnerIoU loss. The Inner-MPDIoU loss then combines the two:
$$L_{\mathrm{Inner\text{-}MPDIoU}} = L_{\mathrm{MPDIoU}} + \mathrm{IoU} - \mathrm{InnerIoU} \quad (9)$$
$$L_{\mathrm{WIMIoU}} = R_{\mathrm{WIMIoU}} \cdot L_{\mathrm{Inner\text{-}MPDIoU}}, \qquad R_{\mathrm{WIMIoU}} = \exp\!\left( \frac{\left( x_l^{gt} - x_l^{prd} \right)^2 + \left( y_t^{gt} - y_t^{prd} \right)^2}{w^2 + h^2} + \frac{\left( x_r^{gt} - x_r^{prd} \right)^2 + \left( y_b^{gt} - y_b^{prd} \right)^2}{w^2 + h^2} \right) \quad (10)$$
where $L_{\mathrm{WIMIoU}}$ denotes the WIMIoU loss, $L_{\mathrm{Inner\text{-}MPDIoU}}$ denotes the Inner-MPDIoU loss, and $R_{\mathrm{WIMIoU}}$ is the compensation coefficient of WIMIoU. The loss weight is adjusted dynamically through this coefficient: when the center of the predicted box deviates from the ground-truth box, the loss weight increases, strengthening the center-alignment constraint. The positional relationship between the WIMIoU prediction bounding box and the ground-truth bounding box is illustrated in Figure 4.
As illustrated in Figure 4, the red dashed lines depict MPDIoU’s corner distance penalty mechanism, which quantifies positional deviations between predicted and ground-truth box corners. The two smaller inner boxes represent the contracted regions generated by InnerIoU through ratio-based scaling, emphasizing target-centric feature learning. Their overlapping area corresponds to WiseIoU’s adaptive compensation zone, where dynamic weight adjustments prioritize center alignment accuracy based on offset severity.
The WIMIoU loss function enhances localization accuracy in object detection, particularly for small targets, by integrating three key components: MPDIoU, InnerIoU, and WiseIoU. MPDIoU extends traditional IoU by introducing a corner distance penalty term normalized by image dimensions, which balances sensitivity across objects of varying scales. InnerIoU constrains the model to focus on central regions by adaptively shrinking bounding boxes using a ratio factor, effectively reducing noise from peripheral areas. WiseIoU dynamically adjusts loss weights through a compensation coefficient based on the center offset between predicted and ground-truth boxes, reinforcing alignment precision. These components are integrated through weighted summation to formulate the final loss function. The advantages of WIMIoU lie in its multi-level geometric constraints and adaptive mechanisms: MPDIoU establishes foundational localization constraints, InnerIoU refines sensitivity to central regions, and the dynamic compensation term mitigates class imbalance. This design ensures heightened responsiveness to minor positional deviations while maintaining smooth penalty gradients for large misalignments, ultimately improving robustness and accuracy in small-object detection.
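For clarity, the following is a minimal PyTorch sketch of Equations (1)-(10) for axis-aligned boxes given in image coordinates (y grows downward, so the role of top and bottom is flipped relative to the notation above); the ratio default of 0.75 is an illustrative assumption, as the paper does not fix a value here.

```python
import torch

def wimiou_loss(pred, gt, img_w, img_h, ratio=0.75, eps=1e-7):
    """Sketch of WIMIoU, Equations (1)-(10).

    pred, gt: (N, 4) boxes as (x_left, y_top, x_right, y_bottom) in image
    coordinates (height = y_bottom - y_top).
    """
    def box_iou(a, b):
        # Intersection-over-union of paired boxes.
        ix1 = torch.max(a[:, 0], b[:, 0]); iy1 = torch.max(a[:, 1], b[:, 1])
        ix2 = torch.min(a[:, 2], b[:, 2]); iy2 = torch.min(a[:, 3], b[:, 3])
        inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
        area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
        area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        return inter / (area_a + area_b - inter + eps)

    def shrink(b):
        # InnerIoU: scale each box around its center by 'ratio' (Eqs. 4-7).
        cx = (b[:, 0] + b[:, 2]) / 2; cy = (b[:, 1] + b[:, 3]) / 2
        w = (b[:, 2] - b[:, 0]) * ratio; h = (b[:, 3] - b[:, 1]) * ratio
        return torch.stack((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2), 1)

    diag = img_w ** 2 + img_h ** 2
    iou = box_iou(pred, gt)

    # MPDIoU: top-left / bottom-right corner-distance penalties (Eqs. 1-3).
    d1 = (gt[:, 0] - pred[:, 0]) ** 2 + (gt[:, 1] - pred[:, 1]) ** 2
    d2 = (gt[:, 2] - pred[:, 2]) ** 2 + (gt[:, 3] - pred[:, 3]) ** 2
    l_mpdiou = 1.0 - (iou - d1 / diag - d2 / diag)

    inner_iou = box_iou(shrink(pred), shrink(gt))          # Eq. (8)
    l_inner_mpd = l_mpdiou + iou - inner_iou               # Eq. (9)
    r = torch.exp((d1 + d2) / diag)                        # Eq. (10)
    return r * l_inner_mpd
```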

3. Experimental Preparation

3.1. Experimental Dataset

The dataset used in this study is custom-made and consists of images collected specifically for the detection of the target objects. The images were annotated using a labeling tool to ensure accurate labeling of the target objects. The original dataset contains 4858 images, which were split into training, validation, and test sets with a 7:1:2 ratio. This resulted in 3400 images for the training set, 486 images for the validation set, and 972 images for the test set.
As shown in Figure 5, to ensure model convergence, high detection accuracy, and robustness under varying lighting conditions, data augmentation was applied to the original dataset. This involved three rotations at different angles, horizontal and vertical flips, and ten adjustments to the image brightness. As a result, the dataset was expanded by a factor of 26. Consequently, the final dataset includes 88,400 images in the training set, 12,636 images in the validation set, and 25,272 images in the test set. This extensive augmentation helps comprehensively evaluate the model’s performance across various subsets.
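A sketch of this offline augmentation pipeline using torchvision is shown below. The specific rotation angles and brightness factors are illustrative assumptions, since the paper does not list them, and the bounding-box labels must be transformed together with the images (omitted here for brevity).

```python
from PIL import Image
from torchvision.transforms import functional as F

def augment(img: Image.Image) -> list[Image.Image]:
    """Generate offline augmentation variants of one image."""
    variants = []
    for angle in (90, 180, 270):                 # three rotations
        variants.append(F.rotate(img, angle))
    variants.append(F.hflip(img))                # horizontal flip
    variants.append(F.vflip(img))                # vertical flip
    for k in range(10):                          # ten brightness levels
        variants.append(F.adjust_brightness(img, 0.5 + 0.1 * k))
    return variants
```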

3.2. Indicators for Experimental Evaluation

The evaluation metrics used in this study include precision (P), recall (R), mAP50, mAP50-95, the number of parameters, and detection speed. A confusion matrix is employed to compare actual and predicted values, serving as a critical tool for assessing classification performance. Precision and recall are computed from the confusion matrix, which is based on true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs). The confusion matrix drawn from the true and predicted values of the dataset is shown in Figure 6, illustrating the model's classification performance across different classes.
As illustrated in Figure 6, the improved network maintains a 100% classification accuracy for the “background” category while successfully increasing the recognition accuracy of the “cigarette” category from 0.83 to 0.85 and reducing the misclassification rate from 0.17 to 0.15. This indicates that the optimization strategy has effectively enhanced the model’s ability to identify the target category while ensuring stable background detection. Further improvements can be explored to minimize misclassification and enhance recall, ultimately boosting the overall model performance.
The comprehensive performance of the model was evaluated using mAP50, with the IoU threshold set to 0.5. Precision is defined as the ratio of correctly predicted positive instances to all instances predicted as positive. The mathematical expression for precision is provided below:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (11)$$
Recall measures the ability of the model to correctly identify all positive samples. The mathematical expression for recall is given below:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (12)$$
TP represents the number of positive category samples correctly detected by the model, FN represents the number of positive category samples missed by the model, FP denotes the number of negative category samples incorrectly detected as positive, and TN refers to the number of negative category samples correctly identified by the model. Assuming that the number of detection categories is n, the mathematical expression for the model’s average precision (mAP) across all categories is given as follows:
$$\mathrm{mAP} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{AP}_i \quad (13)$$
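These metrics follow directly from the confusion-matrix counts; a small sketch of how they could be computed is shown below (the per-class AP values would come from the area under each class's precision-recall curve).

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): correct positives among predicted positives."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): correct positives among actual positives."""
    return tp / (tp + fn) if tp + fn else 0.0

def mean_average_precision(ap_per_class: list[float]) -> float:
    """mAP: mean of the per-class average precisions AP_i over n classes."""
    return sum(ap_per_class) / len(ap_per_class)
```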

3.3. Experimental Environment

The experiments in this study were conducted using PyTorch 2.2 and Python 3.9 (64-bit) within a Windows 10 environment. The hardware setup included a computer equipped with 32 GB of RAM, a 13th Gen Intel(R) Core(TM) i5-13490F 2.50 GHz processor, and an NVIDIA GeForce RTX 4070 GPU. The experimental environment was managed via Anaconda3, with all training tasks executed on the GPU to ensure both validity and accuracy. To maintain consistency across all experiments, a batch size of 16 and an epoch count of 1000 were used. The hardware configuration of the computers used in the experiments is detailed in Table 1.
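For reference, a training run with these settings could be launched as follows using the Ultralytics API; the model configuration and dataset YAML names are placeholders rather than the authors' actual files.

```python
from ultralytics import YOLO

# Launch training with the settings from Section 3.3.
model = YOLO("yolov8n.yaml")      # baseline; LSOD modules would be swapped in
model.train(
    data="cigarette.yaml",        # hypothetical dataset config (7:1:2 split)
    epochs=1000,                  # epoch count used in the paper
    batch=16,                     # batch size used in the paper
    device=0,                     # single NVIDIA RTX 4070
)
```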

4. Experimental Results and Analysis

4.1. Experimental Validation of LAWDS Module

To assess the effectiveness of our LAWDS module for small target detection tasks, we compared the performance of the YOLOv8n model with the YOLOv8-LAWDS model using various metrics, including precision (P), recall (R), mAP50, mAP50-95, parameter count, network layers, detection speed, and GFLOPs. Additionally, we conducted experiments to determine the optimal placement of the LAWDS module, evaluating its impact at different locations within the YOLOv8 base network architecture: the backbone layer, the neck layer, and the backbone + neck layers combined. The results for each metric are presented in Table 2.
We introduced the LAWDS module in the backbone layer, the neck layer, and both combined within the YOLOv8 base network for experimental comparison. The results in Table 2 show that placing the LAWDS module in the backbone layer offers the best overall trade-off: it delivers a clear mAP50 gain over the baseline together with the lowest computational cost and the fastest detection speed. Comparing the YOLOv8-LAWDS (backbone) model with the baseline, our improved model performs better in precision, mAP50, and GFLOPs, while reducing the number of parameters by roughly 10%, achieving a lighter network structure alongside improved accuracy.

4.2. Comparative Experiment of Different Detection Models

To demonstrate the advantages of the improved algorithm in cigarette detection more intuitively, we compared it with existing mainstream algorithms. The precision and recall curves of each model in Figure 7 show that the LSOD-YOLOv8n algorithm outperforms the YOLOv8n baseline as well as the YOLOv5n and YOLOv10n models in detection performance, with a clear advantage in detection accuracy.
As shown in Figure 7, LSOD-YOLOv8 exhibits a clear overall performance advantage in the target detection task. By introducing the LAWDS module and the P2-LSCSBD detection head, the model achieves a balanced precision-recall optimization: after 600 iterations, the precision, recall, and mAP values of all models stabilize, with final precision above 0.8, recall around 0.8, mAP50 above 0.8, and mAP50-95 around 0.5, and LSOD-YOLOv8 delivers the best detection performance among them. In summary, through its architectural improvements and loss function optimization, LSOD-YOLOv8 achieves the best trade-off between accuracy, robustness, and efficiency, providing a useful reference for the development of lightweight detection models.
From the experimental results in Table 3, it can be seen that, compared with YOLOv5n, YOLOv8n, and YOLOv10n, LSOD-YOLOv8n is more suitable for cigarette detection. It shows clear advantages in key indicators such as mAP50, mAP50-95, precision, and recall, while greatly reducing the number of parameters; its detection speed is only 0.09 s slower than that of the YOLOv8n baseline. Considering that our cigarette detection algorithm targets small objects in public spaces, where safety requirements are paramount, we prioritize detection accuracy as the primary optimization objective while maintaining reasonable inference speed. As demonstrated in Table 3, we adopt mAP50 as the principal evaluation metric and select network architectures with minimal parameters to preserve inference efficiency, thereby balancing accuracy and computational cost for practical deployment in surveillance environments. Our model improves mAP50 by 2.8 percentage points at the cost of 0.09 s in detection speed, which is acceptable for our application scenario. These comparative results clearly show that the proposed LSOD-YOLOv8n algorithm exhibits superior detection performance for cigarette detection.
In addition, to evaluate the performance of the model in different scenarios, we conducted actual test experiments on each model for three scenarios: crowd, outdoor, and smoking area, and we pointed out the problems of false detection and missed detection in each model. The specific results are shown in Figure 8. As can be seen from the figure, in the first column, when the target object is small in the crowd, YOLOv5n, YOLOv8n, and YOLOv10n cannot detect the small target cigarette on the left, while LSOD-YOLOv8n successfully identifies all targets. In the second column of outdoor situations, only YOLOv5n misdetects the elderly’s fingers as cigarettes, and the other models correctly identify the targets, but the LSOD-YOLOv8 model is the best in detection accuracy. In the third column, the most complex smoking area, the false detection and missed detection of each model increased significantly, but our model still maintained a high accuracy and anti-interference ability. This shows that when facing cigarette detection in complex environments, our proposed model can also reliably identify targets and determine their locations.

4.3. Ablation Experiments

To evaluate the impact of the LAWDS module, the P2-LSCSBD detection head, and the WIMIoU loss function on the detection performance of LSOD-YOLOv8n, we conducted an ablation experiment; the results are shown in Table 4. When only the LAWDS module is introduced, mAP50 increases by 0.7 percentage points, precision increases by 1.7 points, detection speed improves by 0.058 s, and the number of parameters falls by 10%. WIMIoU improves both precision and mAP while leaving speed and parameter count unchanged. Adding the P2-LSCSBD detection head alone raises mAP50 by 1.9 points with a slight reduction in parameters. Finally, compared with the original YOLOv8n, the full LSOD-YOLOv8n with all improvements gains 2.8 percentage points in mAP50 and 1.9 points in precision, reduces model parameters by about 28%, and adds only 0.09 s in detection time. This series of ablation experiments demonstrates the effectiveness of each module and confirms that LSOD-YOLOv8n is the most suitable model in this study.
As shown in Figure 9, by comparing the performance curves of different models, LSOD-YOLOv8 demonstrates similar performance in precision, recall, and mean average precision (mAP) to other YOLOv8 models while exhibiting more stable characteristics in certain metrics, particularly after prolonged training. Among all models, LSOD-YOLOv8 reaches a high precision early on and maintains stability throughout subsequent training, indicating its excellent ability to reduce false positives and quickly learn to predict targets accurately. The model’s recall is also commendable, ultimately aligning with other models and stabilizing around 0.8, which signifies its effectiveness in detecting positive samples without missing them. In terms of mAP50, LSOD-YOLOv8 performs similarly to other configurations, consistently remaining above 0.8, showcasing its strong target detection capability at lower IoU thresholds. Notably, LSOD-YOLOv8 exhibits slightly superior average precision at stricter IoU thresholds (0.5:0.95) compared to some other configurations. After approximately 600 epochs, the model shows a smooth convergence trend, ultimately stabilizing around 0.6, which reflects its robustness across various complex scenarios. In summary, LSOD-YOLOv8 optimizes the learning strategy, maintaining high precision while achieving good recall and balanced detection performance across multiple IoU thresholds. This model demonstrates high reliability and adaptability in complex tasks, making it particularly suitable for applications that demand precision and stability in target detection.
As shown in Figure 10, an analysis of the loss curves for the LSOD-YOLOv8 model reveals significant rapid convergence characteristics during both the training and validation phases, particularly in terms of box loss, DFL loss, and classification loss, all of which maintain low values. This indicates that LSOD-YOLOv8 exhibits superior performance in object localization and classification tasks, demonstrating a strong ability to differentiate between foreground and background objects. Additionally, its validation loss curve remains smooth and stable, highlighting the model’s good generalization capability on unseen data, thus avoiding significant fluctuations and ensuring reliable detection performance. Compared to other YOLOv8 configurations, LSOD-YOLOv8 shows more consistent performance in the later stages of training, further confirming its effectiveness in small target detection tasks. These results suggest that the proposed enhancements significantly improve the accuracy and reliability of the small target detection model.

4.4. LSOD-YOLOv8 Robustness Experiment

Considering that detection models are often affected by external environmental changes in real-world scenarios, we conducted research on the model’s performance under varying brightness and blur conditions. In the experiment, we adjusted the image brightness to simulate different lighting conditions and altered the blur level to mimic motion blur or the effects of weather. We then compared the interference resistance and robustness of the YOLOv8n model and the LSOD-YOLOv8 model under these conditions. The specific comparison results are shown in Figure 11, Figure 12, Figure 13 and Figure 14.
By comparing the detection performance of the models before and after improvements under different brightness and blur levels, we observed that the YOLOv8n model is significantly affected by lighting. As brightness increases, it often results in missed detections, while a decrease in brightness leads to false detections. The improved LSOD-YOLOv8 model, however, only experiences a small number of missed detections in cases of extreme or substantial lighting changes, without any false detections. As the blur increases, the detection performance of both models declines sharply. The YOLOv8n model even shows false and missed detections due to the blur, whereas our model, though not completely immune to missed detections under high blur conditions, demonstrates superior accuracy and stability. This further proves the advantages of our model’s performance and interference resistance in complex scenarios.

4.5. Comparison of Model Performance Across Different Datasets

To further validate the generalization capability of the proposed model, we conducted training and evaluation on additional datasets. Specifically, we performed comparative experiments with the LSOD-YOLOv8 model on a fire detection dataset and a waste classification dataset. In addition, because the COCO dataset contains a large number of small target objects, we also trained LSOD-YOLOv8 on COCO, further demonstrating the model's versatility for small-target detection. Each dataset was split into training, validation, and test sets in a 7:1:2 ratio. All experiments were run on a machine equipped with 32 GB of RAM, a 13th Gen Intel(R) Core(TM) i5-13490F 2.50 GHz processor, and an NVIDIA GeForce RTX 4070 GPU. The comparative results are shown in Table 5.
As illustrated in Table 5, LSOD-YOLOv8 consistently outperforms YOLOv8n across most metrics. For cigarette detection, LSOD-YOLOv8 achieves higher precision and improved mAP50 and mAP50-95, highlighting its superior accuracy and robustness. In fire detection, LSOD-YOLOv8 improves precision and mAP significantly, showing an enhanced capability to reduce false positives. For waste detection, both models perform exceptionally well, but LSOD-YOLOv8 marginally surpasses YOLOv8n across all metrics, particularly in mAP50-95. On the COCO detection task, LSOD-YOLOv8 leads on every metric, with mAP50 higher by 4.3 percentage points. These results underscore the model's ability to achieve performance gains across different datasets. Overall, our LSOD-YOLOv8 model not only boosts detection accuracy but also exhibits strong precision and generalization, making it a highly effective solution for diverse detection tasks.

5. Discussion

Cigarette detection represents a critical component of public safety management systems. Despite advances in object detection algorithms, challenges persist because cigarette-like targets are small and offer few distinctive features. To address these challenges, we propose an optimized lightweight target detection framework based on the YOLOv8 algorithm, aiming to enhance detection accuracy while maintaining computational efficiency. Our experimental results demonstrate that the proposed LSOD-YOLOv8 model achieves superior performance across multiple metrics. Compared to the baseline model, the improved architecture reduces the total parameter count by approximately 28% while improving accuracy (mAP50 by 2.8 percentage points and mAP50-95 by 2.2 points). The enhanced detection head incorporates multi-scale feature fusion capabilities, which are particularly beneficial for real-world applications where objects may appear at varying scales.
Despite these improvements, several limitations remain. First, the lightweight adaptive weight downsampling module, while compensating for information loss inherent in traditional convolutions, could be further optimized to preserve critical features without significantly increasing computational overhead. Second, although the P2-LSCSBD detection head achieves effective multi-scale integration, detection speed remains a potential bottleneck under high-resolution imaging requirements. Looking forward, we suggest several promising directions for future research. First, integrating more efficient attention mechanisms, such as deformable attention modules or scaled dot-product attention variants, could further enhance computational efficiency while maintaining detection accuracy. Second, extending the framework to handle multi-task learning scenarios, such as simultaneous target classification and bounding-box regression, may enable more resource-efficient solutions with improved performance across multiple detection tasks. Moreover, exploring interdisciplinary applications within public safety management could expand the utility of our proposed architecture beyond traditional surveillance contexts. For instance, integrating cigarette detection with smoke detection systems may provide enhanced capabilities for risk assessment in challenging environments where conventional surveillance methods are ineffective.
In conclusion, this study presents significant advancements in lightweight target detection frameworks, particularly for small object recognition tasks. By addressing existing limitations and exploring novel research directions, we hope to contribute meaningful improvements to the field of public safety management and related domains.

6. Conclusions

In this paper, we proposed a lightweight small-target detection algorithm based on YOLOv8, resulting in the LSOD-YOLOv8 detection model, which demonstrates enhanced performance in our cigarette detection tasks. LSOD-YOLOv8 improves small target detection accuracy by integrating a lightweight adaptive weight downsampling module (LAWDS), a shared convolutional information small target detection head (P2-LSCSBD), and a novel loss function (WIMIoU). This approach enables a lightweight network structure while significantly boosting the model’s accuracy for small target detection. As a result, our improved technique achieved a 28% reduction in model parameters, alongside a 2.8% increase in mAP50 and a 2.2% increase in mAP50-95 compared to the baseline model. These findings underscore the significant value of the LSOD-YOLOv8 algorithm in enhancing cigarette detection, which can effectively mitigate hazards and economic losses associated with smoking in public places. Future research will focus on further optimizing algorithm parameters and testing in diverse scenarios to ensure broader practical application, thereby providing a solid research foundation for the field of small target detection.

Author Contributions

Conceptualization, Y.H. and H.O.; methodology, Y.H. and H.O.; validation, Y.H. and H.O.; formal analysis, Y.H. and H.O.; investigation, Y.H. and H.O.; data curation, Y.H. and H.O.; writing—original draft preparation, Y.H.; writing—review and editing, H.O. and X.M.; visualization, Y.H.; supervision, H.O. and X.M.; project administration, H.O. and X.M.; funding acquisition, H.O. and X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China under Grant 2022YFB3305002 and the National Natural Science Foundation of China under Grants 61906088 and 61703202. The authors also sincerely thank the professors who provided guidance for this research.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study can be obtained from the corresponding authors.

Acknowledgments

The authors want to thank the editor and anonymous reviewers for their valuable suggestions for improving this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Malburg, L.; Rieder, M.P.; Seiger, R.; Klein, P.; Bergmann, R. Object detection for smart factory processes by machine learning. Procedia Comput. Sci. 2021, 184, 581–588. [Google Scholar] [CrossRef]
  2. Zhang, J.; Xing, L.; Tan, Z.; Wang, H.; Wang, K. Multi-Head attention fusion networks for multi-modal speech emotion recognition. Comput. Ind. Eng. 2022, 168, 108078. [Google Scholar] [CrossRef]
  3. Sharma, A.; Singh, P.K. Applicability of UAVs in detecting and monitoring burning residue of paddy crops with IoT integration: A step towards greener environment. Comput. Ind. Eng. 2023, 184, 109524. [Google Scholar]
  4. Dhayne, H.; Kilany, R.; Haque, R.; Taher, Y. EMR2vec: Bridging the gap between patient data and clinical trial. Comput. Ind. Eng. 2021, 156, 107236. [Google Scholar] [PubMed]
  5. Liu, Y.; Wang, B.; Xu, X.; Xu, J. A new paradigm in cigarette smoke detection: Rapid identification technique based on ATR-FTIR spectroscopy and GhostNet-α. Microchem. J. 2024, 205, 111173. [Google Scholar]
  6. Zhu, B.; Wang, J.; Liu, S.; Dong, M.; Jia, Y.; Tian, L.; Su, C. RFMonitor: Monitoring smoking behavior of minors using COTS RFID devices. Comput. Commun. 2022, 185, 55–65. [Google Scholar]
  7. Imtiaz, M.H.; Ramos-garcia, R.I.; Wattal, S.; Tiffany, S.; Sazonov, E. Wearable sensors for monitoring of cigarette smoking in free-living: A systematic review. Sensors 2019, 19, 4678. [Google Scholar] [CrossRef] [PubMed]
  8. Fu, Y.; Ran, T.; Xiao, W.; Yuan, L.; Zhao, J.; He, L.; Mei, J. GD-YOLO: An improved convolutional neural network architecture for real-time detection of smoking and phone use behaviors. Digit. Signal Process. 2024, 151, 104554. [Google Scholar] [CrossRef]
  9. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  10. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  11. Farhadi, A.; Redmon, J. Yolov3: An incremental improvement. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; Volume 1804, pp. 1–6. [Google Scholar]
  12. Wang, C.; Bochkovskiy, A.; Liao, H.Y.M. Scaled-yolov4: Scaling cross stage partial network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13029–13038. [Google Scholar]
  13. Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2778–2788. [Google Scholar]
  14. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  15. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  16. Sohan, M.; Sai Ram, T.; Reddy, R. A review on yolov8 and its advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 18–20 November 2024; pp. 529–545. [Google Scholar]
  17. Yue, S.; Zhang, Z.; Shi, Y.; Cai, Y. WGS-YOLO: A real-time object detector based on YOLO framework for autonomous driving. Comput. Vis. Image Underst. 2024, 249, 104200. [Google Scholar]
  18. Xiao, D.; Wang, H.; Liu, Y.; Li, W.; Li, H. DHSW-YOLO: A duck flock daily behavior recognition model adaptable to bright and dark conditions. Comput. Electron. Agric. 2024, 225, 109281. [Google Scholar]
  19. Wang, Z.; Zhang, S.; Chen, Y.; Xia, Y.; Wang, H.; Jin, R.; Wang, C.; Fan, Z.; Wang, Y.; Wang, B. Detection of small foreign objects in Pu-erh sun-dried green tea: An enhanced YOLOv8 neural network model based on deep learning. Food Control 2025, 168, 110890. [Google Scholar]
  20. Guo, C.; Ren, K.; Chen, Q. YOLO-SGF: Lightweight network for object detection in complex infrared images based on improved YOLOv8. Infrared Phys. Technol. 2024, 142, 105539. [Google Scholar] [CrossRef]
  21. Niu, W.; Lei, X.; Li, H.; Wu, H.; Hu, F.; Wen, X.; Zheng, D.; Song, H. YOLOv8-ECFS: A lightweight model for weed species detection in soybean fields. Crop Prot. 2024, 184, 106847. [Google Scholar] [CrossRef]
  22. Yang, S.; Wang, W.; Gao, S.; Deng, Z. Strawberry ripeness detection based on YOLOv8 algorithm fused with LW-Swin Transformer. Comput. Electron. Agric. 2023, 215, 108360. [Google Scholar]
  23. Zhu, L.; Zhang, J.; Zhang, Q.; Hu, H. CDD-YOLOv8: A small defect detection and classification algorithm for cigarette packages. In Proceedings of the 2023 IEEE 13th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Qinhuangdao, China, 11–14 July 2023; pp. 716–721. [Google Scholar]
  24. Cui, S.; Zhang, Y.; Cao, F.; Qu, T.; Sun, X. Improved YOLOv8 track foreign object detection based on lightweight convolution and information enhancement. In Proceedings of the 2024 14th Asian Control Conference (ASCC), Dalian, China, 5–8 July 2024; pp. 1260–1265. [Google Scholar]
  25. Wang, Y.; Pan, F.; Li, Z.; Xin, X.; Li, W. CoT-YOLOv8: Improved YOLOv8 for aerial images small target detection. In Proceedings of the 2023 China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; pp. 4943–4948. [Google Scholar]
  26. Chen, C.; Liu, M.Y.; Tuzel, O.; Xiao, J. R-CNN for small object detection. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 214–230. [Google Scholar]
  27. Cao, C.; Wang, B.; Zhang, W.; Zeng, X.; Yan, X.; Feng, Z. An improved faster R-CNN for small object detection. IEEE Access 2019, 7, 106838–106846. [Google Scholar]
  28. Lim, J.S.; Astrid, M.; Yoon, H.J.; Lee, S.-I. Small object detection using context and attention. In Proceedings of the 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Jeju Island, Republic of Korea, 13–16 April 2021; pp. 181–186. [Google Scholar]
  29. Benjumea, A.; Teeti, I.; Cuzzolin, F.; Bradley, A. YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles. arXiv 2021, arXiv:2112.11798. [Google Scholar]
  30. Liu, H.-I.; Tseng, Y.-W.; Chang, K.-C.; Wang, P.-J.; Shuai, H.-H.; Cheng, W.-H. A denoising FPN with transformer R-CNN for tiny object detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4704415. [Google Scholar] [CrossRef]
Figure 1. (a) YOLOv8 network architecture. (b) YOLOv8 module details. (c) LSOD-YOLOv8 network architecture.
Figure 2. LAWDS structure.
Figure 3. Structure of P2-LSCSBD.
Figure 4. Predicted bounding box and ground-truth bounding-box location information.
Figure 5. Augmented-image datasets.
Figure 6. Confusion matrix of true and predicted values.
Figure 7. Mainstream algorithm performance comparison.
Figure 8. Comparison of detection performance of algorithms in different scenarios.
Figure 9. Comparison of main indicators of ablation experiments.
Figure 10. Comparison of loss indicators of ablation experiments.
Figure 11. Comparison of detection performance of YOLOv8n model under different brightness conditions.
Figure 12. Comparison of detection performance of LSOD-YOLOv8n model under different brightness conditions.
Figure 13. Comparison of detection performance of YOLOv8n model at different blur levels.
Figure 14. Comparison of detection performance of LSOD-YOLOv8n model at different blur levels.
Table 1. Computer configuration.

Platform | Configuration Information
System | Windows 10
GPU | NVIDIA GeForce RTX 4070
CPU | 13th Gen Intel(R) Core(TM) i5-13490F 2.50 GHz
Language | Python 3.9
GPU compute platform | CUDA 12.1
Deep learning framework | PyTorch 2.2.0
Table 2. Experimental metrics of the LAWDS module at different positions.

Model | P | R | mAP50 | mAP50-95 | Parameters | Layers | Detection Speed (s) | GFLOPs
YOLOv8n | 0.849 | 0.803 | 0.829 | 0.513 | 3,011,043 | 225 | 0.55 | 8.2
YOLOv8-LAWDS (all) | 0.866 | 0.784 | 0.837 | 0.516 | 2,681,123 | 260 | 0.576 | 8.1
YOLOv8-LAWDS (neck) | 0.871 | 0.782 | 0.831 | 0.516 | 2,959,331 | 239 | 0.54 | 8.3
YOLOv8-LAWDS (backbone) | 0.866 | 0.796 | 0.836 | 0.506 | 2,732,835 | 246 | 0.492 | 8.0
Table 3. Comparative experimental results of different detection models.

Model | P | R | mAP50 | mAP50-95 | Parameters | Detection Speed (s)
YOLOv5n | 0.84 | 0.792 | 0.805 | 0.437 | 2,508,659 | 0.69
YOLOv8n | 0.849 | 0.803 | 0.829 | 0.513 | 3,011,043 | 0.55
YOLOv10n | 0.858 | 0.803 | 0.842 | 0.524 | 2,707,430 | 0.67
LSOD-YOLOv8 | 0.868 | 0.805 | 0.857 | 0.535 | 2,141,477 | 0.64
Table 4. Results of ablation studies.

LAWDS | P2-LSCSBD | WIMIoU | P | R | mAP50 | mAP50-95 | Parameters | Detection Speed (s)
- | - | - | 0.849 | 0.803 | 0.829 | 0.513 | 3,011,043 | 0.55
✓ | - | - | 0.866 | 0.796 | 0.836 | 0.506 | 2,732,835 | 0.492
- | ✓ | - | 0.867 | 0.799 | 0.848 | 0.538 | 2,926,692 | 0.508
- | - | ✓ | 0.877 | 0.799 | 0.841 | 0.516 | 3,011,043 | 0.55
✓ | ✓ | - | 0.856 | 0.798 | 0.851 | 0.526 | 2,141,477 | 0.64
✓ | - | ✓ | 0.869 | 0.81 | 0.847 | 0.513 | 2,367,460 | 0.408
- | ✓ | ✓ | 0.851 | 0.82 | 0.856 | 0.535 | 2,419,685 | 0.512
✓ | ✓ | ✓ | 0.868 | 0.805 | 0.857 | 0.535 | 2,141,477 | 0.64
Table 5. Model performance across different datasets.

Model | Dataset | P | R | mAP50 | mAP50-95
YOLOv8n | cigarette | 0.849 | 0.803 | 0.829 | 0.513
LSOD-YOLOv8 | cigarette | 0.868 | 0.805 | 0.857 | 0.535
YOLOv8n | fire | 0.543 | 0.607 | 0.519 | 0.251
LSOD-YOLOv8 | fire | 0.706 | 0.543 | 0.603 | 0.31
YOLOv8n | waste | 0.985 | 0.987 | 0.994 | 0.873
LSOD-YOLOv8 | waste | 0.996 | 0.991 | 0.995 | 0.911
YOLOv8n | COCO | 0.734 | 0.687 | 0.71 | 0.539
LSOD-YOLOv8 | COCO | 0.759 | 0.694 | 0.753 | 0.578