EMN-Net: A Lightweight YOLOv8-Based Model for Real-Time Surface Defect Detection of Pharmaceutical Tablets

An, Jiaxi; Zhou, Lujing; Liu, Dianting; Zheng, Xinpeng; Zhou, Zhiyi; Wang, Heng

doi:10.3390/a19060438


by Jiaxi An ^1,2, Lujing Zhou ^1,2,*, Dianting Liu ^1,2, Xinpeng Zheng ^1,2, Zhiyi Zhou ^2 and Heng Wang ^1,2

^1 The Key Laboratory of Advanced Manufacturing and Automation Technology, Education Department of Guangxi Zhuang Autonomous Region, Guilin University of Technology, Guilin 541006, China

^2 College of Mechanical and Control Engineering, Guilin University of Technology, Guilin 541006, China

^* Author to whom correspondence should be addressed.

Algorithms 2026, 19(6), 438; https://doi.org/10.3390/a19060438

Submission received: 5 April 2026 / Revised: 13 May 2026 / Accepted: 20 May 2026 / Published: 1 June 2026

(This article belongs to the Special Issue Modern Algorithms for Image Processing and Computer Vision)


Abstract

Continuous manufacturing has emerged as the prevailing paradigm in the modern pharmaceutical industry, imposing stringent demands for efficient, real-time inspection methods. However, deploying high-performance deep learning models on industrial edge devices remains challenging due to computational constraints and the difficulty of detecting micro-defects (e.g., micro-cracks and spots). This paper proposes EMN-net, a lightweight defect detection model built upon the YOLOv8n architecture. The proposed algorithm integrates a MobileNetV3 backbone, the Efficient Local Attention (ELA) mechanism and the Normalized Wasserstein Distance (NWD) loss function to balance computational efficiency with sensitivity toward micro-defects. Evaluated on a self-built industrial tablet dataset expanded to 3086 images, EMN-net achieves an mAP50 of 97.8%, representing a 2.5% improvement over the baseline YOLOv8n. The computational complexity is reduced to 4.4 GFLOPs, while the inference throughput reaches 118 FPS, satisfying the real-time requirements of high-speed production lines. Additionally, the model exhibits improved robustness under simulated motion blur and sensor noise. EMN-net presents a balanced automated visual inspection solution for edge devices in continuous pharmaceutical manufacturing.

Keywords:

pharmaceutical quality control; surface defect detection; YOLOv8; lightweight neural network; MobileNetV3; NWD loss; ELA attention

1. Introduction

As a common form of oral solid dosage, pharmaceutical tablets represent the most widely used, most stable, and most cost-effective method of active drug delivery. In recent years, the pharmaceutical industry has gradually transitioned from traditional batch production to continuous manufacturing [1]. Driven by the rising demand for personalized medicines, the global pharmaceutical market size reached USD 1.67 trillion in 2024, with prescription drugs holding an 87% market share [2]. To meet this massive capacity demand, modern high-speed rotary tablet presses (e.g., the TPR 500 and TPR 700 series) now feature continuous production throughputs ranging from 403,200 to over 1,000,000 tablets per hour [3]. At such throughputs, manual inspection can no longer meet the rigorous quality control requirements of modern production lines, making efficient, real-time, full online inspection essential. Meanwhile, the potential of deep learning has been extensively validated across various industrial manufacturing domains, such as defect detection in underwater friction stir welded pipes [4], corrosion rate prediction in flange joints [5], optimization of welding tool geometries [6], and multi-objective optimization of mechanical properties in dissimilar aluminum alloys [7].

During manufacturing stages such as tableting, minor differences in powder properties or equipment wear can easily cause tiny defects on the tablet surface, such as cracks, spots, and edge breakages [8]. These defects severely damage the physical structure and disintegration efficiency of the tablet, posing serious medical safety risks. Furthermore, traditional automated visual inspection (AVI) systems relying on fixed thresholds and manually designed features have limited generalization capabilities. They suffer from high false reject rates, costing the pharmaceutical industry up to USD 740 million annually by misinterpreting acceptable products as defective [9,10].

Therefore, object detection algorithms based on deep learning have emerged as a key research direction to break through this industrial bottleneck. However, existing models face a Pareto dilemma between high precision and real-time edge inference. For instance, recent applications of the Detectron2 instance segmentation framework achieved a Mean Average Precision (mAP) of 97.2%, but its complex mask prediction branch introduces massive computational overhead, making it too slow for high-speed lines [11]. In 2025, researchers demonstrated a CNN-based defect detection system for film-coated tablets with 99.7% accuracy [12]; yet it relied heavily on a 3D-printed static tray for image collection, rendering it unsuitable for dynamic continuous production lines. Existing inspection methodologies can be categorized based on their operational paradigms. High-resolution offline systems offer superior accuracy but fail to meet real-time throughput requirements. Conversely, lightweight detectors often prioritize speed at the expense of tiny-defect sensitivity. A systematic comparison reveals that most prior studies focus on static-tray classification, which lacks the robustness required for edge-deployed detection in dynamic industrial environments where computational costs must be strictly minimized without sacrificing micro-crack sensitivity.

Furthermore, improved single-stage detectors like CBS-YOLOv8 achieved a 97.4% mAP on pharmaceutical packaging datasets, but their inference speed was limited to 79.25 FPS on high-performance platforms [13]. Directly applying standard YOLOv8 to the online detection [14] of tiny tablet defects still faces severe limitations: industrial edge devices lack sufficient computing power [15], deep downsampling causes the loss of weak local features [16], and traditional Intersection over Union (IoU) loss functions are highly sensitive to small positional deviations of tiny objects, leading to gradient vanishing and inaccurate localization [17].

To systematically address these challenges and strike an optimal algorithmic balance between the computational limitations of industrial edge devices and the demand for high-precision detection, this paper proposes a lightweight, high-precision tablet defect detection model based on YOLOv8n, named EMN-net.

The primary algorithmic optimization strategy of this study encompasses three aspects. First, to alleviate the computational bottleneck of edge devices, MobileNetV3 is introduced to reconstruct the backbone network. By utilizing depthwise separable convolutions, the parameter count and computational complexity are substantially reduced while ensuring feature extraction efficiency. Second, to enhance the model’s sensitivity to weak defects at high speeds, the Efficient Local Attention (ELA) mechanism is integrated into the network. Through 1D convolution and strip pooling, this mechanism enhances the network’s focus on local high-frequency features with extremely low computational overhead. Finally, addressing the localization failure of traditional loss functions on tiny objects, the Normalized Wasserstein Distance (NWD) loss function is introduced. By modeling bounding boxes as 2D Gaussian distributions, it effectively alleviates the gradient fluctuation problem of IoU when tiny objects do not overlap, improving localization robustness. Overall, the core novelty of this work lies not in a fundamentally new neural architecture, but in the targeted combination, placement, and industrial validation of MobileNetV3, ELA, and NWD specifically tailored for micro-defect detection on pharmaceutical edge devices.

The main contributions of this paper are summarized as follows:

(1): A customized EMN-net algorithm model for tiny tablet defects is proposed. By integrating MobileNetV3 and the ELA attention mechanism, the computational complexity is significantly reduced (GFLOPs halved) while simultaneously enhancing feature extraction and response capabilities for local tiny defects.
(2): The NWD loss function is introduced to reconstruct the bounding box regression strategy, effectively mitigating the gradient vanishing problem for tiny objects at the algorithmic level, and substantially improving the recall rate and localization accuracy for targets such as spots and micro-cracks.
(3): A realistic continuous production line tablet snapshot dataset was constructed, and comprehensive algorithmic comparison and validation experiments were conducted. The results indicate that EMN-net not only demonstrates superior performance against mainstream lightweight YOLO algorithms in detection accuracy and throughput but also exhibits strong robustness under degraded conditions like motion blur, demonstrating practical value for direct deployment on edge devices.

The remainder of this paper is organized as follows: Section 2 details the architecture of the proposed EMN-net algorithm; Section 3 introduces the dataset construction and experimental setup; Section 4 presents the experimental results and comparative analysis; and Section 5 summarizes the entire paper.

2. Proposed EMN-Net Algorithm

2.1. Overall Architecture of EMN-Net

YOLOv8, released by the Ultralytics team in 2023, is primarily composed of three components: the Backbone network responsible for feature extraction, the Neck network for multi-scale feature fusion, and the Head for the final object detection and classification tasks. By combining CSP Darknet and ELAN concepts to optimize gradient flow and introducing an anchor-free mechanism to reduce hyperparameter dependence, YOLOv8 achieves an excellent balance in general object detection.

To adapt to the extremely low computational constraints of edge devices on continuous pharmaceutical production lines while breaking through the performance bottleneck of tiny defect detection, this paper conducts a deep, customized reconstruction of the YOLOv8n architecture. The overall architecture of the improved EMN-net is illustrated in Figure 1. We introduce the MobileNetV3 network to reconstruct the backbone and strip away computational redundancy, integrate the ELA attention mechanism after the C2f modules in the neck network to enhance local feature anchoring, and replace the regression loss function in the detection head with the NWD loss to optimize the gradient propagation of tiny targets.

2.2. Lightweight Backbone: MobileNetV3

Since our detection task operates on a high-speed continuous tablet production line, the real-time throughput of the algorithm is strictly demanded. Under operating conditions where the tablet background is relatively uniform and the target contours are consistent, the general-purpose YOLOv8n backbone exhibits significant computational redundancy. Although the original YOLOv8n has excellent generalization capabilities, its bulky parameter size leads to high computational overhead (FLOPs). Therefore, we introduce the lightweight MobileNetV3 network to reconstruct the feature extraction backbone [18].

The foundational building block of MobileNetV3, the bneck (shown in Figure 2), introduces critical improvements over the Inverted Residual Block of its predecessor. Unlike standard convolutions that perform dense feature extraction simultaneously across spatial and channel dimensions, the bneck structure employs Depthwise Separable Convolutions, decoupling the computational dimensions. It first utilizes depthwise convolutions to extract structural information in the 2D spatial plane (preserving the geometric contour of the tablet), followed by 1 × 1 pointwise convolutions for cross-channel feature fusion. This structural computational decoupling significantly reduces redundant parameters and floating-point operations.
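The parameter saving from this decoupling can be checked with simple arithmetic. The sketch below counts weights (ignoring biases and normalization layers) for a dense k × k convolution versus a depthwise convolution followed by a 1 × 1 pointwise convolution. The function names and the layer sizes in the usage lines are illustrative assumptions, not values taken from EMN-net.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Illustrative layer sizes (hypothetical, not from the paper).
dense = standard_conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
ratio = separable / dense  # equals 1 / c_out + 1 / k**2
```

For this hypothetical 64-to-128-channel 3 × 3 layer, the separable form needs roughly 12% of the dense weights, which is the kind of reduction the bneck structure relies on.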

Regarding the selection of non-linear activation functions, considering the efficiency of industrial edge devices during quantized inference, MobileNetV3 introduces the hardware-friendly Hard-swish (h-swish) function to replace the traditional Swish or Sigmoid functions. Its mathematical definition is shown in Equations (1) and (2):

\text{h-swish}(x) = x \cdot \text{HardSigmoid}(x)

(1)

\text{h-swish}(x) = x \cdot \frac{\text{ReLU6}(x + 3)}{6}

(2)

Employing the ReLU6 truncation function guarantees non-linear expression capabilities while significantly reducing the high computational costs associated with exponential operations. Furthermore, to achieve ultimate low latency, MobileNetV3 reconstructs the time-consuming layers of the network. For instance, it reduces the number of filters in the first convolutional layer from 32 to 16, and significantly simplifies the dense computational layers in the Last Stage. This architectural optimization liberates edge device computing power while maintaining extremely high feature extraction fidelity, laying a solid lightweight foundation for the subsequent integration of tiny object detection modules.
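As a minimal illustration of Equations (1) and (2), the following scalar sketch implements h-swish alongside the exponential-based Swish it approximates. It is written in plain Python for readability; the function names are chosen here and are not tied to any particular framework.

```python
import math


def relu6(x: float) -> float:
    """Clamp x to the interval [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_sigmoid(x: float) -> float:
    """Piecewise-linear sigmoid approximation used in Equation (2)."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x: float) -> float:
    """h-swish(x) = x * ReLU6(x + 3) / 6, Equations (1) and (2)."""
    return x * hard_sigmoid(x)


def swish(x: float) -> float:
    """Reference Swish, x * sigmoid(x), which requires an exponential."""
    return x / (1.0 + math.exp(-x))


# Compare the two activations at a few sample points.
samples = [-4.0, -1.0, 0.0, 1.0, 3.0]
pairs = [(x, hard_swish(x), swish(x)) for x in samples]
```

The hard variant is exactly zero for x ≤ −3 and exactly x for x ≥ 3, so only comparisons, additions, and multiplications are needed.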

2.3. Efficient Local Attention (ELA)

In the actual production process, samples for tablet visual inspection are usually captured by industrial shutter cameras. Tiny targets like fine cracks or edge breakages occupy extremely few pixels in the image. Coupled with pixel shifts caused by the high-speed motion of the tablets, these high-frequency local features are highly susceptible to being swallowed by complex background noise during network downsampling. When generating attention maps, the attention mechanisms of general models, such as Squeeze-and-Excitation (SE) [19], Convolutional Block Attention Module (CBAM) [20], and Coordinate Attention (CA) [21], typically rely on channel reduction operations, inevitably leading to the permanent loss of fine-grained information in high-dimensional space. To address this pain point, this paper introduces the Efficient Local Attention (ELA) mechanism into the neck network of EMN-net [22] (its structure is shown in Figure 3), aiming to maximize the algorithm’s perception of local key features without sacrificing channel dimensions.

Tailored to the morphological characteristics of the tablets, ELA discards conventional global pooling—which tends to smooth features—and instead employs Strip Pooling operations. Given an input feature map, ELA independently performs pooling projections along the horizontal (X-axis) and vertical (Y-axis) spatial dimensions, generating two 1D feature vectors with shapes H × 1 and 1 × W. This feature encoding strategy maintains a narrow and long receptive field to capture cross-regional, long-range dependencies, effectively shielding out interference from irrelevant background areas and precisely peeling anomalous defect signals away from the tablet background.

After obtaining the feature vectors in the horizontal and vertical directions, ELA uses 1D Convolution instead of traditional 2D convolution for local cross-channel interaction. 1D convolution has far fewer parameters, faster computational speeds, and is better suited for processing such sequential signals. Notably, to address the limitation that traditional Batch Normalization (BN) layers are constrained by batch size, ELA innovatively introduces Group Normalization (GN) for feature distribution alignment [23]. GN breaks free from batch dimension constraints, endowing the model with stronger domain generalization capabilities in complex industrial scenarios with variable lighting and diverse morphologies. Finally, the features processed by 1D convolution and GN generate attention weights via a Sigmoid activation function, which are element-wise multiplied with the original features, guiding the model’s computational power to focus heavily on the extremely tiny, irregular defect areas on the tablet surface.
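The data flow described above can be sketched for a single feature channel as follows. This is a simplified illustration under stated assumptions: strip pooling is taken to be average pooling, the 1D kernel weights are supplied by hand rather than learned, Group Normalization is omitted, and all names are hypothetical. It is not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_same(signal: List[float], kernel: List[float]) -> List[float]:
    """1D cross-correlation with zero padding (odd kernel length keeps the size)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + signal + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def ela_single_channel(feature: Matrix, kernel: List[float]) -> Matrix:
    height, width = len(feature), len(feature[0])
    # Strip pooling: averaging over the width gives an H-length vector,
    # averaging over the height gives a W-length vector.
    pooled_h = [sum(row) / width for row in feature]
    pooled_w = [sum(feature[r][c] for r in range(height)) / height
                for c in range(width)]
    # 1D convolution along each strip, then sigmoid to get attention weights.
    # (Group Normalization would sit between these two steps; omitted here.)
    attn_h = [sigmoid(v) for v in conv1d_same(pooled_h, kernel)]
    attn_w = [sigmoid(v) for v in conv1d_same(pooled_w, kernel)]
    # Re-weight every position by its row attention and column attention.
    return [
        [feature[r][c] * attn_h[r] * attn_w[c] for c in range(width)]
        for r in range(height)
    ]


# A 3 x 3 map with one strong response in the centre (toy input).
feature = [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
out = ela_single_channel(feature, kernel=[0.25, 0.5, 0.25])
```

Because the attention vectors have lengths H and W rather than H × W, the added cost grows with the sum of the spatial dimensions, not their product.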

2.4. Normalized Wasserstein Distance (NWD) Loss

In the baseline architecture of YOLOv8n, the default bounding box regression uses the CIoU (Complete Intersection over Union) loss function [24]. While CIoU performs excellently in general-scale object detection, it exhibits severe algorithmic limitations when processing extremely tiny defects on tablet surfaces. Because fine spots occupy so few pixels in an image, a minute positional deviation between the predicted box and the ground truth box often leads to zero overlap. When this occurs, the IoU-based metric becomes largely ineffective and cannot provide valid gradient information for backpropagation updates, ultimately causing severe training oscillations and high missed detection rates.

To break through this algorithmic bottleneck, this paper introduces a regression loss function based on the Normalized Wasserstein Distance (NWD) in the detection head [25]. The core algorithmic idea of NWD is to escape the absolute intersection constraints of rigid bounding boxes by re-modeling the target bounding boxes as 2D Gaussian distributions [26]. Specifically, a bounding box R = (cx, cy, w, h) is transformed into a Gaussian distribution N(μ, Σ) with a specific probability density, where the mean vector μ = (cx, cy) corresponds to the center coordinates of the box, and the diagonal elements of the covariance matrix Σ, (w/2)² and (h/2)², represent the variances. This smooth probability distribution modeling allows the network to obtain a continuously differentiable metric space even when handling highly irregular, tiny defects.

After completing the Gaussian distribution modeling, the difference between the distributions corresponding to the predicted box and the ground truth box (Na and Nb) can be precisely quantified via the second-order Wasserstein distance (W²₂), which accurately measures the optimal transport cost between the two distributions. To convert this into a similarity metric suitable for deep learning optimization, an exponential normalization operation is introduced. The definition of NWD is shown in Equation (3):

\text{NWD}(N_{a}, N_{b}) = \exp\left(-\frac{W_{2}^{2}(N_{a}, N_{b})}{C}\right)

(3)

where C is a constant related to the scale of the current dataset, used to adjust the distance decay rate. The value range of NWD is strictly mapped to the (0, 1] interval. Based on this metric, the NWD loss function for bounding box regression of tiny defects is defined in Equation (4):

L_{\text{NWD}} = 1 - \text{NWD}(N_{p}, N_{g})

(4)

Directly replacing the original loss with L_{NWD} presents significant algorithmic advantages. Firstly, it offers scale invariance, possessing strong adaptability to sudden size variations in extremely tiny targets. Secondly, it substantially mitigates the gradient vanishing problem; even when there is absolutely no overlap between the predicted and ground truth boxes, the distance metric based on Gaussian distribution still provides smooth and stable gradients to guide continuous convergence, effectively addressing the challenge of missing tiny objects.
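To make the contrast with IoU concrete, the sketch below evaluates Equations (3) and (4) for boxes given as (cx, cy, w, h). It relies on the standard closed form for Gaussians with diagonal covariance, where the squared 2-Wasserstein distance reduces to a squared Euclidean distance between the vectors (cx, cy, w/2, h/2). The value C = 100 and the example boxes are hypothetical, and the code follows Equation (3) as printed here.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def wasserstein2_squared(a: Box, b: Box) -> float:
    """Squared 2-Wasserstein distance between the Gaussians of two boxes."""
    return (
        (a[0] - b[0]) ** 2
        + (a[1] - b[1]) ** 2
        + ((a[2] - b[2]) / 2.0) ** 2
        + ((a[3] - b[3]) / 2.0) ** 2
    )


def nwd(a: Box, b: Box, c: float) -> float:
    """Equation (3) as printed; some formulations take a square root first."""
    return math.exp(-wasserstein2_squared(a, b) / c)


def nwd_loss(pred: Box, gt: Box, c: float) -> float:
    """Equation (4): L_NWD = 1 - NWD."""
    return 1.0 - nwd(pred, gt, c)


def iou(a: Box, b: Box) -> float:
    """Plain IoU for centre-format boxes, for comparison."""
    inter_w = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2)
                  - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    inter_h = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2)
                  - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


# Hypothetical 4 x 4 pixel defect and two non-overlapping predictions.
gt = (10.0, 10.0, 4.0, 4.0)
near = (15.0, 10.0, 4.0, 4.0)
far = (25.0, 10.0, 4.0, 4.0)
C = 100.0  # hypothetical dataset-scale constant

iou_values = (iou(gt, near), iou(gt, far))
loss_values = (nwd_loss(near, gt, C), nwd_loss(far, gt, C))
```

Both predictions have zero IoU with the ground truth, so an IoU-based term cannot rank them, whereas the NWD loss is smaller for the nearer box and therefore still indicates the direction of improvement.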

3. Experimental Setup and Dataset

3.1. Dataset Construction and Annotation

To verify the effectiveness of the proposed EMN-net algorithm, this paper constructed a tablet surface state detection dataset closely mirroring real industrial production lines. Initially, the dataset was established with 961 baseline images. Subsequently, to systematically evaluate and address potential optical and physical robustness bottlenecks in industrial edge environments, the dataset was expanded with 534 additional raw samples deliberately captured under extreme and sub-optimal conditions. As shown in Figure 4, these representative instances encompass five categories of industrial challenges: severe overexposure and specular reflections from powder dust; extreme low contrast simulating light source degradation; high-speed motion blur; incomplete target capture at the boundary of the field of view; and complex background interference from the conveyor belt. By integrating these challenging samples, the model is forced to decouple genuine defect features from environmental noise, effectively compensating for the inherent vulnerabilities of lightweight backbones. Following controlled data augmentation, a total of 3086 high-resolution images (original resolution of 1920 × 1080 pixels, preserving fine textures and defect features) were generated for training and evaluation. All images were captured in real time on the production line using the industrial camera built into the tablet press. Lighting conditions were provided by the equipment’s built-in light source to ensure imaging consistency. For training and rigorous evaluation, the images were randomly split into a training set (2160 images), a validation set (308 images), and a true held-out test set (618 images), corresponding to an approximate ratio of 7:1:2. Crucially, the test set was curated from different production days and batches to strictly prevent data leakage and evaluate the model’s performance on unseen samples.

The open-source DFLABEL tool was used for image annotation. The annotation focused on tablet surface quality, categorizing three classes: Crack (physical damage such as breaks or chips), Contamination (foreign matter adhesion or discoloration), and Good (intact, no visible defects). The “Good” category is explicitly treated as an object detection class rather than being handled through classification or “absence-of-defect” logic. This provides a positive confirmation mechanism, ensuring that every tablet is successfully captured and analyzed, strictly aligning with pharmaceutical quality assurance standards. Furthermore, detecting “Good” tablets serves as a spatial anchor to suppress background noise (such as metallic reflections and conveyor dust), thereby reducing false positives. A total of 4388 annotated instances were extracted. Specifically, the instances comprise 872 Crack, 1321 Contamination, and 2195 Good samples, ensuring a comprehensive representation of both defective and non-defective surface states. Examples of the annotated samples are shown in Figure 5. Furthermore, to mitigate the impact of class imbalance (due to the scarcity of defective samples) on model parameter updating, an oversampling strategy was employed, supplemented by data augmentation methods such as random brightness adjustment, contrast transformation, and horizontal flipping to enhance the algorithm’s generalization capabilities under complex lighting.
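The reported counts can be cross-checked with a few lines of arithmetic. The sketch below recomputes the class shares and split ratios from the figures in this section. The oversampling factor at the end is only one plausible way to quantify the imbalance (repeating each class until it matches the largest one) and is an assumption, since the exact oversampling recipe is not specified.

```python
# Instance counts per class and split sizes as reported in Section 3.1.
instances = {"Crack": 872, "Contamination": 1321, "Good": 2195}
splits = {"train": 2160, "val": 308, "test": 618}

total_instances = sum(instances.values())
total_images = sum(splits.values())

class_share = {name: n / total_instances for name, n in instances.items()}
split_share = {name: n / total_images for name, n in splits.items()}

# Hypothetical oversampling factor: how often each class would need to be
# repeated to match the most frequent class (not the authors' stated recipe).
largest = max(instances.values())
oversample_factor = {name: largest / n for name, n in instances.items()}
```

The totals come out to 4388 instances and 3086 images, and the split shares round to 0.70, 0.10, and 0.20, consistent with the stated 7:1:2 ratio.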

3.2. Simulation of Industrial Environments

The working environment of actual continuous pharmaceutical production lines is highly complex. The image acquisition process is inevitably disturbed by equipment vibrations, lighting fluctuations, and lens smudges, leading to image quality degradation that affects the stability and accuracy of the detection algorithm. To simulate image degradation in real industrial scenarios and verify the robustness of the proposed algorithm against low-quality inputs, various artificial degradations were applied to the original validation set:

(1): Motion blur: A linear motion kernel with random directions and a length of 10~20 pixels was set to simulate trailing effects caused by high-speed conveyor belt vibrations or camera shake;
(2): Gaussian blur: A Gaussian kernel with a radius of 3~8 pixels was applied to simulate image defocusing caused by inaccurate camera focus;
(3): Gaussian noise: White Gaussian noise with a mean of 0 and a variance of 0.005~0.01 was added to simulate sensor thermal noise or graininess under insufficient light;
(4): Image shift: Random translations of 1~3 pixels were applied to simulate minor positional shifts of targets within the Field of View (FoV).

Testing the model on these degraded datasets, where edge gradients and clarity are compromised, comprehensively examines the deep learning model’s algorithmic baseline for resisting environmental interference.
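Three of the degradations listed above can be sketched on a small grayscale image stored as nested lists. The sketch makes several simplifying assumptions that are not stated in the text: pixel intensities are normalized to [0, 1] (so the noise variance applies on that scale), motion blur is restricted to the horizontal direction rather than a random one, and vacated pixels after a shift are filled with zeros. All function names are illustrative.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale, values assumed to lie in [0, 1]


def add_gaussian_noise(image: Image, variance: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise and clip the result back to [0, 1]."""
    sigma = variance ** 0.5
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def shift_image(image: Image, dx: int, dy: int, fill: float = 0.0) -> Image:
    """Translate the image by (dx, dy) pixels, filling exposed pixels with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                shifted[ny][nx] = image[y][x]
    return shifted


def horizontal_motion_blur(image: Image, length: int) -> Image:
    """Average each pixel with the `length - 1` pixels to its left (edge-replicated)."""
    width = len(image[0])
    return [
        [sum(row[max(0, x - k)] for k in range(length)) / length
         for x in range(width)]
        for row in image
    ]


rng = random.Random(0)  # fixed seed so the degradation is repeatable
clean = [[0.5] * 8 for _ in range(4)]
noisy = add_gaussian_noise(clean, variance=0.005, rng=rng)
moved = shift_image(clean, dx=2, dy=1)
streak = horizontal_motion_blur([[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]], length=2)
```

Seeding the random generator keeps the degraded validation images identical across model comparisons, which matters when small robustness differences are being reported.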

3.3. Evaluation Metrics

To comprehensively evaluate the overall performance of the EMN-net model in tablet surface state detection tasks, this paper establishes an evaluation system from three dimensions: algorithmic complexity, inference efficiency, and detection accuracy.

Algorithmic complexity and inference efficiency are primarily measured using the number of Parameters, Giga Floating Point Operations (GFLOPs), and Frames Per Second (FPS). Parameters reflect spatial complexity; GFLOPs denote the computational volume required for a single forward inference; FPS indicates whether the algorithm meets the extreme detection tempo of continuous production lines when deployed on edge devices.

For detection accuracy, Precision (P), Recall (R), and mean Average Precision (mAP) are utilized as the primary evaluation metrics. Their computational definitions are shown in Equation (5):

P = \frac{\text{TP}}{\text{TP} + \text{FP}}, \quad R = \frac{\text{TP}}{\text{TP} + \text{FN}}

(5)

where TP represents True Positives, FP represents False Positives, and FN represents False Negatives. Based on this, the Average Precision (AP) is calculated to further comprehensively evaluate the model’s recognition ability for each tablet state category. The mAP is the mean of all category APs, as shown in Equation (6):

\text{AP} = \int_{0}^{1} P(R) \, dR, \quad \text{mAP} = \frac{1}{C} \sum_{i=1}^{C} \text{AP}_{i}

(6)

where C is the total number of categories (in this paper, C = 3). This paper focuses heavily on mAP50 (mAP at an IoU threshold of 0.5) and the stricter mAP50:0.95 (average mAP over IoU thresholds from 0.5 to 0.95 with a step of 0.05) to comprehensively and objectively quantify the localization accuracy for tiny defect features.
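The definitions in Equations (5) and (6) translate directly into code. The sketch below is a minimal illustration: AP is approximated with a plain trapezoidal rule over sampled precision-recall points, whereas common evaluation toolkits typically apply an interpolated precision envelope first, so values may differ slightly. The counts and curve points in the usage lines are made up for illustration.

```python
from typing import Sequence


def precision(tp: int, fp: int) -> float:
    """P = TP / (TP + FP), Equation (5)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """R = TP / (TP + FN), Equation (5)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Area under the P(R) curve via the trapezoidal rule, Equation (6)."""
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    """mAP = mean of the per-class AP values, Equation (6)."""
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical counts and curve points, for illustration only.
p = precision(tp=90, fp=10)
r = recall(tp=90, fn=5)
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
m_ap = mean_average_precision([0.9, 0.8, 1.0])  # C = 3 classes
```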

3.4. Experimental Configuration

All experiments in this study were conducted in a unified software and hardware environment to ensure the reproducibility and fairness of the comparative results. The hardware platform is a Mechanical Revolution Kuangshi 16 Pro laptop (MECHREVO, Beijing, China), equipped with an Intel Core i9-12900HX processor (Intel Corporation, Santa Clara, CA, USA) and an NVIDIA GeForce RTX 4060 GPU (NVIDIA Corporation, Santa Clara, CA, USA). The system is further configured with 16 GB of Samsung DDR4 RAM (Samsung Electronics, Suwon, Republic of Korea) and a 1 TB Kingston SSD (Kingston Technology, Fountain Valley, CA, USA) to ensure stable data I/O performance during large-scale model training. The software environment was based on the Windows 11 operating system, using Python 3.8, and the algorithms were accelerated using the PyTorch 2.3.1 deep learning framework coupled with CUDA 12.0.

During the training phase, all models were trained from scratch on the custom tablet dataset, without pre-trained weights. Input images were uniformly resized to 632 × 300 pixels to balance detection accuracy and VRAM overhead. The total number of training epochs was set to 100, and the batch size was set to 16. Stochastic Gradient Descent (SGD) was selected as the optimizer, with an initial learning rate of 0.01, a momentum factor of 0.937, and a weight decay coefficient of 0.0005. After training, the algorithm’s performance was strictly evaluated on an independent validation set.
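For reference, the stated hyperparameters can be collected into a single configuration object. The dictionary keys below are illustrative names chosen for this sketch and do not correspond to a specific training script; the derived iteration counts assume that every training image is visited once per epoch.

```python
import math

# Hyperparameters as stated in the text; the key names are illustrative.
train_config = {
    "epochs": 100,
    "batch_size": 16,
    "optimizer": "SGD",
    "initial_lr": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "input_size": (632, 300),  # as reported in the text
    "pretrained": False,       # all models are trained from scratch
}

TRAIN_IMAGES = 2160  # training split size from Section 3.1

# Derived quantities, assuming one pass over the training split per epoch.
iterations_per_epoch = math.ceil(TRAIN_IMAGES / train_config["batch_size"])
total_iterations = iterations_per_epoch * train_config["epochs"]
```

Under these assumptions each epoch consists of 135 optimizer steps, for 13,500 steps over the full schedule.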

4. Experimental Results and Discussion

4.1. Ablation Study

To deeply analyze the algorithmic contributions of each improved module in the EMN-net architecture, ablation studies with progressive integration were designed on the custom dataset. As shown in Table 1, the experiment used the original YOLOv8n as the baseline model and progressively introduced the lightweight backbone, attention mechanism, and loss function optimization.

First, after introducing the MobileNetV3 backbone into the baseline network (Table 1, row 2), the computational complexity of the model dropped substantially, with GFLOPs decreasing from 8.7 to 4.2. Simultaneously, detection throughput increased significantly from 95 f/s to 125 f/s. From an algorithmic efficiency perspective, depthwise separable convolutions successfully stripped away computational redundancy. While substantially reducing computational overhead, the mAP50 maintained a high level of 96.8%, validating the high fidelity of this lightweight strategy in feature representation.

Secondly, integrating the ELA attention mechanism alone improved the mAP50 by 0.8 percentage points. At the feature extraction level, ELA discards the channel reduction operations that easily cause information loss in traditional attention mechanisms. Instead, through 1D convolution and strip pooling, it captures long-range spatial dependencies at an extremely low computational cost (adding only 0.2 GFLOPs), significantly enhancing the network’s response to high-frequency detailed features like micro-cracks.

Furthermore, the introduction of the NWD loss function brought a 1.7 percentage point leap in Recall. Regarding underlying optimization mechanisms, by mapping bounding boxes to Gaussian distributions, NWD effectively overcomes the gradient vanishing problem generated by traditional CIoU when tiny object bounding boxes do not overlap, providing continuous and smooth gradient guidance for the model without adding any computational burden during the inference stage.

As illustrated in Figure 6, while the overall training metrics exhibit a clear converging trend, noticeable local fluctuations can be observed in the curves during the training process. It is important to note that this volatility is primarily an inevitable artifact of the model’s near-saturation performance, rather than an indication of training instability. Because the proposed EMN-net rapidly achieves a high detection accuracy (approaching 100%), the remaining margin for improvement becomes extremely narrow. At this near-optimal stage, the numerical scale of the evaluation metrics becomes highly sensitive. Consequently, even negligible training perturbations, such as natural variations in mini-batch data distributions or the occasional sampling of highly challenging, ambiguous hard examples, can induce visually amplified oscillations in the metric curves. Furthermore, when the baseline precision and recall are inherently high, the remaining loss gradients are predominantly driven by a very small number of edge cases. Nevertheless, as clearly indicated by the smoothed trend lines, these transient micro-fluctuations do not compromise the global optimization trajectory. The network robustly overcomes these local disturbances and ultimately converges to a highly reliable and optimal state.

Ultimately, the EMN-net model (Table 1, row 8), which seamlessly integrates all three modules, achieved a global optimum in algorithmic performance. Its mAP50 increased to 97.8%, and under the stringent condition of a computational complexity of merely 4.4 GFLOPs, it achieved an extremely high throughput of 118 f/s.
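The headline efficiency figures can be put in relative terms with a short calculation. The sketch below uses only values quoted in this section (8.7 GFLOPs and 95 f/s for the YOLOv8n baseline, 4.4 GFLOPs and 118 f/s for EMN-net); the helper name is illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`."""
    return (after - before) / before


# Baseline YOLOv8n versus the full EMN-net, values from the ablation text.
gflops_change = relative_change(8.7, 4.4)   # about -0.494
fps_change = relative_change(95.0, 118.0)   # about +0.242
latency_ms = 1000.0 / 118.0                 # about 8.47 ms per frame
```

These numbers correspond to roughly a 49% reduction in computation, a 24% gain in throughput, and a per-frame budget of about 8.5 ms on the test platform.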

To systematically evaluate the individual and synergistic contributions of the proposed modules, comprehensive ablation experiments were conducted. Starting with the standard YOLOv8n as the baseline, we incrementally integrated the lightweight MobileNetV3 backbone, the Efficient Local Attention (ELA) mechanism, and the Normalized Wasserstein Distance (NWD) loss function. The objective of these step-by-step modifications is to improve the network’s ability to extract weak features of micro-defects while strictly maintaining a lightweight architecture suitable for real-time industrial deployment. As the overall trends suggest, the progressive integration of these components yields steady and cumulative improvements in detection accuracy, particularly for challenging cases such as low-contrast cracks and tiny contamination spots. Ultimately, the complete EMN-net model, which combines all three enhancements, demonstrates the best overall performance. The detailed quantitative comparison of these ablation configurations, including core evaluation metrics such as Precision, Recall, and mean Average Precision (mAP), is summarized in Table 1.

The loss curves of the ablation variants are presented in Figure 7. These curves show that EMN-net has the most stable convergence throughout training. Compared with the baseline YOLOv8n and the other variants, the loss curve of EMN-net stays at the lowest level, indicating higher learning efficiency and faster parameter optimization. Specifically, during the initial phase (0–40 epochs), the loss values of all models decrease rapidly. As training progresses beyond 50 epochs, the advantage of EMN-net becomes increasingly pronounced: while the loss of the other models gradually plateaus, the loss of EMN-net continues to decline steadily. This continued downward trend suggests that EMN-net retains good generalization capability in the later stages of training and mitigates the risk of overfitting. Furthermore, Figure 7 shows that EMN-net exhibits the smallest fluctuations over the entire training period, indicating that its optimization process is stable and its final predictions are more reliable.

4.2. Comparison with State-of-the-Art Algorithms

To avoid selection bias and objectively evaluate EMN-net in the industrial inspection domain, a representative group of mainstream algorithms, including YOLOv5n, YOLOv7, and the recently released YOLOv10-n, was selected for comparative evaluation. To ensure a fair comparison, all models were trained from scratch (without pre-trained weights) on the expanded 3086-image dataset under identical hardware settings and epoch counts. Furthermore, to reflect industrial edge deployment, the inference throughput (FPS) of every model was measured with a batch size of 1 after a 100-frame hardware warm-up, and it includes the Non-Maximum Suppression (NMS) post-processing latency (except for YOLOv10, which is NMS-free by design). The performance metrics, including COCO-style precision across different defect sizes, are detailed in Table 2.
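The measurement protocol described above (batch size 1, a 100-frame warm-up, and post-processing inside the timed region) can be sketched as follows. This is a minimal illustration, not the benchmarking code used in this study: `measure_fps` and `dummy_pipeline` are hypothetical names, and the dummy pipeline merely stands in for a detector's forward pass plus NMS.

```python
import time


def measure_fps(infer_with_nms, frames, warmup=100):
    """Time single-frame inference (batch size 1) after a warm-up phase.

    `infer_with_nms` is assumed to run the forward pass and any
    post-processing (e.g., NMS), so both are included in the timing.
    """
    if len(frames) <= warmup:
        raise ValueError("need more frames than the warm-up length")

    # Warm-up: results and timings of these frames are discarded.
    for frame in frames[:warmup]:
        infer_with_nms(frame)

    timed = frames[warmup:]
    start = time.perf_counter()
    for frame in timed:  # one frame per call, i.e., batch size 1
        infer_with_nms(frame)
    # On a GPU, the device would have to be synchronized here before
    # reading the clock; this CPU-only sketch does not need that step.
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


def dummy_pipeline(frame):
    # Stand-in for "forward pass + NMS"; a real detector would go here.
    return max(frame)


frames = [[float(i % 7)] * 256 for i in range(300)]
fps = measure_fps(dummy_pipeline, frames, warmup=100)
print(f"throughput: {fps:.1f} frames/s over {len(frames) - 100} timed frames")
```

Keeping the warm-up frames out of the timed loop avoids counting one-off start-up costs, which would otherwise understate steady-state throughput.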

The experimental data show that although the early lightweight algorithm YOLOv5n reaches a high throughput of 140 f/s, its relatively simple feature fusion network loses much of the feature information of tiny tablet defects. As a result, it records the lowest mAP50 among the compared models (Table 2), which falls short of industrial quality inspection requirements. Conversely, YOLOv7 and YOLOv8s boost performance by substantially increasing network depth and channel width. For example, YOLOv8s improves the mAP50 to 98.2%, but at the expense of efficiency: its computational overhead rises to 28.6 GFLOPs and its throughput drops to 68 f/s, making it difficult to keep pace with the high inspection rate of continuous production lines.

In contrast, the proposed EMN-net overcomes these performance bottlenecks. EMN-net achieved an mAP50 of 97.8% at a computational cost of only 4.4 GFLOPs (roughly half that of the baseline YOLOv8n). This is within 0.4 percentage points of YOLOv8s (98.2%), whose computational complexity is more than six times larger, and EMN-net surpasses YOLOv8s on small-defect precision (AP_Small of 88.6% versus 87.1% in Table 2). This result indicates that optimization tailored to the characteristics of tiny targets can match or exceed direct network expansion on small defects at a fraction of the computational cost.
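The accuracy and efficiency trade-off discussed here can be checked directly against Table 2. The sketch below treats parameter count (lower is better), mAP50, and FPS (both higher is better) as the three objectives and lists the models that no other model dominates. The choice of these three objectives is an assumption made for illustration; the values are copied from Table 2.

```python
# model: (params in M, mAP50 in %, FPS), copied from Table 2.
TABLE2 = {
    "YOLOv5n":   (1.86, 91.8, 140),
    "YOLOv7":    (36.9, 95.8, 45),
    "YOLOv10-n": (2.69, 96.4, 125),
    "YOLOv8n":   (3.01, 95.3, 95),
    "YOLOv8s":   (11.1, 98.2, 68),
    "EMN-net":   (2.36, 97.8, 118),
}


def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1] and a[2] >= b[2]
    better = a[0] < b[0] or a[1] > b[1] or a[2] > b[2]
    return no_worse and better


def pareto_front(models):
    """Names of the models that are not dominated by any other model."""
    return sorted(
        name
        for name, metrics in models.items()
        if not any(dominates(other, metrics)
                   for other_name, other in models.items() if other_name != name)
    )


front = pareto_front(TABLE2)
beaten_by_emn = sorted(n for n, m in TABLE2.items() if dominates(TABLE2["EMN-net"], m))

print("Pareto front:", front)
print("dominated by EMN-net:", beaten_by_emn)
```

Under these three objectives, EMN-net lies on the Pareto front together with YOLOv5n, YOLOv10-n, and YOLOv8s, and it dominates both YOLOv7 and the YOLOv8n baseline.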

The experimental results reveal a critical limitation in conventional object detection optimization: the failure of “dimensional stacking” for micro-defects. When scaling the baseline YOLOv8n to heavier architectures like YOLOv8s or YOLOv7 [27], the computational overhead increases to 28.6 and 104.7 GFLOPs, respectively. This significantly reduces inference throughput (FPS) and severely restricts the feasibility of real-time deployment on industrial edge devices [15]. More critically, these heavy architectures fall into a “feature over-smoothing” trap. During the deep downsampling process, the weak, high-frequency gradient information of tiny defects (e.g., micro-cracks or spots) is diluted and swallowed by the dominant normal tablet background [16].

In contrast, EMN-net achieves a Pareto-optimal balance between efficiency and accuracy among the compared models. Compared with the heavyweight YOLOv7, EMN-net requires less than 1/15th of the parameter count (2.36 M versus 36.9 M) while achieving a higher mAP50. Even compared with the more recent YOLOv10-n, EMN-net maintains a clear lead in fine-grained detection, with its AP_Small (88.6%) exceeding that of YOLOv10-n (84.2%) by 4.4 percentage points. Two design choices account for this. First, its MobileNetV3 backbone [18] employs depthwise separable convolutions to decouple spatial and channel computations, removing redundant parameters without distorting spatial features. Second, the ELA mechanism [22] uses strip pooling and 1D convolution to capture long-range spatial dependencies. Unlike traditional attention mechanisms that rely on channel reduction (e.g., SE [19] or CBAM [20]), ELA avoids channel dimensionality reduction and concentrates its attention weights on the high-frequency regions of micro-defects. This explains why the extremely lightweight EMN-net can generate tighter bounding boxes and higher confidence scores than much larger YOLO variants.
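The parameter saving of a depthwise separable convolution can be made concrete with a short calculation. The layer size below (64 input channels, 128 output channels, 3 × 3 kernel) is a hypothetical example and is not taken from the EMN-net configuration; bias terms are omitted for simplicity.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel (spatial mixing)
    pointwise = c_in * c_out   # 1 x 1 convolution (channel mixing)
    return depthwise + pointwise


# Hypothetical layer size, chosen only for illustration.
c_in, c_out, k = 64, 128, 3

standard = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / standard

print("standard conv weights:       ", standard)
print("depthwise separable weights: ", separable)
print("ratio:", round(ratio, 4), "(equals 1/c_out + 1/k^2)")
```

For this layer the separable form needs 8768 weights instead of 73,728, about 12% of the original, which matches the closed-form ratio 1/c_out + 1/k².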

To further visualize the classification and localization performance of the proposed EMN-net, the Precision-Recall (PR) curves and the absolute confusion matrix on the independent test set are illustrated in Figure 8a and Figure 8b, respectively. As shown in Figure 8a, the PR curves for all three classes (Crack, Contamination, and Good) maintain a high area-under-curve (AUC) value, with the mean Average Precision (mAP) reaching 97.8%, indicating that the model achieves stable detection performance even at high recall levels. Furthermore, the confusion matrix in Figure 8b provides a detailed breakdown of the model’s predictions. The high diagonal values—representing correct classifications of 435 Good tablets, 165 Cracks, and 256 Contamination instances—validate the effectiveness of the ELA mechanism and NWD loss in accurately capturing fine-grained local textures while suppressing environmental noise.

To intuitively demonstrate the superiority of the proposed algorithm in practical inspection scenarios, Figure 9 presents a qualitative visual comparison of the detection results among YOLOv5n, YOLOv7, YOLOv8n, and EMN-net. For extremely tiny defects such as the contamination spot (first row), both YOLOv5n and the baseline YOLOv8n fail to capture the weak local features, erroneously classifying the defective tablet as intact (“good”). Although YOLOv7 successfully detects the spot, its confidence score is relatively low (0.82). When dealing with low-contrast micro-cracks (second row), while all models successfully identify the defect, YOLOv5n yields a lower confidence score of 0.78. Moreover, the bounding boxes generated by these conventional YOLO variants are noticeably oversized and fail to tightly conform to the actual irregular contour of the crack. For the completely intact tablet (third row), all models perform adequately, but EMN-net outputs the highest confidence (0.99). In stark contrast to the baselines, benefiting from the Efficient Local Attention (ELA) mechanism and the Normalized Wasserstein Distance (NWD) loss, the proposed EMN-net exhibits high sensitivity and localization precision for tiny targets. It not only successfully detects all challenging samples with minimal misclassifications but also generates highly accurate bounding boxes with significantly higher confidence scores (e.g., 0.96 for the contamination spot and 0.97 for the micro-crack), fully validating its robust visual perception capabilities in complex industrial environments.

4.3. Algorithmic Robustness Under Degraded Conditions

In actual industrial deployment, image acquisition is inevitably disturbed by equipment vibrations and environmental lighting fluctuations, posing a stringent challenge to detection algorithms. To verify the stability of the EMN-net algorithm, generalization tests were conducted across four typical industrial degradations (Motion Blur, Gaussian Blur, Gaussian Noise, and Image Shift) at three levels of severity (Mild, Moderate, and Severe).

The testing results across different severity levels are detailed in Table 3. Under mild degradations, both YOLOv8n and EMN-net maintain high accuracy. However, as the degradation severity increases—for instance, under severe motion blur (20 px) or severe Gaussian noise—the baseline YOLOv8n experiences a significant performance decline, with its mAP50 dropping to 89.6% and 87.2%, respectively. This is primarily because extreme blur and noise destroy the high-frequency edge gradients of micro-defects. In contrast, EMN-net demonstrates significantly stronger anti-interference capabilities, maintaining an mAP50 of 95.4% and 94.1% under identical severe conditions.
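The loss of edge gradients under blur can be illustrated on a single image row. The sketch below models motion blur as a horizontal uniform averaging window, which is an assumption made for illustration (the exact blur kernel used in the experiments is not restated here), and applies it to a hypothetical 2-pixel-wide defect at the three blur lengths listed in Table 3.

```python
def motion_blur_1d(row, length):
    """Horizontal linear motion blur: mean over a `length`-pixel window.

    Pixels outside the row are filled by repeating the border value.
    """
    half = length // 2
    n = len(row)
    blurred = []
    for i in range(n):
        window = [row[min(max(j, 0), n - 1)]
                  for j in range(i - half, i - half + length)]
        blurred.append(sum(window) / length)
    return blurred


def max_gradient(row):
    """Largest absolute intensity difference between neighboring pixels."""
    return max(abs(b - a) for a, b in zip(row, row[1:]))


# Hypothetical image row: a 2-pixel-wide defect (intensity 1) on a flat background.
row = [0.0] * 30 + [1.0] * 2 + [0.0] * 30

results = {}
for length in (10, 15, 20):  # blur lengths from Table 3, in pixels
    blurred = motion_blur_1d(row, length)
    results[length] = (max(blurred), max_gradient(blurred))
    print(f"{length:2d} px blur: peak = {results[length][0]:.3f}, "
          f"max gradient = {results[length][1]:.3f}")
```

In this toy case a 20 px blur reduces the defect's peak intensity from 1.0 to 0.1 and its steepest edge from 1.0 to 0.05, which is the kind of gradient loss described above.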

The superior robustness of EMN-net under degraded conditions is grounded in its architectural optimization. First, by integrating Group Normalization (GN) into the ELA module, EMN-net computes the mean and variance within channel groups independently of the batch size, effectively enhancing the model’s resistance to non-i.i.d. noise caused by lighting fluctuations. Second, severe motion blur often causes the predicted and ground-truth boxes of tiny defects to have zero overlap, leading to gradient vanishing in traditional IoU-based losses. Under these conditions, the NWD loss yields continuous, non-zero gradients by modeling boxes as Gaussian distributions, ensuring stable training and robust localization even under severe feature degradation. These results substantiate the reliability of EMN-net for pharmaceutical quality assurance in complex real-world scenarios.
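The difference between IoU and NWD for non-overlapping tiny boxes can be shown numerically. The sketch below follows the normalized Gaussian Wasserstein distance formulation of [25], in which a box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and standard deviations (w/2, h/2). The normalizing constant `c = 12.8` and the box coordinates are assumed placeholder values and are not the settings used for EMN-net.

```python
import math


def iou(box_a, box_b):
    """IoU of two boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two boxes modeled as Gaussians.

    `c` is a dataset-dependent constant; 12.8 is an assumed placeholder.
    """
    w2_squared = (
        (box_a[0] - box_b[0]) ** 2
        + (box_a[1] - box_b[1]) ** 2
        + ((box_a[2] - box_b[2]) / 2) ** 2
        + ((box_a[3] - box_b[3]) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


# Hypothetical 6 x 6 px defect and two predictions that both miss it entirely.
gt = (50.0, 50.0, 6.0, 6.0)
pred_near = (58.0, 50.0, 6.0, 6.0)   # 8 px off-center
pred_far = (70.0, 50.0, 6.0, 6.0)    # 20 px off-center

for name, pred in (("near", pred_near), ("far", pred_far)):
    print(f"{name}: IoU = {iou(gt, pred):.3f}, NWD = {nwd(gt, pred):.3f}")
```

Both predictions have an IoU of zero, so an IoU-based loss cannot tell them apart, whereas the NWD is higher for the closer prediction and therefore still provides a useful training signal.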

5. Conclusions

As a core component of the pharmaceutical quality assurance system, tablet surface defect detection is directly tied to medication safety and corporate reputation. In recent years, with the improved automation levels in the pharmaceutical industry and continually accelerating production line speeds, traditional manual inspection methods can no longer meet real-time and accuracy demands. Especially in high-speed production scenarios, where tablet targets are small, defect types are diverse, and background interference is complex, existing machine vision systems often cause missed detections or false alarms due to insufficient algorithmic capabilities, failing to meet industrial production requirements. Currently, some large pharmaceutical enterprises have introduced visual inspection systems based on deep learning, but the high computational hardware costs and algorithm maintenance expenses make it difficult for small and medium-sized pharmaceutical companies to widely adopt such applications.

This paper proposes an improved YOLOv8 algorithm that combines a lightweight backbone network, an efficient attention mechanism, and tiny-object optimization. By adapting the model for edge computing devices, it provides the pharmaceutical industry with a low-cost, high-precision, real-time defect detection solution. Based on YOLOv8n, the algorithm is modified in three respects: MobileNetV3 replaces the original backbone network to remove computational redundancy, the ELA attention mechanism is integrated into the neck network to strengthen the extraction of local defect features, and the NWD loss function is employed to improve gradient propagation and localization for tiny targets. Experimental results show that the proposed EMN-net improves on the baseline model across multiple metrics. The mAP50 reached 97.8%, an increase of 2.5 percentage points over the baseline (95.3%). At the same time, computational complexity was reduced to 4.4 GFLOPs and inference throughput increased to 118 f/s, balancing detection accuracy and execution efficiency. Furthermore, the simulated motion blur and data augmentation strategies used in this study support the generalization capability of the algorithm, indicating that it maintains stable detection performance under common disturbances such as illumination fluctuations and equipment vibrations on actual production lines, and providing data support for its industrial deployment.

In practical engineering applications, the EMN-net algorithm proposed in this paper can be efficiently integrated with low-cost industrial shutter cameras and embedded edge computing devices (e.g., NVIDIA Jetson Orin NX) on pharmaceutical production lines. This allows for the construction of a prototype system oriented towards real-time pharmaceutical quality monitoring, forming a closed-loop control process of “image acquisition—online detection—automatic sorting.” This not only helps mitigate medical safety risks and economic losses caused by defective pharmaceuticals entering the market but also significantly reduces manual re-inspection costs, improving the overall yield and efficiency of the production line. Concurrently, this algorithmic solution boasts excellent scalability and can be further extended to quality inspection tasks for other pharmaceutical dosage forms, such as capsules and blister packaging, providing robust algorithmic technical support for the intelligent upgrading of the pharmaceutical industry.

Author Contributions

Conceptualization, J.A. and L.Z.; methodology, J.A. and L.Z.; software, J.A., X.Z., Z.Z. and H.W.; validation, J.A., X.Z., Z.Z. and H.W.; formal analysis, J.A.; investigation, J.A., Z.Z. and H.W.; resources, L.Z. and D.L.; data curation, J.A. and X.Z.; writing—original draft preparation, J.A.; writing—review and editing, J.A., L.Z. and D.L.; visualization, J.A.; supervision, L.Z. and D.L.; project administration, L.Z.; funding acquisition, L.Z. and D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Scientific Research Foundation supported by Guilin University of Technology (Project No.: GUTQDJJ2018091), and the Innovation Project of Guangxi Graduate Education (Project No.: YCSW2025411).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy and commercial confidentiality restrictions related to the self-built industrial pharmaceutical dataset.

Acknowledgments

The authors would like to sincerely thank Guilin University of Technology for providing the essential technical support and experimental environment during this research. Furthermore, we express our gratitude to the anonymous reviewers and editors for their constructive comments, which have significantly contributed to the improvement of this paper. During the preparation of this manuscript, the authors used Gemini (Gemini 1.5 Pro) for the purposes of language polishing. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

YOLO	You Only Look Once
CNN	Convolutional Neural Network
ELA	Efficient Local Attention
NWD	Normalized Wasserstein Distance
EMN	Efficient MobileNet-based Network
mAP	mean Average Precision
FPS	Frames Per Second
IoU	Intersection over Union
TP	True Positive
FP	False Positive
FN	False Negative

References

1. Domokos, A.; Nagy, B.; Gyürkés, M.; Farkas, A.; Pataki, H.; Madarász, L.; Nagy, Z.K. Integrated Continuous Pharmaceutical Technologies—A Review. Org. Process Res. Dev. 2021, 25, 721–739.
2. Precedence Research. Pharmaceutical Market Size, Share and Trends 2025 to 2034. Available online: https://www.precedenceresearch.com/pharmaceutical-market (accessed on 4 April 2026).
3. Syntegon Technology. TPR 500 and TPR 700: Tablet Press for High Volume Production. Available online: https://www.syntegon.com/solutions/pharma/tablet-press (accessed on 4 April 2026).
4. Sabry, I.; Mourad, A.-H.I.; Thekkuden, D.T. Study on underwater friction stir welded AA 2024-T3 pipes using machine learning algorithms. In Proceedings of the ASME 2021 International Mechanical Engineering Congress and Exposition, Virtual, Online, 1–5 November 2021; Volume 2A, p. V02AT02A033.
5. Sabry, I.; Mourad, A.-H.I.; Elwakil, M. Hybrid LSTM–dense neural network for accurate corrosion rate prediction in friction stir welded flange joints. Eng. Appl. Artif. Intell. 2026, 167, 113697.
6. Sabry, I.; El-Zathry, N.E.; Mahamood, R.M.; Akinlabi, S.; Woo, W.L. Comparative study of FSW and TIG welding of AA3003 aluminium flange joints under varying tool geometries and rotational speeds. Weld. World 2026, 70, 763–780.
7. Sabry, I.; Elwakil, M. Hybrid FEM–ML framework for multi-objective optimization of mechanical properties and surface quality in dissimilar AA6061-T6/AA6082-T6 friction stir welding. Mater. Today Commun. 2026, 52, 115197.
8. Sabri, A.H.; Hallam, C.N.; Gabbott, I.P. Understanding tablet defects in commercial manufacture and transfer. J. Drug Deliv. Sci. Technol. 2018, 46, 1–6.
9. Körber Pharma. Increase Quality and Cut Costs: AI-Powered Pharmaceutical Inspection. Available online: https://www.koerber-pharma.com/en/blog/increase-quality-and-cut-costs-ai-powered-pharmaceutical-inspection (accessed on 4 April 2026).
10. Czimmermann, T.; Ciuti, G.; Milazzo, M.; Chiurazzi, M.; Roccella, S.; Oddo, C.M.; Dario, P. Visual-based defect detection and classification approaches for industrial applications—A survey. Sensors 2020, 20, 1459.
11. Zhou, J.; Huang, J.; Liu, J.; Liu, J. Improved U2Net-Based Surface Defect Detection Method for Blister Tablets. Algorithms 2024, 17, 429.
12. Ren, R.; Hung, T.; Tan, K.C. A generic deep-learning-based approach for automated surface inspection. IEEE Trans. Cybern. 2018, 48, 929–940.
13. Vijayakumar, A.; Vairavasundaram, S.; Koilraj, J.A.S.; Rajappa, M.; Kotecha, K.; Kulkarni, A. Real-time visual intelligence for defect detection in pharmaceutical packaging. Sci. Rep. 2024, 14, 18811.
14. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 30 March 2026).
15. Chen, J.; Ran, X. Deep learning with edge computing: A review. Proc. IEEE 2019, 107, 1655–1674.
16. Kos, A.; Belter, D.; Majek, K. Deep learning for small and tiny object detection: A survey. Pomiary Autom. Robot. 2023, 27, 85–94.
17. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF CVPR, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
18. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF ICCV, Seoul, Republic of Korea, 27–28 October 2019; pp. 1314–1324.
19. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE/CVF CVPR, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
20. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 3–19.
21. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF CVPR, Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717.
22. Xu, W.; Wan, Y. ELA: Efficient local attention for deep convolutional neural networks. arXiv 2024, arXiv:2403.01123.
23. Wu, Y.; He, K. Group normalization. Int. J. Comput. Vis. 2018, 128, 742–755.
24. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference, New York, NY, USA, 7–12 February 2020; pp. 12993–13000.
25. Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2021, arXiv:2110.13389.
26. Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.-S. Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2022, 190, 79–93.
27. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF CVPR, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.

Figure 1. Overall architecture of the proposed EMN-net for pharmaceutical tablet defect detection.

Figure 2. Schematic diagram of the bneck structure in MobileNetV3.

Figure 3. The internal architecture of the Efficient Local Attention (ELA) module.

Figure 4. Representative samples of challenging industrial conditions in the expanded dataset.

Figure 5. Examples of annotated tablet surface defect dataset, including cracks, contamination, and intact tablets.

Figure 6. Model training evaluation results.

Figure 7. Change curve of ablation loss function.

Figure 8. Performance visualization of the proposed EMN-net: (a) Precision-Recall (PR) curves; (b) Confusion matrix with absolute counts.

Figure 9. Comparison of training results between YOLO variants and EMN-Net.

Table 1. Comparison of ablation experiment results.

Model	MNetV3	ELA	NWD	P (%)	R (%)	mAP50 (%)	AP_Crack (%)	AP_Contam. (%)	AP_Good (%)
YOLOv8n	-	-	-	94.5	92.1	95.3 ± 0.6	93.5 ± 0.8	94.8 ± 0.7	97.6 ± 0.3
Variant 1	√	-	-	95.2	93.4	96.1	94.2	95.8	98.3
Variant 2	-	√	-	95.5	94.6	96.8	95.3	96.5	98.6
Variant 3	-	-	√	95.8	94.2	96.5	95.6	95.4	98.5
Variant 4	√	√	-	96.3	95.5	97.1	95.5	96.8	99.0
Variant 5	√	-	√	96.0	95.2	96.9	95.8	96.2	98.7
Variant 6	-	√	√	96.8	96.1	97.4	96.1	97.1	99.0
EMN-net	√	√	√	97.6	97.2	97.8 ± 0.2	96.5 ± 0.3	97.6 ± 0.3	99.3 ± 0.1

Note: “√” indicates that the module is integrated and “-” indicates that it is not; “P” denotes Precision and “R” denotes Recall. Bold values indicate the best performance in each column.

Table 2. Performance comparison with state-of-the-art algorithms on the expanded tablet dataset.

Model	Params (M)	mAP50 (%)	AP_Small (%)	AP_Med (%)	AP_Large (%)	FPS (f/s)
YOLOv5n	1.86	91.8	76.5	90.2	96.4	140
YOLOv7	36.9	95.8	79.4	94.5	98.2	45
YOLOv10-n	2.69	96.4	84.2	94.8	98.5	125
YOLOv8n	3.01	95.3	81.4	93.2	98.1	95
YOLOv8s	11.1	98.2	87.1	96.1	98.9	68
EMN-net	2.36	97.8	88.6	96.5	98.8	118

Table 3. Severity-wise robustness evaluation (mAP50%) under various simulated industrial degradations.

Degradation Type	Severity Level	YOLOv8n (Base)	EMN-Net (Ours)
None (Baseline)	-	95.3	97.8
Motion Blur	Mild (10 px)	93.1	96.8
Motion Blur	Moderate (15 px)	91.4	96.2
Motion Blur	Severe (20 px)	89.6	95.4
Gaussian Blur	Mild (Radius 3)	94.2	97.1
Gaussian Blur	Severe (Radius 8)	90.5	95.8
Gaussian Noise	Mild (Var 0.005)	92.8	96.3
Gaussian Noise	Severe (Var 0.01)	87.2	94.1
Image Shift	Random (1–3 px)	94.5	97.3

