A Lightweight Adaptive Attention Fusion Network for Real-Time Electrowetting Defect Detection
Abstract
1. Introduction
- We propose the Multi-Scale Partial Convolution Fusion Attention module (MPFA): MPFA integrates multi-scale partial convolutions with attention mechanisms to enhance electrowetting defect detection across scales and to amplify the feature representations of micro-defects. By exploiting the inherent properties of partial convolutions, namely base-convolution sharing and a channel-splitting strategy, we substantially reduce the number of convolved channels when processing multi-scale feature maps, markedly cutting both computation and parameter count. A minimal sketch of the partial-convolution idea follows this list.
- We propose the Adaptive Scale Attention Fusion Pyramid (ASAF-Pyramid): ASAF-Pyramid introduces a finer-scale detection layer (P2) tailored to small electrowetting defects and builds a cross-tier adaptive weighting mechanism that effectively fuses features from four distinct scales, improving cross-scale defect localization.
- Furthermore, building on the above, we replace the original CIoU loss function with Shape-IoU bounding-box regression [19], enhancing the model's ability to detect irregular, small electrowetting defects (an illustrative formulation is sketched in Section 3.3).
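As a concrete illustration of the first contribution, the following is a minimal PyTorch sketch of a partial-convolution fusion block in the spirit of MPFA. It assumes a FasterNet-style partial convolution [34] in which only C/n_div channels are convolved while the remaining channels pass through untouched, parallel multi-scale kernels applied to the same shared slice, and a simple squeeze-and-excitation channel gate as the attention; the kernel sizes, the gate design, and all layer names are assumptions rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MPFABlock(nn.Module):
    """Illustrative partial-convolution fusion block with channel attention.

    Only C/n_div channels are convolved (the multi-scale kernels share the same
    slice); the remaining channels are passed through untouched, which is what
    keeps the parameter count and FLOPs low.
    """

    def __init__(self, channels: int, n_div: int = 4, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.dim_conv = channels // n_div          # channels that get convolved
        self.dim_untouched = channels - self.dim_conv
        # One branch per kernel size, applied to the same (shared) slice.
        self.branches = nn.ModuleList([
            nn.Conv2d(self.dim_conv, self.dim_conv, k, padding=k // 2, bias=False)
            for k in kernel_sizes
        ])
        # Pointwise fusion of the multi-scale responses back to the slice width.
        self.fuse = nn.Conv2d(self.dim_conv * len(kernel_sizes), self.dim_conv, 1, bias=False)
        # Simple squeeze-and-excitation style channel attention over all channels.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_conv, x_id = torch.split(x, [self.dim_conv, self.dim_untouched], dim=1)
        multi = torch.cat([branch(x_conv) for branch in self.branches], dim=1)
        x_conv = self.fuse(multi)
        out = torch.cat([x_conv, x_id], dim=1)     # re-attach untouched channels
        return out * self.attn(out)                # channel-wise re-weighting


# Quick shape check: a 256-channel feature map keeps its shape.
if __name__ == "__main__":
    feat = torch.randn(1, 256, 40, 40)
    print(MPFABlock(256, n_div=4)(feat).shape)     # torch.Size([1, 256, 40, 40])
```

With n_div = 4, only a quarter of the channels pass through the k × k convolutions, which is the source of the parameter and computation savings described above.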
2. Related Work
2.1. Object Detection Algorithms
2.2. Research on Small Object Detection
2.3. Multi-Scale Fusion Methods
3. Methods
3.1. Network Architecture
- ASAF-Pyramid (A): This is the primary and most capable configuration. It introduces a dedicated high-resolution P2 detection layer sourced from earlier, shallower feature maps. The P2 layer preserves the richest spatial detail, which is essential for precisely localizing micron-scale defects (e.g., charge trapping). ASAF-Pyramid (A) therefore employs four detection heads (P2, P3, P4, P5), covering defects from the smallest to the largest scale.
- ASAF-Pyramid (B): This variant serves as an ablative counterpart that demonstrates the effectiveness of the core fusion strategy independently of adding a new scale. It applies the proposed progressive feature refinement and cross-tier adaptive weighting only to the original three pyramid levels (P3, P4, P5). This allows a direct comparison with the baseline YOLOv8 neck, showing that the fusion method itself brings a significant improvement and that the P2 layer in variant (A) provides a further boost, especially for small targets. A minimal sketch of the cross-tier adaptive weighting follows this list.
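As a rough illustration of the cross-tier adaptive weighting used by both variants, the following is a minimal ASFF-style [16] PyTorch sketch: every pyramid level is resized to the target resolution, a 1×1 convolution predicts a per-location weight for each level, and a softmax normalizes the weights before the weighted sum. The channel count, the weight predictor, and the nearest-neighbor resizing are illustrative assumptions, not the exact ASAF-Pyramid implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossTierAdaptiveFusion(nn.Module):
    """Fuse several pyramid levels into one output level with learned,
    per-location weights (softmax-normalized across levels)."""

    def __init__(self, channels: int, num_levels: int = 4):
        super().__init__()
        # One weight logit per level and per spatial location.
        self.weight_pred = nn.ModuleList([
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_levels)
        ])

    def forward(self, feats, target_idx: int):
        # feats: list of tensors [P2, P3, P4, P5], all with `channels` channels.
        target_size = tuple(feats[target_idx].shape[-2:])
        resized = [
            f if f.shape[-2:] == target_size
            else F.interpolate(f, size=target_size, mode="nearest")
            for f in feats
        ]
        logits = torch.cat([pred(f) for pred, f in zip(self.weight_pred, resized)], dim=1)
        weights = torch.softmax(logits, dim=1)                     # (B, L, H, W)
        fused = sum(weights[:, i : i + 1] * resized[i] for i in range(len(resized)))
        return fused


# Example: fuse P2..P5 (strides 4, 8, 16, 32) into the P2 resolution.
if __name__ == "__main__":
    sizes = [160, 80, 40, 20]
    feats = [torch.randn(1, 128, s, s) for s in sizes]
    fused = CrossTierAdaptiveFusion(128)(feats, target_idx=0)
    print(fused.shape)   # torch.Size([1, 128, 160, 160])
```

Because the weights are predicted per location, a region dominated by a micron-scale defect can draw mainly on the high-resolution P2 features, while larger defects lean on the coarser levels.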
3.2. Multi-Scale Partial Convolution Fusion Attention

3.3. Shape-IoU Loss Function
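As an illustration of the loss named in this section, below is a minimal PyTorch sketch of a Shape-IoU-style bounding-box regression loss, assuming the commonly published formulation: an IoU term, a center-distance penalty weighted by the ground-truth box shape, and a shape cost with exponent θ = 4. The default scale factor and other details are assumptions and may differ from the variant the authors adopt (cited as [19]).

```python
import torch

def shape_iou_loss(pred, target, scale: float = 0.0, theta: float = 4.0, eps: float = 1e-7):
    """Shape-IoU-style loss for boxes in (x1, y1, x2, y2) format, shape (N, 4).

    loss = 1 - IoU + shape-weighted center distance + 0.5 * shape cost
    """
    # Widths, heights and centers.
    w1, h1 = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w2, h2 = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    cx1, cy1 = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx2, cy2 = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2

    # Plain IoU.
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest enclosing box.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Shape weights derived from the ground-truth box aspect.
    ww = 2 * w2 ** scale / (w2 ** scale + h2 ** scale + eps)
    hh = 2 * h2 ** scale / (w2 ** scale + h2 ** scale + eps)

    # Shape-weighted center-distance penalty.
    dist = hh * (cx1 - cx2) ** 2 / c2 + ww * (cy1 - cy2) ** 2 / c2

    # Shape cost on the width/height mismatch.
    omega_w = hh * (w1 - w2).abs() / torch.max(w1, w2).clamp(min=eps)
    omega_h = ww * (h1 - h2).abs() / torch.max(h1, h2).clamp(min=eps)
    shape_cost = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta

    return (1 - iou + dist + 0.5 * shape_cost).mean()


# Example: a slightly shifted prediction against its ground-truth box.
if __name__ == "__main__":
    pred = torch.tensor([[10.0, 10.0, 50.0, 40.0]])
    gt = torch.tensor([[12.0, 11.0, 52.0, 42.0]])
    print(shape_iou_loss(pred, gt).item())
```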
4. Experiment
4.1. Datasets
- Burnt: Refers to localized overheating caused by overcurrent, overvoltage, or short-circuiting, resulting in permanent physical ablation of the electrowetting device. The dark brown/black areas in Figure 9a indicate the burnt regions.
- Charge Trapping: This is one of the most prevalent and challenging failure modes in electrowetting devices. It occurs when unwanted electric charges become trapped within the dielectric or other sensitive layers. Charges trapped in the dielectric layer (beneath the hydrophobic coating) generate a residual electric field, as illustrated in Figure 9b.
- Deformation: This describes the irregular distortion of pixel walls caused by voltage-induced alterations in droplet morphology, which ultimately compromises display quality, as shown in Figure 9c.
- Degradation: This signifies the progressive deterioration of electrowetting performance over time. Contributing factors may include the chemical degradation of the hydrophobic layer (leading to reduced hydrophobicity), fluid contamination, and material aging. These issues prevent the droplet from achieving an optimal contact angle on the surface, as depicted in Figure 9d.
- Oil Leakage: This failure mode involves the spillage of the oil phase from the fluidic cavity due to insufficient interfacial tension between the oil and aqueous phases. This leakage can cause display malfunctions or damage other components, as seen in Figure 9e.
- Oil Splitting: In the unpowered (non-actuated) state, the oil (purple area) should uniformly cover the entire pixel area. However, owing to factors such as hydrophobic layer degradation, the oil splits and retreats toward the corners instead of covering the pixel uniformly, thereby exposing the substrate below (appearing white), as shown in Figure 9f.
- Normal: This represents the ideal state of the device, characterized by intact pixel structures, an undamaged dielectric layer, proper sealing, and well-defined, controllable oil-aqueous interfaces, as shown in Figure 9g.
4.2. Experimental Environment
4.3. Evaluation Metrics
4.4. Results and Analysis
4.4.1. Comparison Experiment
4.4.2. Ablation Experiments
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Lippmann, G. Relations entre les phénomènes électriques et capillaires. Ann. Chim. Phys. 1875, 5, 494–549. [Google Scholar]
- Beni, G.; Hackwood, S. Electro-Wetting Displays. Appl. Phys. Lett. 1981, 38, 207–209. [Google Scholar] [CrossRef]
- Berge, B. Electrocapillarite et mouillage de films isolants par l’eau. C. R. Acad. Sci. Paris. Sér II 1993, 317, 157–163. [Google Scholar]
- Hayes, R.A.; Feenstra, B.J. Video-Speed Electronic Paper Based on Electrowetting. Nature 2003, 425, 383–385. [Google Scholar] [CrossRef] [PubMed]
- Mugele, F.; Baret, J.-C. Electrowetting: From Basics to Applications. J. Phys. Condens. Matter. 2005, 17, R705–R774. [Google Scholar] [CrossRef]
- Giraldo, A.; Aubert, J.; Bergeron, N.; Li, F.; Slack, A.; Van De Weijer, M. 34.2: Transmissive Electrowetting-Based Displays for Portable Multi-Media Devices. SID Symp. Dig. Tech. Pap. 2009, 40, 479–482. [Google Scholar] [CrossRef]
- Ku, Y.; Kuo, S.; Huang, Y.; Chen, C.; Lo, K.; Cheng, W.; Shiu, J. Single-layered Multi-color Electrowetting Display by Using Ink-jet-printing Technology and Fluid-motion Prediction with Simulation. J. Soc. Inf. Disp. 2011, 19, 488–495. [Google Scholar] [CrossRef]
- Heikenfeld, J.; Steckl, A.J. Intense Switchable Fluorescence in Light Wave Coupled Electrowetting Devices. Appl. Phys. Lett. 2005, 86, 011105. [Google Scholar] [CrossRef]
- Cuimei, L.; Zhiliang, Q.; Nan, J.; Jianhua, W. Human Face Detection Algorithm via Haar Cascade Classifier Combined with Three Additional Classifiers. In Proceedings of the 2017 13th IEEE International Conference on Electronic Measurement & Instruments (ICEMI), Yangzhou, China, 20–22 October 2017; pp. 483–487. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 13–16 December 2015; pp. 1440–1448. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9905, pp. 21–37. ISBN 978-3-319-46447-3. [Google Scholar]
- Jocher, G. YOLOv5 by Ultralytics. Available online: https://github.com/ultralytics/yolov5 (accessed on 20 June 2025).
- Liu, S.; Huang, D.; Wang, Y. Learning Spatial Fusion for Single-Shot Object Detection. arXiv 2019, arXiv:1911.09516. [Google Scholar] [CrossRef]
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2020, arXiv:1905.11946. [Google Scholar] [CrossRef]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12993–13000. [Google Scholar] [CrossRef]
- Gevorgyan, Z. SIoU Loss: More Powerful Learning for Bounding Box Regression. arXiv 2022, arXiv:2205.12740. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. arXiv 2013, arXiv:1311.2524v5. [Google Scholar]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef]
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. arXiv 2020, arXiv:2005.12872. [Google Scholar]
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974. [Google Scholar]
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual Generative Adversarial Networks for Small Object Detection. arXiv 2017, arXiv:1706.05274. [Google Scholar] [CrossRef]
- Bai, Y.; Zhang, Y.; Ding, M.; Ghanem, B. SOD-MTGAN: Small Object Detection via Multi-Task Generative Adversarial Network. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11217, pp. 210–226. ISBN 978-3-030-01260-1. [Google Scholar]
- Stachoń, M.; Pietroń, M. Chosen Methods of Improving Small Object Recognition with Weak Recognizable Features. In Advances in Information and Communication; Arai, K., Ed.; Lecture Notes in Networks and Systems; Springer Nature: Cham, Switzerland, 2023; Volume 652, pp. 270–285. ISBN 978-3-031-28072-6. [Google Scholar]
- Wang, J.; Xu, C.; Yang, W.; Yu, L. A Normalized Gaussian Wasserstein Distance for Tiny Object Detection. arXiv 2022, arXiv:2110.13389. [Google Scholar] [CrossRef]
- Wu, Z.; Zhen, H.; Zhang, X.; Bai, X.; Li, X. SEMA-YOLO: Lightweight Small Object Detection in Remote Sensing Image via Shallow-Layer Enhancement and Multi-Scale Adaptation. Remote Sens. 2025, 17, 1917. [Google Scholar] [CrossRef]
- Zhang, Y.; Wu, C.; Fan, Y. MLF-YOLO: A Novel Multiscale Feature Fusion Network for Remote Sensing Small Target Detection. J. Real-Time Image Process. 2025, 22, 138. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. arXiv 2017, arXiv:1612.03144. [Google Scholar] [CrossRef]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. arXiv 2018, arXiv:1803.01534. [Google Scholar] [CrossRef]
- Chen, J.; Kao, S.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks. arXiv 2023, arXiv:2303.03667. [Google Scholar]
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. arXiv 2019, arXiv:1904.08189. [Google Scholar] [CrossRef]
- Jocher, G.; Chaurasia, A.; Qiu, J. YOLOv8: Ultralytics’ Newest Object Detection Model. Ultralytics. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 20 June 2025).
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
| Component | Configuration |
|---|---|
| Operating system | Windows 11 |
| GPU | NVIDIA GeForce RTX 4070 SUPER (12 GB) |
| CPU | 13th Gen Intel Core i5-13490F @ 2.50 GHz |
| CUDA | 12.4 |
| PyTorch | 2.6.0 |
| Python | 3.11 |
| Model | Precision | Recall | mAP@0.5 | Param (M) | GFLOPs | FPS |
|---|---|---|---|---|---|---|
| SSD | 0.892 | 0.905 | 0.918 | 24.4 | 32.3 | 106 |
| Faster R-CNN | 0.905 | 0.882 | 0.924 | 137 | 369 | 22.5 |
| CenterNet | 0.893 | 0.904 | 0.917 | 32.3 | 70.2 | 60.4 |
| YOLOv5n | 0.916 | 0.876 | 0.927 | 2.51 | 7.20 | 141 |
| YOLOv8n | 0.913 | 0.929 | 0.953 | 3.01 | 8.20 | 143 |
| RT-DETR | 0.904 | 0.919 | 0.951 | 20.2 | 60.0 | 85.5 |
| YOLOv10n | 0.904 | 0.921 | 0.946 | 2.71 | 8.40 | 138 |
| YOLOv5s | 0.906 | 0.909 | 0.940 | 9.12 | 23.8 | 132 |
| YOLOv8s | 0.915 | 0.941 | 0.965 | 11.1 | 28.7 | 139 |
| YOLOv10s | 0.908 | 0.923 | 0.953 | 7.22 | 21.4 | 127 |
| ASAF-Net-B | 0.909 | 0.952 | 0.975 | 12.3 | 34.8 | 115 |
| ASAF-Net-A | 0.912 | 0.985 | 0.982 | 9.82 | 33.2 | 112 |
| Model | AP (Charge Trapping) | AP (Burnt) | AP (Deformation) | AP (Degradation) |
|---|---|---|---|---|
| YOLOv5n | 0.759 | 0.935 | 0.855 | 0.838 |
| YOLOv8n | 0.796 | 0.942 | 0.871 | 0.857 |
| YOLOv10n | 0.794 | 0.981 | 0.887 | 0.854 |
| YOLOv5s | 0.832 | 0.995 | 0.945 | 0.947 |
| YOLOv8s | 0.850 | 0.995 | 0.995 | 0.949 |
| YOLOv10s | 0.889 | 0.995 | 0.946 | 0.926 |
| ASAF-Net-B | 0.975 | 0.995 | 0.995 | 0.960 |
| ASAF-Net-A | 0.986 | 0.995 | 0.995 | 0.978 |
| Structure | Param (M) | GFLOPs |
|---|---|---|
| ASAF | 9.93 | 33.4 |
| ASFF | 10.1 | 36.8 |
Ablation on the YOLOv8s backbone (√ indicates the module is enabled):

| ASAF-Pyramid (A) | ASAF-Pyramid (B) | MPFA | SIoU | mAP@0.5 | Param (M) | GFLOPs |
|---|---|---|---|---|---|---|
| | | | | 0.965 | 11.1 | 28.7 |
| √ | | | | 0.972 | 9.93 | 33.4 |
| | √ | | | 0.968 | 12.8 | 35.6 |
| | | √ | | 0.971 | 10.7 | 27.8 |
| | | | √ | 0.967 | 11.1 | 28.7 |
| √ | | √ | | 0.978 | 9.82 | 33.2 |
| √ | | | √ | 0.974 | 9.93 | 33.4 |
| | √ | √ | | 0.973 | 12.3 | 34.8 |
| | √ | | √ | 0.971 | 12.8 | 35.6 |
| | √ | √ | √ | 0.975 | 12.3 | 34.8 |
| √ | | √ | √ | 0.982 | 9.82 | 33.2 |
| Model | MPFA n_div | mAP@0.5 | Param (M) | GFLOPs |
|---|---|---|---|---|
| ASAF-Net-A | 2 | 0.979 | 9.86 | 33.4 |
| ASAF-Net-A | 4 | 0.982 | 9.82 | 33.2 |
| ASAF-Net-A | 8 | 0.972 | 9.81 | 33.1 |
| ASAF-Net-B | 2 | 0.971 | 12.5 | 35.1 |
| ASAF-Net-B | 4 | 0.975 | 12.3 | 34.8 |
| ASAF-Net-B | 8 | 0.968 | 12.2 | 34.7 |