Deep Learning-Based Ink Droplet State Recognition for Continuous Inkjet Printing

Xiong, Jianbin; Wang, Jing; Wang, Qi; Yang, Jianxiang; Dong, Xiangjun; Dai, Weikun; Zhang, Qianguang

doi:10.3390/jsan15010016


by Jianbin Xiong, Jing Wang *, Qi Wang *, Jianxiang Yang, Xiangjun Dong, Weikun Dai and Qianguang Zhang

College of Automation, Guangdong Polytechnic Normal University, Guangzhou 510450, China

* Authors to whom correspondence should be addressed.

J. Sens. Actuator Netw. 2026, 15(1), 16; https://doi.org/10.3390/jsan15010016

Submission received: 24 November 2025 / Revised: 14 January 2026 / Accepted: 26 January 2026 / Published: 1 February 2026


Abstract

High-quality droplet formation in continuous inkjet (CIJ) printing is crucial for precise character deposition on product surfaces. This process, in which a piezoelectric transducer perturbs a high-speed ink stream to generate micro-droplets, is highly sensitive to parameters such as ink pressure and transducer amplitude. Suboptimal conditions lead to satellite droplet formation and charge transfer issues, which degrade print quality and make reliable monitoring necessary. To replace inefficient manual inspection, this study develops MBSim-YOLO, a deep learning-based method for automated droplet detection. The proposed model enhances the YOLOv8 architecture by integrating MobileNetV3 to reduce computational complexity, a Bidirectional Feature Pyramid Network (BiFPN) for effective multi-scale feature fusion, and a Simple Attention Module (SimAM) to make the feature representation more robust. A dataset was constructed using images captured by a CCD camera during the droplet ejection process. Experimental results demonstrate that MBSim-YOLO reduces the parameter count by 78.81% compared to the original YOLOv8. At an Intersection over Union (IoU) threshold of 0.5, the model achieved a precision of 98.2%, a recall of 99.1%, and a mean average precision (mAP) of 98.9%. These findings indicate that MBSim-YOLO achieves a favorable balance between high detection accuracy and a lightweight design, offering a viable and efficient solution for real-time, automated quality monitoring in industrial continuous inkjet printing applications.

Keywords:

continuous inkjet printing; micro-droplets; MBSim-YOLO; droplet detection

1. Introduction

Continuous Inkjet Printing (CIJ) represents a canonical deflection-based inkjet printing technology. The printing system continuously generates a stream of ink droplets, which are precisely guided to the surface of the substrate by a high-voltage deflection electric field. The redundant ink droplets that do not participate in imaging are recycled through a circulation system back to the ink supply device for reuse [1]. The inkjet printing system accurately deposits microfluidic droplets onto predefined regions via precision nozzles. It is characterized by the dynamic programmability of printed patterns and supports non-contact printing with micron-level resolution [2,3]. Continuous inkjet coding machines establish a comprehensive digital identity recognition system by enabling traceability across the entire product life cycle, encompassing raw material procurement, production and manufacturing, logistics distribution, and end-user consumption. This technology not only facilitates dual functions of product anti-counterfeiting and channel management but also offers an innovative approach to enhancing enterprise competitiveness through supply chain data visualization for digital marketing strategies. Consequently, CIJ technology finds extensive application across diverse sectors, including the food industry, cosmetics industry, pharmaceutical industry, logistics, automotive parts manufacturing, wire and cable production, aluminum-plastic pipe fabrication, tobacco and alcohol sectors, as well as numerous other domains [4,5,6,7,8,9].

As production efficiency improves and production scales expand, higher demands are placed on the printing efficiency, quality, and durability of inkjet printers. However, issues such as ink droplet tailing, satellite droplets, and incomplete droplet separation during the ejection process can directly result in a significant deterioration of print quality [10]. Thus, achieving the optimal droplet break-up state during inkjet printing and enabling real-time monitoring of jetting anomalies continue to represent the critical scientific challenges and engineering bottlenecks confronting industrial inkjet coding technology today. In CIJ technology, ink droplets are continuously generated via the nozzle under the driving force of piezoelectric crystal shear stress. Upon passing through the charging electrode, each droplet acquires a specific electric charge, which determines its deflection trajectory within the high-voltage deflection plates. Ultimately, the droplets are deposited onto the target substrate [11,12,13]. Owing to the structural complexity of CIJ, visually monitoring the ejection state of ink droplets poses significant challenges. Existing research demonstrates that visual sensing technology utilizing Charge-Coupled Devices (CCD) has been effectively applied for the quantitative analysis of ink droplet state parameters [14,15,16,17]. To ensure the stability of the inkjet printing process, current industrial inkjet coding systems require continuous monitoring by specialized personnel, resulting in increased costs for consumables, equipment maintenance, and labor. If the coding equipment in the production line requires shutdown for inspection upon anomaly detection, this will lead to higher resource consumption, extended monitoring duration, and may ultimately result in production process delays and increased production costs [13].

In recent years, machine learning algorithms based on artificial neural networks have been effectively utilized to address complex challenges, including ink droplet target detection and jetting state prediction [10,18,19]. In particular, supervised learning is increasingly being utilized for target prediction in additive manufacturing processes through the optimization of parameter sets [20]. Based on this, J Huang et al. compared most existing methods that use supervised or semi-supervised approaches to learn from production process data and proposed an unsupervised learning method for analyzing droplet flow patterns [21]. This method eliminates the need for clearly defined true-value labels, thereby significantly reducing the cost associated with manual tagging. These latest studies demonstrate that on-site monitoring and anomaly detection of the ink droplet ejection process in inkjet printing possess significant application potential.

You Only Look Once (YOLO) is a deep learning-based object detection algorithm that has become one of the most representative algorithms in the field of computer vision in recent years. The core idea of this algorithm lies in transforming the object detection problem into a regression problem. Traditional object detection methods typically involve two steps: region proposal and classification. In contrast, YOLO leverages a convolutional neural network (CNN) to directly predict the location and category of objects. The YOLO model is specifically designed to address tasks such as object tracking, object recognition, and segmenting objects or regions from input images [22,23]. For the identification of ink droplet states in inkjet printers, the YOLO model is capable of rapidly detecting various states of ink droplets, including trailing-tail, satellite, stick-together, and Ball-shaped droplets. This enables it to satisfy both the real-time monitoring requirements of production lines and the specific needs of researchers and developers in the inkjet printing domain. The YOLO algorithm, characterized by its strong robustness, scalability, and end-to-end training capability, represents a highly promising approach for identifying the ejection state of inkjet droplets [24,25].

The primary focus of this study is the identification and detection of droplet ejection states in CIJ. In this domain, real-time monitoring of the droplet formation process is critical, as the droplet state directly impacts the quality and reliability of the printing process. Droplet morphology, including Satellite droplets, Stick-together droplets, and Trailing-tail droplets, represents one of the key factors influencing the ejection state. To address this, a CCD area array camera integrated with a telecentric lens was employed to observe the droplets, and the monitoring results were captured as a dataset of image frames. Based on this dataset, a YOLO-based target detection model was developed for classifying droplet morphologies and assessing the ejection state during inkjet printing. Specifically, MobileNetV3, a lightweight pre-trained architecture, was incorporated into the YOLO model, and its hyperparameters were fine-tuned to meet the specific requirements of this study. In practical applications, the combination of YOLO’s robust target detection capabilities and real-time performance with MobileNetV3’s lightweight design significantly reduces computational complexity, thereby enhancing operational efficiency. This approach not only achieves efficient and high-precision droplet state recognition but also satisfies the demands of industrial environments for limited computing resources and rapid feedback, enabling the trained model to be deployed on mobile embedded devices. Furthermore, this method serves as an effective tool that can substantially reduce material and labor costs, ultimately improving the operational efficiency and quality control of inkjet printers.

2. Materials and Methods

In this study, an industrial area-array camera paired with a telecentric lens was employed to image the nozzle region of the inkjet printer. Acquiring frame-sequence images while the ink droplets fall allowed the changes in droplet state to be observed synchronously. Based on an analysis of the working principle of the inkjet printer head and the formation mechanism of ink droplets, the droplet states were classified into four categories: Ball-shaped, Satellite, Stick-together, and Trailing-tail. Images of ink droplets were then systematically acquired from the inkjet printer under various jetting conditions. The input characteristics and target distribution of the collected ink droplet image data are presented in Figure 1c. On this basis, this paper presents a lightweight deep learning model named MBSim-YOLO for real-time detection of ink droplet states in continuous inkjet printers. The model builds on the YOLOv8 object detection architecture and integrates MobileNetV3's depthwise separable convolutions, which significantly reduces computational complexity and allows efficient operation on embedded devices for real-time monitoring of ink droplet states. Furthermore, by incorporating the multi-scale feature fusion module BiFPN and the attention mechanism SimAM, the model fuses ink droplet state features across different scales and suppresses background noise, leading to a significant improvement in classification accuracy. The architecture of the ink droplet state detection model is presented in Figure 1d. The model can rapidly detect changes in droplet state during ejection, which supports the stability and output accuracy of the inkjet printer.

2.1. Experimental Apparatus

This study utilized the EC-JET1000 CIJ as the experimental apparatus, with its structural principle illustrated in Figure 2. The device features an ink drive control system and a nozzle system, enabling effective regulation of ink droplet states. The nozzle of this equipment has a diameter of 60 µm, features a fully sealed design, and is connected to the inkjet printer via a 2-m-long conduit. The inkjet printer is designed to operate with single-phase AC power input at 115 V or 230 V, 50/60 Hz. The ink utilized in the EC-JET1000 (Guangzhou EC-PACK Packaging Equipment CO., Ltd., Guangzhou, China) consists of a mixture of Acrylic Enamel and Acrylic Thinner, with densities of 1.430 g/cm³ and 0.857 g/cm³, respectively. These components are transported via U-shaped pipes into a mixing tank, where they are thoroughly combined. Conductive ink is pressurized into the ink chamber and subsequently ejected through the nozzle. As the ink flows through the nozzle, it is fragmented into a continuous series of equally spaced and uniformly sized droplets by the action of the piezoelectric crystal. The jetted ink stream travels downward and passes through the charging electrode, at which point the ink droplets separate from the ink line. The system applies a specific voltage to the charging electrode, causing the ink droplet to instantaneously acquire a negative charge proportional to the applied voltage as it separates from the conductive ink line. Meanwhile, the voltage frequency of the charging electrode must be precisely adjusted to match the frequency at which the ink droplets break off, thereby ensuring that each droplet acquires the predetermined negative charge with high accuracy. For this study, an acrylic magnetic paint-based mixed fluid material was utilized as the medium for jet droplets to enable effective identification of the inkjet process state.

2.2. Ink Droplet Data Acquisition

This study utilizes HIKVISION’s industrial area array camera (MV-CS016-10GM/GC, Hikvision, Hangzhou, China) in combination with MY series telecentric lenses (MVL-MY-08-130-MP, Hikvision, Hangzhou, China) to construct an ink droplet observation platform. The MV-CS016-10GM/GC is integrated with Sony’s IMX296 (Sony, Tokyo, Japan) global shutter CMOS chip, which exhibits low noise and superior image quality. It enables rapid real-time transmission of uncompressed data via a Gigabit Ethernet interface. At full resolution, the maximum frame rate achievable is 65.2 fps. During the jetting process, ink droplets detach from the ink stream at the charging electrode. Consequently, a CCD industrial camera is positioned to target the charging electrode of the inkjet printer head for capturing images. Simultaneously, to ensure accurate and clear imaging of the falling ink droplets, LED exposure technology is employed, with the exposure frequency synchronized to the piezoelectric drive frequency. By fine-tuning the pressure and amplitude parameters of the EC-JET1000, the state of the ejected ink droplets can be effectively altered. To capture the inkjet printing scenarios of the printer under various operating conditions, while maintaining a constant piezoelectric crystal driving waveform, multiple videos of ink droplets being ejected under different pressure and amplitude settings were recorded according to the matrix presented in Table 1.

In addition, to accommodate the spacing between the charging plates and clearly capture the falling process of the ink droplets within the gap, the horizontal and vertical offsets of the CCD were adjusted to 612 and 58, respectively. Simultaneously, by optimizing the settings, the size of the generated data was reduced. The final image resolution for the collected data was 136 pixels × 1020 pixels, with the image format being PNG.
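The region-of-interest settings above can be sanity-checked with a short sketch. It assumes that the reported 136 × 1020 resolution is width × height, that both offsets are given in pixels, and that the full sensor frame is 1440 × 1080 pixels; the sensor size is not stated in the text and is an assumption here, as are the `Roi` class and its method names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Roi:
    """Rectangular readout window on the sensor, all values in pixels."""
    offset_x: int
    offset_y: int
    width: int
    height: int

    def fits_within(self, sensor_width: int, sensor_height: int) -> bool:
        """Return True if the window lies completely inside the sensor frame."""
        return (
            self.offset_x >= 0
            and self.offset_y >= 0
            and self.offset_x + self.width <= sensor_width
            and self.offset_y + self.height <= sensor_height
        )

    def pixel_fraction(self, sensor_width: int, sensor_height: int) -> float:
        """Fraction of the full frame that is read out."""
        return (self.width * self.height) / (sensor_width * sensor_height)


# Offsets and resolution as reported in the text.
roi = Roi(offset_x=612, offset_y=58, width=136, height=1020)

# ASSUMPTION: full-frame sensor size, not given in the text.
SENSOR_WIDTH, SENSOR_HEIGHT = 1440, 1080

fits = roi.fits_within(SENSOR_WIDTH, SENSOR_HEIGHT)
fraction = roi.pixel_fraction(SENSOR_WIDTH, SENSOR_HEIGHT)
print(f"ROI fits sensor: {fits}, fraction of frame read out: {fraction:.3f}")
```

Under these assumptions the window covers roughly 9% of the frame, which is consistent with the stated goal of reducing the size of the generated data.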

2.3. Image Preprocessing and Dataset Preparation

When utilizing the ink droplet observation platform to capture the ink droplet ejection process, the nozzle may generate various types of noise while performing ejection, purging, or suction cleaning procedures. Moreover, fluctuations typically occur during the initial stage of nozzle ejection, causing ink droplets to separate from the ink line with difficulty and instability. This can lead to unformed cluster ink blockages or splashing, which may ultimately adhere to the charging electrode plate. Therefore, during the detection process, to avoid confusion arising from system-generated noise and satellite droplets, data preprocessing must be performed prior to analyzing, extracting, and measuring specific droplet attributes. The original images captured were grayscale images. To improve the image quality for subsequent analysis, we optimized the acquisition parameters. It was found that setting the exposure time to 24,000 µs and the exposure gain to 1.0052 dB produced clear and bright droplet images in the grayscale domain, with high contrast between the droplet region and the background, thus facilitating reliable feature extraction. To enhance image contrast and compensate for potential non-linear camera responses, a Gamma correction operation was performed. Gamma correction is a standard image processing technique that applies a power-law transformation to pixel intensities. The transformation is defined as:

f (I) = I^{γ}

(1)

where I represents the normalized input pixel intensity (for an 8-bit image, I \in [0, 1]), and γ is the correction factor. We set γ = 2.2, a common value that approximates the perceptual response of human vision and provided optimal contrast enhancement for our droplet images, facilitating subsequent analysis.
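As an illustration of Equation (1), the following minimal sketch applies the power-law transform to 8-bit grayscale values. The function name `gamma_correct`, the nested-list image representation, and the sample pixel values are assumptions made for this example; the processing pipeline actually used in the study is not specified.

```python
def gamma_correct(pixels, gamma=2.2, max_value=255):
    """Apply f(I) = I ** gamma to a 2D list of 8-bit grayscale pixel values.

    Each pixel is normalized to [0, 1], transformed, and rescaled to 0..max_value.
    """
    corrected = []
    for row in pixels:
        new_row = []
        for value in row:
            normalized = value / max_value           # I in [0, 1]
            transformed = normalized ** gamma        # f(I) = I^gamma
            new_row.append(round(transformed * max_value))
        corrected.append(new_row)
    return corrected


# Hypothetical 2 x 3 grayscale patch used only for illustration.
image = [[0, 64, 128], [192, 255, 255]]
result = gamma_correct(image)
print(result)
```

With γ > 1, this mapping leaves 0 and 255 unchanged and lowers the intermediate gray levels, so the ordering of intensities is preserved while the spacing between them changes.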

A total of 3186 ink droplet images were extracted frame by frame, in temporal order, from the collected ink droplet ejection videos. Based on systematic observation of the ink droplet ejection process, droplet appearance can be classified into four types. When the droplet is still connected to the nozzle before contraction, it is classified as the Stick-together state. Once detached, if a visible ligament connects the droplet to the nozzle or the ink column, the state is defined as Trailing-tail. The Satellite category includes one or more smaller secondary droplets that separate from the main droplet or its tail. Finally, when the droplet reaches a stable spherical or nearly spherical shape under the action of surface tension, it is classified as the Ball-shaped state. Representative examples of these states are shown in Figure 3. It should be noted that the annotation protocol is limited to these four foreground categories, and no separate background category is annotated.

During the image annotation phase, the LabelImg tool (version 1.8.6) was employed to accurately annotate the four types of ink droplet states in the images, thereby ensuring the consistency and accuracy of the data labeling process. LabelImg is an open-source graphical image annotation tool that facilitates the manual drawing of bounding boxes and assignment of class labels, outputting annotations in formats such as YOLO (which includes TXT files with normalized center coordinates and dimensions). The generated TXT label files include the center point coordinates of each ink droplet target along with the width and height of their respective bounding boxes. To ensure a rigorous evaluation that prevents data leakage and tests the model’s generalization to unseen operating conditions, the dataset was split following a condition-stratified protocol. As detailed in Table 1, the data were collected from videos under nine distinct pressure-amplitude combinations. We reserved all image frames from two complete, independent conditions (Condition 8: 44.2 psi, 200 µm and Condition 9: 53 psi, 200 µm) to form the held-out validation set (547 images). The remaining frames, originating from the seven mutually exclusive conditions (Conditions 1–7), constituted the training set (2639 images). This approach guarantees no temporal or parametric overlap between the training and validation data, providing a robust assessment of model performance. The histogram of the collected laboratory data is presented in Figure 1c. This histogram not only illustrates the distribution range of each feature but also visually differentiates the data through regions marked with distinct colors.
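The condition-stratified split described above can be sketched as follows. This is a minimal illustration, assuming each frame is tagged with the index of the operating condition (1 to 9 in Table 1) under which it was recorded; the file names and the `split_by_condition` helper are hypothetical and not taken from the study.

```python
# Conditions 8 and 9 are held out for validation, as stated in the text.
VALIDATION_CONDITIONS = frozenset({8, 9})


def split_by_condition(frames, validation_conditions=VALIDATION_CONDITIONS):
    """Split (filename, condition_id) pairs so that no condition spans both sets."""
    train, validation = [], []
    for filename, condition_id in frames:
        if condition_id in validation_conditions:
            validation.append(filename)
        else:
            train.append(filename)
    return train, validation


# Hypothetical frame list: three frames for each of the nine conditions.
frames = [
    (f"cond{condition}_frame{index:04d}.png", condition)
    for condition in range(1, 10)
    for index in range(3)
]

train_files, val_files = split_by_condition(frames)
print(len(train_files), len(val_files))
```

Splitting on the condition index rather than on individual frames keeps consecutive, nearly identical frames from the same video out of both sets at once, which is the leakage the protocol is designed to avoid.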

2.4. Evaluation Metrics

To comprehensively assess the performance and computational efficiency of the inkjet droplet state detection model, this study employs precision (P), recall (R), mean average precision (mAP), F1 score, the number of parameters, and giga floating-point operations (GFLOPs) as evaluation metrics.

Precision (P) is a critical metric for assessing the accuracy of target prediction, indicating the proportion of correctly detected targets among all retrieved targets [26]. It is formally defined as:

P = \frac{TP}{TP + FP}

(2)

where true positive (TP) refers to the count of samples correctly classified as positive, whereas false positive (FP) indicates the count of samples incorrectly classified as positive.

Recall (R) is an evaluation metric that quantifies a model’s ability to successfully identify all true positive instances. It is formally defined as:

R = \frac{TP}{TP + FN}

(3)

where FN denotes the count of samples that are truly positive yet incorrectly classified as negative.

The mean average precision (mAP) is the mean of the per-class average precision (AP) values at a given IoU threshold (for example, 0.5 for the results quoted in the abstract). First, each detection is judged correct or incorrect by comparing the IoU between its predicted bounding box and the ground truth bounding box against the threshold. Next, the area under the precision-recall curve is calculated for each category to give its AP. The mAP is obtained by averaging these areas across all categories. Its mathematical expression is as follows:

mAP = \frac{1}{N_{C}} \sum_{i = 1}^{N_{C}} {AP}_{i}

(4)

where N_{C} denotes the total number of classes, while {AP}_{i} represents the average precision for the i-th class.
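The two ingredients of Equation (4), the IoU test and the class-wise averaging, can be sketched as below. The box format (x_min, y_min, x_max, y_max), the function names, and the per-class AP values are assumptions for illustration only; the AP numbers are placeholders and are not results from this study.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def mean_average_precision(ap_per_class):
    """Equation (4): average the per-class AP values over the N_C classes."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# A prediction shifted by half a box width overlaps the ground truth by 1/3.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
is_match = overlap >= 0.5  # counted as a true positive only above the threshold

# PLACEHOLDER per-class AP values, not measurements from the paper.
placeholder_ap = {
    "Ball-shaped": 0.9,
    "Satellite": 0.8,
    "Stick-together": 1.0,
    "Trailing-tail": 0.7,
}
map_value = mean_average_precision(placeholder_ap)
print(f"IoU = {overlap:.3f}, match = {is_match}, mAP = {map_value:.3f}")
```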

The F1-score is the harmonic mean of precision and recall and therefore reflects the balance between them; a higher value signifies superior model performance. The formal definition is presented as follows:

F1 = 2 \times \frac{P \times R}{P + R}

(5)
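Equations (2), (3), and (5) translate directly into a few lines of code. The sketch below uses hypothetical detection counts for the first two functions and then evaluates the F1-score for the precision and recall quoted in the abstract (98.2% and 99.1%); the resulting F1 value is derived here for illustration and is not a figure reported in this section.

```python
def precision(tp, fp):
    """Equation (2): share of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp, fn):
    """Equation (3): share of actual positives that are detected."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def f1_score(p, r):
    """Equation (5): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Hypothetical counts, used only to exercise the formulas.
p_example = precision(tp=90, fp=10)   # 0.9
r_example = recall(tp=90, fn=5)       # about 0.947

# Precision and recall at IoU 0.5 as quoted in the abstract.
f1_from_abstract = f1_score(0.982, 0.991)
print(f"F1 = {f1_from_abstract:.4f}")
```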

In addition, the complexity of a model can be quantified by the number of parameters and GFLOPs. Specifically, the number of parameters denotes the total trainable parameters in the model, whereas FLOPs indicate the count of floating-point operations during the model's inference or training. One GFLOP corresponds to 10^{9} FLOPs.

2.5. Model Architecture

To enhance the detection accuracy for complex ink droplets (such as stick-together, trailing-tail, and satellite droplets) while reducing model complexity, we propose MBSim-YOLO, an improved version of the YOLOv8 model. As shown in Figure 4, its architecture includes three core improvements: using the MobileNetV3 backbone network for lightweight feature extraction, employing the BiFPN neck to achieve effective multi-scale feature fusion, and integrating the SimAM block in the detection head to enhance the distinguishability of features. The input image is first processed by the MobileNetV3 backbone, which outputs a hierarchy of multi-scale feature maps (labeled P2 to P5 in Figure 4). These features are then fused bidirectionally by the BiFPN module (highlighted in blue) to combine both deep semantic and shallow spatial information. Finally, the refined features are passed through detection heads that each contain a SimAM attention module (highlighted in yellow) to adaptively weight important features before making final predictions. This design aims to balance high precision against low computational cost, making it suitable for real-time droplet monitoring.

2.5.1. Lightweight Backbone Network

Lightweight models provide an effective solution to the issues of high complexity, large parameter size, and stringent deployment requirements associated with traditional models. The MobileNet series, introduced by the Google team, represents a class of lightweight neural networks specifically designed for embedded devices and demonstrates substantial significance and broad applicability in practical applications. The MobileNet series comprises three primary versions: MobileNetv1, MobileNetv2, and MobileNetv3 [27,28,29]. The MobileNetv1 network model is primarily constructed through the stacking of depthwise separable convolution modules, enabling effective control of the model’s scale and complexity by adjusting two hyperparameters. MobileNetv2 introduces bottleneck residual modules based on its predecessor, effectively maintaining network performance while further reducing model complexity and achieving a more lightweight architecture. In contrast, MobileNetv3 incorporates multiple advanced techniques, including depthwise separable convolution, pointwise convolution, SE attention mechanism, and inverted residual blocks, and adopts the novel activation function H-swish(x), as illustrated in Figure 5. Compared to traditional convolution methods, this model employs a strategy that integrates depthwise separable convolution with pointwise convolution. Specifically, each convolutional layer consists of one depthwise convolution kernel (DW) and multiple pointwise convolution kernels (PW). The DW kernel extracts spatial feature information from local regions in the image, while the PW kernel, being only 1 × 1 in size, enables dimensionality expansion or reduction by adjusting its count. This design not only effectively mitigates the parameter increase associated with dimensionality changes but also substantially enhances the model’s computational efficiency. Therefore, the model adopts a strategy of performing DW convolution followed by PW convolution. 
Moreover, the introduction of the SE attention module further strengthens the representation of channel information, thereby significantly enhancing the network’s feature extraction capability. The selection of the activation function in our MobileNetV3 backbone was guided by two primary criteria: enhancing feature representation in low-dimensional spaces and ensuring hardware-friendly efficiency. The process and rationale are detailed as follows:

Mitigating Information Loss: The standard ReLU6 activation, common in lightweight networks, can cause information loss when processing low-dimensional features. To better preserve feature information, we first considered the Swish function, which incorporates a smooth, non-monotonic property via a Sigmoid gate. Its expression is as follows:

Swish [x] = x \cdot Sigmoid (β x)

(6)

Here, β represents a learnable parameter, and x denotes the input of the activation function.

Ensuring Computational Efficiency: Although Swish addresses the representational issue, its dependence on the computationally expensive Sigmoid function is unsuitable for efficient deployment. Therefore, we adopted the H-swish function. It closely approximates Swish’s behavior using a piecewise linear formulation based on ReLU6, dramatically reducing computational cost. Its specific expression is as follows:

H\text{-}swish [x] = x \cdot \frac{ReLU6 (x + 3)}{6}

(7)

By meeting both criteria, H-swish enhances feature representation capability while maintaining high computational efficiency, making it an ideal choice for the lightweight objectives of our MBSim-YOLO model.
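The relationship between Equations (6) and (7) can be checked numerically. The following sketch implements both activations for scalar inputs, with β fixed to 1 for the comparison (an assumption; in Equation (6) β is learnable). The function names are chosen for this example.

```python
import math


def relu6(x):
    """ReLU6(x) = min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)


def swish(x, beta=1.0):
    """Equation (6): x * Sigmoid(beta * x)."""
    return x / (1.0 + math.exp(-beta * x))


def h_swish(x):
    """Equation (7): x * ReLU6(x + 3) / 6, a piecewise approximation of Swish."""
    return x * relu6(x + 3.0) / 6.0


# Compare the two activations on a few sample inputs.
for value in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x = {value:+.1f}  swish = {swish(value):+.4f}  h_swish = {h_swish(value):+.4f}")
```

H-swish is exactly zero for x ≤ -3 and exactly x for x ≥ 3, so the exponential is avoided everywhere and only the region between these points differs from a plain linear or zero response.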

Finally, the model incorporates the Inverted Residual module. In comparison to the conventional residual network, this module ensures consistency between the input and output feature map sizes, resulting in a spindle-shaped network structure. This design allows the model to maintain a relatively low computational cost while fully leveraging the performance-enhancing advantages of the residual network, thus significantly improving the model’s training effectiveness. Considering the limited computational capability of terminal detection devices in the inkjet printer droplet state detection domain, this paper adopts MobileNetV3 from the MobileNet series as the backbone network to realize model lightweighting, thereby enhancing detection efficiency.
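The parameter saving from performing a depthwise (DW) convolution followed by a pointwise (PW) convolution, as described in this subsection, can be made concrete with a small calculation. The kernel size and channel counts below are hypothetical and are not taken from the MBSim-YOLO configuration; bias terms are ignored.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard convolution: k * k * C_in * C_out."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a DW convolution (k * k * C_in) plus a 1 x 1 PW convolution (C_in * C_out)."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / standard
print(f"standard = {standard}, separable = {separable}, ratio = {ratio:.3f}")
```

For this layer the ratio equals 1/C_out + 1/k², about 0.119, so the separable form needs roughly one eighth of the weights of the standard convolution.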

2.5.2. Optimized Neck Feature Fusion Module

In the design of the neck network, this study adopts the weighted Bidirectional Feature Pyramid Network (BiFPN) as a multi-scale feature fusion module. By aggregating features across different resolutions, this module significantly strengthens the model’s feature extraction capability. Based on the original single-scale feature pyramid structure, Lin et al. proposed a network for addressing multi-scale problems—the Feature Pyramid Network (FPN), as illustrated in Figure 6a [30]. FPN integrates high-level feature maps containing semantic information with low-level feature maps containing detailed information via a top-down information flow pathway, preserving both high-level target information and low-level background details. This enhances detection accuracy in multi-scale detection tasks. However, subsequent research revealed that FPN incurs a high computational cost, thereby increasing the model’s training and inference times. Building upon this, the Path Aggregation Network (PANet) further improved performance by aggregating feature information across different feature layers, ensuring all feature information is fully utilized and achieving even greater detection accuracy, as depicted in Figure 6b [31].

As visually summarized in Figure 6, these architectures represent key stages in feature fusion design. In subfigure (a), FPN establishes a foundational top-down pathway (indicated by the arrows flowing from P5 to P3) to merge semantic and detail information. Subfigure (b) shows PANet, which augments this with an additional bottom-up pathway, creating a more complete but sequential information flow. Compared with FPN, PANet not only incorporates a more efficient bottom-up feature propagation mechanism but also preserves richer detailed information after feature fusion via cascading connections. As a result, it achieves superior performance while requiring fewer computational resources. The multi-scale feature fusion module in YOLOv8 is built upon PANet. While PANet improves upon FPN by adding a bottom-up path, its essentially sequential, unidirectional information flows (first top-down, then bottom-up) can limit cross-scale integration. In such a flow, the contribution of features at different resolutions is asymmetric; deep features are refined once by shallow features but not vice versa within the same pass, potentially leading to suboptimal fusion [32]. To address this, our study employs the weighted Bidirectional Feature Pyramid Network (BiFPN) as a superior multi-scale fusion module. BiFPN enhances “effective integration” through two key mechanisms: (1) learnable weight-based fusion, where inputs from different scales are adaptively weighted to reflect their varying importance to the output, and (2) an optimized bidirectional topology, which removes nodes with minimal contribution and adds extra cross-scale connections. This design enables true bidirectional and simultaneous multi-scale feature flow, allowing deep semantic information and shallow spatial details to interact and reinforce each other repeatedly. Consequently, BiFPN achieves a more balanced and powerful integration of features across all scales compared to its predecessors. 
During the training process, the fusion of feature information for small targets, such as satellite ink droplets, is relatively ineffective. Each node acquires only a limited amount of effective information, which in turn results in an increase in the model's parameter count. BiFPN extends PANet by eliminating an input node that neither participates in feature fusion nor significantly contributes to the feature fusion network [33]. Simultaneously, it introduces a new channel to link the original input and output nodes, as depicted in Figure 6c. The structure of our adopted BiFPN module introduces two key improvements visible in the diagram: (1) Optimized Topology: Compared to PANet, it removes the redundant input node that provided little contribution (simplifying the flow), and adds extra cross-scale connections (e.g., the direct links from P6 and P7 to intermediate nodes), enabling more efficient and direct feature exchange. (2) Weighted Feature Fusion: Each connection is associated with a learnable weight parameter (w), applied during the fusion operation at each node (see Equation (8)), enabling adaptive calibration of multi-scale features.

Mathematically, this weighted fusion is implemented via a fast normalized operation, where learnable scalar weights for each input branch adaptively address the semantic and resolution disparities between features. As shown in Equation (8), these weights (w_{i}, w_{j}) are optimized through backpropagation during training, allowing the network to dynamically calibrate the contribution of each input feature map. In practice, a higher learned weight indicates that the corresponding feature (e.g., a deep, semantically rich layer) is more critical for constructing a robust output at that node, while features from noisier or less relevant resolutions are implicitly suppressed. This weighted fusion ensures that information is balanced not by simple averaging but based on learned importance, significantly enhancing the quality of multi-scale representations [34]. The concrete two-step weighted fusion process at a BiFPN node (e.g., the 6th level) is detailed in Equation (9). The fast normalized feature fusion is expressed as follows:

O = \sum_{i} \frac{w_{i}}{ε + \sum_{j} w_{j}} I_{i}

(8)

Here, w_{i} and w_{j} denote learnable weight parameters, ε is a constant set to 0.001 to ensure numerical stability, and I_{i} represents the i-th input feature. As illustrated in Figure 6c, the feature fusion mechanism of the 6th node is expressed by the following equation.

\left\{ \begin{matrix} T_{6} & = Conv \left( \frac{w_{1} \cdot I_{6} + w_{2} \cdot R (I_{7})}{w_{1} + w_{2} + ε} \right) \\ O_{6} & = Conv \left( \frac{w_{1}^{'} \cdot I_{6} + w_{2}^{'} \cdot T_{6} + w_{3}^{'} \cdot R (T_{5})}{w_{1}^{'} + w_{2}^{'} + w_{3}^{'} + ε} \right) \end{matrix} \right.

(9)

Here, T_{6} denotes the intermediate feature of the 6th level, O_{6} denotes the output feature of the 6th level, Conv refers to the separable convolution operation, R stands for the upsampling or downsampling operation, and w and w^{'} serve as the learnable weight parameters.
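As a minimal sketch of Equations (8) and (9), the following pure-Python snippet applies fast normalized fusion to scalar stand-ins for feature maps. The function names, the identity placeholders for Conv and R, and the example weights are illustrative assumptions rather than the authors' implementation. Clamping the weights at zero follows the original BiFPN formulation and is not stated in this section.

```python
EPS = 0.001  # stability constant epsilon, value taken from the text


def fast_normalized_fusion(inputs, weights, eps=EPS):
    """Equation (8): O = sum_i w_i / (eps + sum_j w_j) * I_i.

    `inputs` are scalar stand-ins for same-shaped feature maps.
    Weights are clamped at zero (assumption borrowed from the original
    BiFPN design) so that each normalized weight stays in [0, 1].
    """
    if len(inputs) != len(weights):
        raise ValueError("one weight per input is required")
    clamped = [max(0.0, w) for w in weights]
    denom = eps + sum(clamped)
    return sum(w / denom * x for w, x in zip(clamped, inputs))


def identity(x):
    """Hypothetical placeholder for both Conv and the resampling op R."""
    return x


def fuse_level6(i6, i7, t5, w, w_prime, conv=identity, resize=identity):
    """Equation (9): two-step fusion at the 6th level."""
    # Step 1: intermediate feature from the level input and the resized I_7.
    t6 = conv(fast_normalized_fusion([i6, resize(i7)], w))
    # Step 2: output from the level input, T_6, and the resized T_5.
    o6 = conv(fast_normalized_fusion([i6, t6, resize(t5)], w_prime))
    return t6, o6


# Illustrative values only.
t6, o6 = fuse_level6(i6=1.0, i7=3.0, t5=2.0,
                     w=[1.0, 1.0], w_prime=[2.0, 1.0, 1.0])
```

Because the normalized weights are non-negative and sum to slightly less than one, each fused value stays within the range spanned by its inputs, which is what makes this cheaper normalization behave similarly to a softmax over the weights.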

2.5.3. Introduction of Attention Mechanism in Head Networks

In the inkjet droplet state detection task, one of the challenges is the low contrast between ink droplets and the background. To allow the model to adaptively focus on ink droplet states with less distinct features (e.g., satellite droplets), this paper incorporates the Simple Attention Module (SimAM) into the Head network layer. This enhances the model’s feature extraction capability while mitigating background interference. SimAM is a parameter-free, plug-and-play, and efficient attention mechanism module. Without increasing the parameter count of the original network, it further improves the accuracy of ink droplet state detection while also addressing the design requirement for model lightweighting. The SimAM attention mechanism introduced by Yang et al. is grounded in the spatial theory of visual neuroscience [35]. According to this theory, active neurons inhibit the activity of neighboring neurons, thereby warranting higher priority for active neurons. This priority is quantified by the energy function e_{t}(w_{t}, b_{t}, y, x_{i}):

e_{t} (w_{t}, b_{t}, y, x_{i}) = {(y_{t} - \hat{t})}^{2} + \frac{1}{M - 1} \sum_{i = 1}^{M - 1} {(y_{o} - {\hat{x}}_{i})}^{2}

(10)

Here, \hat{t} = w_{t} t + b_{t} and {\hat{x}}_{i} = w_{t} x_{i} + b_{t} are the linear transformations applied to the target neuron t and the other neurons x_{i} in the same channel. The energy reaches its minimum when these transformed outputs equal their target values, i.e., \hat{t} = y_{t} and {\hat{x}}_{i} = y_{o}, where y_{t} and y_{o} are two distinct scalars. Here, i denotes the spatial dimension index, M indicates the total number of neurons in the channel, and w_{t} and b_{t} correspond to the weight and bias of the transformation, respectively.

Let the scalar values y_{t} and y_{o} be set to 1 and -1, respectively, and incorporate a regularization term with coefficient λ. Consequently, the energy function can be formulated as:

e_{t} (w_{t}, b_{t}, y, x_{i}) = \frac{1}{M - 1} \sum_{i = 1}^{M - 1} {(- 1 - (w_{t} x_{i} + b_{t}))}^{2} + {(1 - (w_{t} t + b_{t}))}^{2} + λ w_{t}^{2}

(11)

From the above equation, the weight w_{t} and bias b_{t} can be derived as:

w_{t} = - \frac{2 (t - μ_{t})}{{(t - μ_{t})}^{2} + 2 σ_{t}^{2} + 2 λ}

(12)

b_{t} = - \frac{1}{2} (t + μ_{t}) w_{t}

(13)

In the formula, μ_{t} and σ_{t}^{2} respectively denote the mean and variance of all neurons in the channel excluding neuron t.

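The formula for the final minimum energy e_{t}^{*} is derived as follows:

e_{t}^{*} = \frac{4 ({\hat{σ}}^{2} + λ)}{{(t - \hat{μ})}^{2} + 2 {\hat{σ}}^{2} + 2 λ}

(14)

Here, \hat{μ} and {\hat{σ}}^{2} denote the mean and variance computed over all neurons in the channel.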

The above equation demonstrates that the lower the minimum energy e_{t}^{*}, the more distinct this neuron is from its neighboring neurons, and consequently, the higher its importance in the network.

Based on the definition of the attention mechanism, let X denote the input features and E denote the minimum energy. By applying the Sigmoid function to filter excessively large values in E, SimAM can be formulated as follows:

\tilde{X} = Sigmoid (\frac{1}{E}) ⊙ X

(15)
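To make Equations (14) and (15) concrete, the sketch below computes the SimAM weights for a single channel stored as a nested Python list. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the default λ = 1e-4 is an assumed value, and the variance is normalized by M, whereas some implementations divide by M - 1.

```python
import math


def simam_weights(channel, lam=1e-4):
    """Per-neuron attention weights Sigmoid(1 / e_t^*) for one channel.

    `channel` is a non-empty 2D list (H x W). `lam` is the regularization
    coefficient lambda; its default here is an assumption.
    """
    values = [v for row in channel for v in row]
    m = len(values)
    mu = sum(values) / m                           # channel mean
    var = sum((v - mu) ** 2 for v in values) / m   # channel variance

    def weight(t):
        # Inverse of Equation (14): low energy maps to a large value.
        inv_energy = ((t - mu) ** 2 + 2 * var + 2 * lam) / (4 * (var + lam))
        return 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid

    return [[weight(t) for t in row] for row in channel]


def simam(channel, lam=1e-4):
    """Equation (15): rescale each neuron by its attention weight."""
    weights = simam_weights(channel, lam)
    return [[w * t for w, t in zip(w_row, row)]
            for w_row, row in zip(weights, channel)]


# Illustrative channel: one bright neuron on a uniform background.
channel = [[0.1, 0.1, 0.1],
           [0.1, 0.9, 0.1],
           [0.1, 0.1, 0.1]]
weights = simam_weights(channel)
refined = simam(channel)
```

In this toy channel the central neuron deviates most from the channel mean, so it has the lowest energy and receives the largest weight, while the module itself adds no learnable parameters.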

Unlike existing attention modules, the SimAM module does not merely refine features by adding convolutional blocks in a single direction along either the channel or spatial dimension. Instead, it generates three-dimensional weights by fusing information from the spatial, channel, and feature dimensions and propagates these weights in the form of an energy function to subsequent layers. In the process of network feature extraction, spatial and channel information jointly contribute to information selection in visual processing, enabling a global perception of ink droplet morphological changes. The specific implementation of the SimAM attention mechanism is illustrated in Figure 7.

3. Experimental Result and Analysis

3.1. Experimental Environment and Parameter Settings

All experiments in this study were carried out on a consistent computer platform. The platform comprised the Windows 10 Professional operating system, an 11th-generation Intel® Core i9-11900K processor (3.50 GHz), 64 GB DDR5 RAM, and an NVIDIA GeForce RTX 3060 Ti SUPER GPU with 12 GB VRAM. The programming language used was Python 3.8.20, and the development framework employed was PyTorch 2.0.0+cu118. For more detailed configuration information, please refer to Table 2.

The experiment accelerated the training process by configuring cuDNN v8.9.7 with CUDA 12.7. Additionally, during the training of the ink droplet morphology detection model, a series of training parameters were fine-tuned to enhance the model’s performance. All models were trained from random initialization without leveraging external pretrained weights, ensuring the learned features were specifically optimized for the domain of ink droplet imagery. The entire training process was set to run for 200 epochs, with the key training parameters comprising: processing 16 images per batch, an initial learning rate of 0.01, and employing the SGD optimizer. To enhance training efficiency and accelerate convergence, this study implemented an early stopping mechanism with a Patience value of 20. Specifically, if no performance improvement was observed over 20 consecutive checks, the training process would terminate. Ultimately, training ceased at the 179th epoch, which also corresponded to the model’s best performance. Detailed training parameters are provided in Table 3.

To improve the model’s detection capability for inkjet droplet morphology, the mosaic data augmentation strategy was disabled during the final 10 training epochs. Furthermore, to strengthen the model’s generalization ability, various data augmentation techniques were employed, including auto-augmentation and random erasing. To ensure a comprehensive and statistically reliable evaluation, we adhere to the following protocols: (1) Model performance is evaluated using both the mean Average Precision at an IoU threshold of 0.5 (mAP50) and the more stringent COCO-style mAP averaged over IoU thresholds from 0.5 to 0.95 with a step size of 0.05 (mAP@[0.5:0.95]). (2) Critical ablation and comparison studies were conducted over five independent training runs with different random seeds to account for variance. The reported metrics are the mean values from these runs, demonstrating stable convergence as evidenced by the learning curves.
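The two evaluation metrics differ only in which IoU thresholds count a detection as correct. The following sketch, with hypothetical helper names and illustrative boxes, computes the IoU of two axis-aligned boxes and enumerates the ten thresholds averaged by mAP@[0.5:0.95]. It does not reproduce the full average-precision computation of any particular evaluation toolkit.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 averaged by mAP@[0.5:0.95].
COCO_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Thresholds at which this prediction still counts as a match."""
    overlap = iou(pred_box, gt_box)
    return [t for t in COCO_THRESHOLDS if overlap >= t]


# Illustrative boxes: the prediction covers 72% of the union.
passed = thresholds_passed((0, 0, 9, 8), (0, 0, 10, 10))
```

A prediction with an IoU of 0.72 is a correct detection for mAP50 but fails at the five strictest thresholds, which is why mAP@[0.5:0.95] rewards tighter bounding boxes.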

3.2. Ablation Experiment

This study introduced three critical enhancements to the original YOLOv8 model: substituting the backbone network of YOLOv8 with MobileNetV3, utilizing BiFPN as the neck network, and incorporating the parameter-free attention module SimAM. A series of six ablation experiments was performed to systematically assess the individual and combined contributions of these three modules to the performance of the MBSim-YOLO model for ink droplet state detection. Each row in Table 4 corresponds to a specific combination of the three enhancement modules, as indicated by the checkmarks (✓) in the first three columns. The first row represents the YOLOv8n baseline without any of the proposed modules.

In the enhanced model, several innovative modules were incorporated, effectively minimizing the computational cost. Through the optimization of the backbone network using MobileNetV3, the model’s parameter count was reduced by 57.72%, and the GFLOPs decreased by 29.63%, accompanied by a minor reduction in accuracy. This approach substantially improved the model’s computational efficiency. By incorporating BiFPN into the neck network optimization of the YOLOv8 model, the Recall and mAP50 metrics were enhanced by 0.3 and 0.1 percentage points, respectively. By incorporating SimAM into the head network of the YOLOv8 model, the Precision and Recall metrics were enhanced by 0.6 and 0.5 percentage points, respectively.

Furthermore, the integration of the aforementioned modules leads to a further enhancement in network performance. By employing MobileNetV3 as the backbone network and introducing BiFPN, Precision is increased by 1.7 percentage points, while GFLOPs and Parameters are reduced by 66.67% and 54.94%, respectively. Building upon this foundation, the incorporation of SimAM further improves detection accuracy without substantially increasing the number of parameters or computational burden. In comparison to using only MobileNetV3 as the backbone network for YOLOv8, despite a slight increase in the number of parameters, Precision, Recall, F1 score, and mAP50 improve by 3%, 0.6%, 2%, and 0.2%, respectively. Notably, the more stringent mAP@[0.5:0.95] metric shows a more pronounced improvement of 2.4 percentage points (from 73.4% to 75.8%), indicating that the full set of enhancements (MobileNetV3 + BiFPN + SimAM) particularly benefits localization precision at higher IoU thresholds. To provide a more granular assessment of model performance, per-class mAP@[0.5:0.95] results are reported in Table 5. Each row in Table 5 corresponds to a specific combination of the three enhancement modules, as indicated by the checkmarks (✓) in the first three columns. The ablation analysis indicates that the SimAM attention module yields the most substantial individual gain on the challenging satellite droplet class. Furthermore, the full MBSim-YOLO configuration achieves either the best or competitive performance across all four droplet categories, demonstrating consistently balanced improvements. Notably, significant gains are observed in the two most difficult classes, satellite and stick-together, which are critical for robust and reliable inkjet print quality monitoring in industrial settings.
Finally, through the comprehensive application of these three improvement strategies, compared with the original model, Parameters and computational burden are significantly decreased by 78.81% and 75.31%, respectively. Meanwhile, both Precision and Recall improved by 0.4 percentage points, while the F1 score and mAP50 increased by 0.1 percentage points, further validating the effectiveness of the proposed method. As illustrated in Figure 8a, the loss curve of the training set demonstrates that the integration of MobileNetV3, BiFPN, and SimAM not only substantially decreased the loss value but also significantly boosted the detection performance.

The confusion matrix is a standard tool for evaluating multi-class classification performance. In this study, it is used to visualize the detailed classification performance of our model across the four ink droplet states. The matrix is structured such that each row corresponds to the true (ground truth) droplet category, and each column corresponds to the predicted category. Therefore, each cell at the intersection of row i and column j displays the count (or proportion) of droplets with true class i that were predicted as class j. The values along the main diagonal represent the number of correctly classified droplets for each state, with higher values indicating better per-class accuracy. Conversely, the off-diagonal elements reveal the specific types and frequencies of misclassifications, identifying which droplet states are most frequently confused with one another [36]. It is important to note that the background label in the confusion matrix does not correspond to an annotated class. The annotation process of the experiment only draws bounding boxes for the four pre-cursor droplet states mentioned above. The background row and column are automatically generated by the evaluation toolkit to represent false positive detections (predictions that do not intersect with any ground-truth droplet) and false negatives (ground-truth droplets that were not detected), respectively. Therefore, the matrix primarily assesses the classifier’s ability to distinguish among the four droplet states, while the background cells quantify localization errors against non-droplet regions. Figure 8b presents the confusion matrix of the enhanced YOLOv8 model. By utilizing the lightweight backbone network MobileNetV3, YOLOv8 achieves a notable increase in computational efficiency at the expense of some model accuracy. Upon integrating the BiFPN and SimAM modules into this architecture, the prediction accuracy for all four ink droplet states is further improved.
Compared to the original model, MBSim-YOLO not only decreases the demand for computational resources but also enhances the detection accuracy of ink droplet states more effectively.
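The row and column layout described above, including the automatically generated background entries, can be sketched as follows. The function name, the use of None to mark a missing ground truth or prediction, and the example pairs are assumptions made for illustration. The evaluation toolkit used in the study may organize its matrix differently.

```python
CLASSES = ["ball-shaped", "satellite", "stick-together", "trailing-tail"]
LABELS = CLASSES + ["background"]  # background is not an annotated class


def build_confusion_matrix(pairs):
    """Rows are true labels, columns are predicted labels.

    `pairs` holds (true_label, predicted_label) tuples. None marks a
    missing side: (None, pred) is a false positive and lands in the
    background row; (true, None) is a false negative and lands in the
    background column.
    """
    index = {name: k for k, name in enumerate(LABELS)}
    matrix = [[0] * len(LABELS) for _ in LABELS]
    for true_label, pred_label in pairs:
        row = index["background" if true_label is None else true_label]
        col = index["background" if pred_label is None else pred_label]
        matrix[row][col] += 1
    return matrix


# Hypothetical matched detections, for illustration only.
pairs = [
    ("satellite", "satellite"),    # correct detection
    ("satellite", "ball-shaped"),  # confusion between two droplet states
    (None, "satellite"),           # false positive on background
    ("trailing-tail", None),       # missed droplet (false negative)
]
matrix = build_confusion_matrix(pairs)
```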

3.3. Comparison Experiments of Different Models

To verify the superior performance of the MBSim-YOLO model for ink droplet state recognition, a comprehensive comparative experiment was conducted against several advanced models. To ensure a fair and rigorous comparison, all baseline models were constructed as full object detectors and trained under an identical protocol, as shown in Table 3. Specifically, the models typically used as classification backbones were implemented within established detection frameworks: the ResNet entry denotes a RetinaNet detector with a ResNet-50 backbone and FPN neck; the MobileViT-xxs entry denotes an FCOS detector with a MobileViT-xxs backbone and a simple FPN neck; the EfficientNet entry denotes an EfficientDet-D0 detector with an EfficientNet-B0 backbone and BiFPN neck [37,38]. The ablation variants (+GhostNetv2, +ShuffleNetv2, +MobileNetv3) were constructed by replacing the backbone of our base YOLO-style detector. For clarity, Table 6 summarizes the detailed detector configurations of the key classification-based models. Table 7 provides a detailed comparison of the performance of each model in the ink droplet morphology detection task and presents specific comparison data. The ResNet model demonstrates exceptional performance in terms of Precision, Recall, F1 score, and mAP50, achieving values of 98.7% respectively. However, its high computational complexity, with GFLOPs at 150.8 and a parameter count of 187.4 M, indicates a significant demand for computing resources. This is primarily due to its deep network structure, which results in slower inference speeds and may become a performance bottleneck in high real-time tasks or scenarios with limited hardware resources. The EfficientNet model exhibits relatively balanced performance across various metrics. Compared to other large neural network models, it has fewer parameters (5.6 GFLOPs and 7.25 M parameters), significantly reducing storage space and computational resource requirements. 
However, its detection accuracy is slightly lower than that of the MBSim-YOLO model, with Precision, Recall, F1 score, and mAP50 being 4%, 1.3%, 3%, and 0.8% lower, respectively. The MobileViT-xxs model shows certain advantages in terms of computational complexity, with GFLOPs and parameter counts of 5.3 and 4.51 M, respectively, outperforming both ResNet and EfficientNet. However, its detection accuracy is relatively low, with Precision, Recall, F1 score, and mAP50 decreasing by 10.9%, 2.4%, 7%, and 4.3%, respectively, compared to MBSim-YOLO. Furthermore, MBSim-YOLO significantly outperforms MobileViT-xxs in terms of GFLOPs and parameter count, further highlighting its efficiency and lightweight characteristics. To comprehensively assess deployment feasibility, the inference efficiency of all models was evaluated on a CPU. The detailed per-stage latency (Preprocess, Inference, Postprocess) and the resulting frames-per-second (FPS) throughput are summarized in Table 8 and Figure 9. The proposed MBSim-YOLO achieves the fastest core inference time of 8.51 ms, which is substantially lower than that of YOLOv8n (19.51 ms) and YOLOv9c (30 ms). Compared to other efficient models, MBSim-YOLO is approximately 23% faster than EfficientNet (11.9 ms) and 35% faster than the variant using only the MobileNetV3 backbone (13.15 ms). Consequently, MBSim-YOLO attains the lowest end-to-end latency of 11.0 ms, translating to the highest throughput of 90.9 FPS. This performance far exceeds the common real-time threshold of 30 FPS, providing concrete evidence that the model’s architectural lightweighting directly enables high-speed inference suitable for embedded systems in industrial inkjet printers.
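The throughput figure follows directly from the end-to-end latency, and the relative speed-up follows from the two inference times being compared. The short sketch below reproduces that arithmetic using values quoted in the text; the helper names are hypothetical.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second for a given per-image latency in milliseconds."""
    return 1000.0 / latency_ms


def latency_reduction(slower_ms, faster_ms):
    """Fraction of the slower model's latency that the faster model saves."""
    return (slower_ms - faster_ms) / slower_ms


# End-to-end latency of MBSim-YOLO reported in the text: 11.0 ms.
mbsim_fps = fps_from_latency_ms(11.0)

# Core inference time: MBSim-YOLO (8.51 ms) vs. the MobileNetV3-only
# variant (13.15 ms), both taken from the text.
vs_mobilenet = latency_reduction(13.15, 8.51)
```

With these inputs, `mbsim_fps` is about 90.9 and `vs_mobilenet` is about 0.35, matching the reported 90.9 FPS and 35% figures.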

The comparison under the mAP@[0.5:0.95] metric provides further insight into the models’ localization robustness. As shown in Table 7, MBSim-YOLO achieves the highest score of 75.8%, a 1.3 percentage point lead over the next best model, YOLOv9c (75.2%), despite having only 2.5% of its parameters. This demonstrates that our lightweight design does not compromise precise boundary regression. A finer-grained, per-class performance analysis is provided in Figure 10. Crucially, for the most challenging Satellite droplet class, MBSim-YOLO’s mAP@[0.5:0.95] is 74%, outperforming other efficient models, highlighting its suitability for detecting small and fragmented droplets.

In this experiment, the YOLO series models demonstrated strong overall performance. Specifically, YOLOv8 achieved 97.8%, 98.7%, 98%, and 98.8% in Precision, Recall, F1 score, and mAP50, respectively. YOLOv9c reached 98.5% and 98.9% in Precision and mAP50, respectively. Additionally, YOLOv6 matched YOLOv8’s performance in Precision and F1 score, highlighting the robust capabilities of the YOLO series in object detection tasks. However, these models exhibit relatively high computational complexity. Compared to MBSim-YOLO, YOLOv6, YOLOv8, and YOLOv9c have GFLOPs values that are 9.8, 6.1, and 100.3 higher, respectively, and parameter counts that are 13.72 M, 9.04 M, and 94.07 M larger, respectively. By contrast, the MBSim-YOLO model outperformed all other models in terms of overall performance. It achieved Recall, F1 score, and mAP50 values of 99.1%, 99%, and 98.9%, respectively, with a computational cost of only 2.0 GFLOPs, approximately 24.7% of the computational cost of YOLOv8 (a 75.31% reduction). The convergence behavior of all models, tracked via both mAP50 and mAP@[0.5:0.95] over 200 epochs, is visualized in Figure 11. The curves show that MBSim-YOLO not only converges to a higher final performance in both metrics but also exhibits stable training dynamics with minimal fluctuations, especially in the critical later stages (epochs 160–200). In conclusion, MBSim-YOLO not only maintains high detection accuracy but also achieves substantial reductions in computational overhead, making it highly suitable for practical applications.
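As a quick check of the computational savings, the YOLOv8 cost can be recovered from the two figures given above (2.0 GFLOPs for MBSim-YOLO and a gap of 6.1 GFLOPs) and converted into a percentage reduction. The helper name below is hypothetical.

```python
def percent_reduction(baseline, reduced):
    """Percentage by which `reduced` is smaller than `baseline`."""
    return 100.0 * (baseline - reduced) / baseline


mbsim_gflops = 2.0                    # reported for MBSim-YOLO
yolov8_gflops = mbsim_gflops + 6.1    # YOLOv8 is reported as 6.1 higher
gflops_reduction = percent_reduction(yolov8_gflops, mbsim_gflops)
```

The result, about 75.31%, agrees with the reduction in computational burden reported in the ablation study.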

To further validate the effectiveness of the MBSim-YOLO model in reducing computational resource consumption, this study replaces the original backbone network of YOLOv8 with lightweight models, including ShuffleNetV2, MobileNetV3, and GhostNetV2 [20,39,40]. An objective comparison is conducted under the same experimental conditions. The experimental results are presented in Table 7. Owing to its unique network architecture and mobile MQA mechanism, MobileNetV3 achieves a 66.41% reduction in parameters compared to GhostNetV2 and a 25.73% reduction compared to ShuffleNetV2. In terms of detection performance, MobileNetV3 demonstrates an 8.7% improvement in Precision, a 1.9% increase in Recall, and a 4.1% enhancement in mAP50 relative to ShuffleNetV2. However, when compared to GhostNetV2, MobileNetV3 exhibits a slight decline in detection accuracy. The enhanced MBSim-YOLO model mitigates this deficiency by incorporating an attention mechanism and a feature fusion module. As a result, the MBSim-YOLO model achieves significant improvements in detection accuracy over GhostNetV2, particularly in Precision (+2.8%), Recall (+0.6%), and F1 score (+2%). Moreover, compared to lightweight backbone networks such as GhostNetV2 and ShuffleNetV2, MBSim-YOLO substantially reduces computational complexity and parameter count, with GFLOPs reduced by 74.68% and Parameters reduced by 60%. These findings indicate that MBSim-YOLO is better suited for deployment on embedded devices and offers higher practical application value. Figure 12 shows the actual detection results of various comparative models during the CIJ jetting process. Among them, only the MBSim-YOLO model yields relatively satisfactory results in detecting stick-together ink droplets. The detection accuracies of the remaining comparative models are less than optimal, and some models even exhibit cases of missed detections and false alarms.

3.4. Evaluation of MBSim-YOLO Model’s Detection Performance

This study developed the MBSim-YOLO ink droplet state recognition model by integrating MobileNetV3, BiFPN, SimAM, and YOLOv8 structures. The MBSim-YOLO model was applied to conduct detection experiments on the dataset. The results demonstrated that the MBSim-YOLO model exhibited strong performance in the ink droplet state detection task. Specifically, the precision of the model’s detection results reached 98.2%, indicating its high-precision capability in identifying ink droplets. The Recall rate achieved 99.1%, which suggests that the model rarely missed detections during the recognition process, ensuring comprehensive detection coverage. Furthermore, the model’s mAP50 reached 98.9%, reflecting its high recognition accuracy. The F1 score attained 99%, highlighting the model’s balanced performance between Precision and Recall. These findings confirm that the MBSim-YOLO model possesses high accuracy and reliability in the ink droplet state detection task, making it a valuable technical solution for related applications.
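The F1 score is the harmonic mean of Precision and Recall, so it can be recomputed from the two values reported above. The sketch below uses a hypothetical helper name and the standard definition of F1.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and Recall of MBSim-YOLO as reported in the text.
f1 = f1_score(0.982, 0.991)
```

This gives roughly 0.986, which rounds to the reported F1 score of 99%.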

The precision-confidence and recall-confidence curves in Figure 13 provide a nuanced view of the model’s detection behavior across different confidence thresholds. In the precision-confidence curve (Figure 13a), precision rises sharply from 20% to nearly 100% as the confidence threshold increases from 0 to about 0.2. This indicates that false positives are concentrated at very low confidence scores, so even a modest threshold removes most of them, a sign of strong inherent classification capability. Precision then plateaus near 100% for thresholds above 0.5, confirming the high reliability of its high-confidence predictions. In the recall-confidence curve (Figure 13b), recall remains high (above 95%) for thresholds below 0.5, demonstrating comprehensive detection coverage. The gradual decline in recall as the confidence threshold increases beyond approximately 0.5 reflects a fundamental trade-off: the model becomes more conservative, prioritizing high-certainty predictions at the cost of potentially missing some difficult or ambiguous instances (e.g., faint satellite droplets). The optimal operating point balances these two metrics. The sustained high precision across most thresholds and the maintained high recall at moderate thresholds collectively validate the robust and reliable performance of MBSim-YOLO for ink droplet state recognition.
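The trade-off described above can be illustrated by sweeping a confidence threshold over a list of scored detections. The sketch below is a simplified illustration: the function name, the toy detections, and the convention of reporting a precision of 1.0 when no detection survives the threshold are all assumptions, and the matching of detections to ground truth is taken as given.

```python
def precision_recall_at(detections, num_ground_truth, threshold):
    """Precision and recall after discarding low-confidence detections.

    `detections` is a list of (confidence, is_true_positive) tuples.
    """
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    tp = sum(1 for is_tp in kept if is_tp)
    # Assumed convention: precision is 1.0 when nothing is kept.
    precision = tp / len(kept) if kept else 1.0
    recall = tp / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Hypothetical detections: false positives sit at low confidence.
detections = [(0.95, True), (0.90, True), (0.80, True), (0.60, True),
              (0.15, False), (0.10, False)]

curve = {thr: precision_recall_at(detections, 4, thr)
         for thr in (0.0, 0.5, 0.85)}
```

In this toy example, raising the threshold from 0.0 to 0.5 removes both false positives without losing any true detections, while raising it further to 0.85 keeps precision at 1.0 but halves recall.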

In addition, to evaluate the performance of the MBSim-YOLO model in detecting ink droplet states under various operating conditions of the inkjet printer, images were extracted from the test set for experimental validation. Given that the ink droplets in the images are relatively small, some images were cropped and background interference was reduced appropriately to better highlight the recognition results. Figure 14 illustrates the detection results of the MBSim-YOLO model for ink droplet states under different pressure and amplitude settings. The results indicate that the trained model is capable of accurately identifying ink droplet state categories.

4. Discussion

This study meticulously constructed a diverse dataset that comprehensively encompasses the data feature types of continuous inkjet printers under various operating conditions, thereby effectively enhancing the model’s generalization capability. Based on the evolution process of ink droplet states, the study innovatively categorized the droplet states into four distinct types, providing a robust foundation for subsequent model training and performance assessment. Through a series of rigorous comparative experiments, the results revealed that the improved MBSim-YOLO model achieved a significant enhancement in detection accuracy, demonstrating exceptional detection performance. This fully validates the model’s robustness and effectiveness in handling the complex operational states of inkjet printers. Moreover, the MBSim-YOLO model features a lightweight design, enabling efficient deployment within inkjet printers for real-time condition monitoring, thus ensuring high print quality. Although this study has achieved significant progress in ink droplet state detection, it is important to acknowledge that there remains potential for further enhancement of system performance. The current research is subject to the following limitations:

The model’s interpretability requires improvement. Due to the inherent interpretability challenges associated with convolutional neural networks, it is suggested to incorporate visualization tools such as heat maps during the detection process to elucidate the rationale behind the model’s decisions. This method can facilitate the evaluation of how different feature regions influence the accuracy of ink droplet state detection.
The MBSim-YOLO model demonstrates excellent performance in detecting ball-shaped ink droplets, trailing-tail ink droplets, stick-together ink droplets, and satellite ink droplets with distinct characteristics. However, certain challenges remain. Specifically, the limited pixel proportion of small satellite ink droplets in images, their high similarity to the background, and potential occlusion by large stains restrict the model’s detection accuracy for very small satellite ink droplets generated under high-frequency jetting conditions. To overcome this limitation, future research will focus on expanding the dataset with samples of satellite ink droplets across various jetting frequencies, further optimizing the classification system to enhance the model’s ability to recognize fine details. Moreover, structural improvements to the model will be implemented to strengthen its feature extraction and interpretation capabilities in complex scenarios, thereby minimizing the loss of critical semantic information and enabling more precise detection of very small satellite ink droplets.
While the robustness to real-world industrial environmental variations requires further validation, the primary contribution of this study lies in the design and proof-of-concept of a lightweight deep learning architecture. To focus on the core task of droplet morphological classification, experiments were conducted under highly controlled conditions. The system’s robustness to several critical real-world variables essential for practical deployment has not yet been evaluated. This includes long-term environmental drift (e.g., gradual changes in lighting intensity, lens contamination, or ink accumulation on the nozzle), mechanical and operational variations (e.g., vibration from adjacent machinery, or the use of different nozzles or inks), as well as generalization across different devices and machine models. Investigating the model’s adaptability to these variations constitutes a primary objective of our subsequent work. We plan to conduct environmental accelerated life testing, introduce controlled perturbations to simulate hardware wear, and perform cross-validation on various industrial printing systems. Addressing these robustness issues is the crucial next step in transitioning the MBSim-YOLO model from a laboratory proof-of-concept to a robust industrial tool.
While this study has confirmed the feasibility of using the MBSim-YOLO model for detecting ink droplet states, it has not extensively investigated the detection methods for various ink droplet features, such as volume and area. Considering that the primary goal of this study is to enable the detection of ink droplet states, there remain limitations in the quantitative analysis of ink droplet characteristics. Future research will aim to integrate advanced object detection networks with techniques for extracting ink droplet shape features. By refining algorithm design and parameter optimization, the accuracy and reliability of ink droplet quality assessment can be further enhanced.

5. Conclusions

This study proposes an efficient deep learning model for monitoring the droplet states of continuous inkjet printers. The model is capable of classifying four representative droplet states: ball-shaped, stick-together, satellite, and trailing-tail droplets. By substituting the backbone network of YOLOv8 with MobileNetV3, the model significantly reduces both the number of parameters and computational resource requirements. MobileNetV3 leverages depthwise separable convolutions and a lightweight architecture to effectively decrease model complexity while preserving high detection performance. Furthermore, given that inkjet printer droplet states may exhibit multi-scale characteristics under varying printing speeds and nozzle conditions, the model incorporates the BiFPN module. BiFPN fuses multi-scale features via a bidirectional feature pyramid mechanism, thereby enhancing the model’s ability to detect objects of different sizes. This capability ensures that the model maintains high detection accuracy when addressing satellite, ball-shaped, trailing-tail, and stick-together droplets. To further improve the model’s focus on detail, the SimAM module was introduced. SimAM adaptively adjusts the critical parts of the feature map, thereby improving detection performance for small targets and complex backgrounds. It particularly excels in handling detailed features such as stick-together and satellite droplets. By integrating MobileNetV3, BiFPN, and SimAM into the YOLOv8 framework, the model achieves efficient, real-time, and high-precision detection in the task of inkjet printer droplet state detection, while also enhancing overall robustness.

Experimental results indicate that at an Intersection over Union (IoU) threshold of 0.5, the mAP values for the four droplet types are 99.4% (ball-shaped), 99.3% (satellite), 98.3% (stick-together), and 98.7% (trailing-tail). Here IoU measures the overlap between the predicted and ground-truth bounding boxes, and a prediction is counted as correct when the overlap ratio exceeds 0.5. Overall, the model detects inkjet printer droplet states efficiently and accurately. The combined modules improve both computational efficiency and the detection of diverse droplet states, including smaller or irregularly shaped droplets, and the model gives more accurate and stable results in the complex backgrounds and varied droplet types commonly encountered in inkjet printers. In CPU tests, inference takes 8.51 ms per image (Table 8), which suggests that the model can maintain relatively fast inference when deployed on embedded or mobile devices.
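The IoU criterion described above can be sketched in a few lines. This is a minimal illustration that assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function names and the example coordinates are hypothetical and do not come from the study's code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A prediction counts as correct when its IoU with the ground truth exceeds the threshold."""
    return iou(pred, truth) > threshold


if __name__ == "__main__":
    pred = (10.0, 10.0, 50.0, 30.0)    # hypothetical predicted droplet box
    truth = (12.0, 12.0, 52.0, 32.0)   # hypothetical ground-truth box
    print(f"IoU = {iou(pred, truth):.3f}, correct = {is_correct(pred, truth)}")
```

A full evaluation would also require the predicted class to match the ground-truth class before a detection is counted as correct; that step is omitted here.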

Author Contributions

Conceptualization, J.X., J.W. and Q.W.; methodology, J.W., J.X. and X.D.; software, J.X., J.W. and X.D.; validation, J.W. and J.X.; formal analysis, J.X., J.W. and J.Y.; investigation, J.W., J.X. and W.D.; resources, J.X. and J.W.; data curation, J.X., J.W. and Q.Z.; writing—original draft preparation, J.X., J.W. and Q.W.; writing—review and editing, J.X., X.D. and J.W.; supervision, J.X. and J.W.; project administration, J.X., J.W. and Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Huangpu Special Project of the Science and Technology Service Network Initiative (STS) of the Chinese Academy of Sciences (Grant No.: STS-HP-202202).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The relevant data of this study are available from the corresponding author upon reasonable request.

Acknowledgments

This work was supported in part by the Natural Science Foundation of Guangdong Province under Grant U22A20221; in part by the National Natural Science Foundation of China under Grant 62073090; in part by the Chinese Academy of Sciences Science and Technology Service Network Program Huangpu special project under Grant STS-HP-202202; in part by the Natural Science Foundation of Guangdong Province of China under Grants 2023A1515011423, 2024A1515012090, and 2023A1515240020; in part by the Key Laboratory of Marine Environmental Survey Technology and Application, Ministry of Natural Resources, P.R. China, under Grant MESTA-2022-B001; in part by the Special Fund for Scientific and Technological Innovation Strategy of Guangdong Province under Grant PDJH2023B0304 and Grant PDJH2024A225; and in part by the Guangdong Province Key Construction Discipline Research Capacity Enhancement Project under Grant 2024ZDJS021.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figure 1. Ink droplet state observation platform and theoretical analysis of ink droplet formation. (a) Components of the experimental setup, including the piezoelectric actuator, high-voltage amplifier, and CCD acquisition system. (b) Three-phase droplet formation dynamics and force analysis. (c) Data acquisition and structural organization for ink droplets. (d) Deep learning-based droplet state detection model for continuous inkjet printing.

Figure 2. Schematic diagram of the structure and working principle of the EC-JET1000 inkjet printer.

Figure 3. Illustration of ink droplet ejection process and state classification. (a) Sequence combination diagrams of droplet ejection events under different parameter conditions (with varying pressures and amplitudes). (b) Representative examples of the four defined droplet states (Stick-together, Trailing-tail, Satellite, Ball-shaped).

Figure 4. MBSim-YOLO’s model framework.

Figure 5. MobileNetV3’s model descriptions.

Figure 6. Structures of various feature fusion networks. (a) FPN. (b) PANet. (c) BiFPN.

Figure 7. SimAM attention structure diagram.

Figure 8. Performance analysis of ink droplet state detection models. (a) Training loss curves of six different object detection architectures; (b) Confusion matrix of the optimal MBSim-YOLO model on the test dataset.

Figure 9. Comparison of average latency and throughput (FPS) among different object detection models on a CPU platform.

Figure 10. Performance comparison of different detection models in various droplet shape categories. (a) mAP@0.5 performance comparison; (b) mAP@0.5:0.95 performance comparison.

Figure 11. Training dynamics of different models. (a) Changes in the mAP@0.5 evaluation metric; (b) Changes in the mAP@[0.5:0.95] evaluation metric.

Figure 12. The actual detection outcomes of various comparative models during the CIJ jetting process.

Figure 13. Precision and recall curves versus confidence thresholds for different classes. (a) Precision-confidence curves; (b) Recall-confidence curves.

Figure 14. Detection results of the model for CIJ droplet states under different combinations of pressure and amplitude.

Table 1. Combination of pressure and amplitude.

Sequence No.	Pressure Setting	Pressure (psi)	Amplitude (µm)
1	150	40	80
2	170	44.2	80
3	200	53	80
4	150	40	140
5	170	44.2	140
6	200	53	140
7	150	40	200
8	170	44.2	200
9	200	53	200
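Table 1 is a full 3 × 3 grid of pressure and amplitude values. The sketch below regenerates the nine operating conditions in the same order; the setting-to-psi mapping is copied from the table, while the function and field names are hypothetical.

```python
from itertools import product

# Pressure setting -> measured pressure in psi, as listed in Table 1.
PRESSURE_PSI = {150: 40.0, 170: 44.2, 200: 53.0}
# Transducer amplitudes in micrometres, as listed in Table 1.
AMPLITUDES_UM = (80, 140, 200)


def build_conditions():
    """Enumerate all pressure/amplitude combinations in Table 1 order.

    Amplitude is the outer loop and pressure the inner loop, which
    reproduces the sequence numbers 1..9 of the table.
    """
    conditions = []
    for seq, (amplitude, setting) in enumerate(product(AMPLITUDES_UM, PRESSURE_PSI), start=1):
        conditions.append({
            "sequence": seq,
            "pressure_setting": setting,
            "pressure_psi": PRESSURE_PSI[setting],
            "amplitude_um": amplitude,
        })
    return conditions


if __name__ == "__main__":
    for condition in build_conditions():
        print(condition)
```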

Table 2. Experiment environment configuration.

Laboratory Setting	Configuration Information
CPU	i9-11900K
GPU	RTX3060
Operating System	Windows 10
RAM	64 GB
Programming Language	Python 3.8.20
Deep Learning Framework	PyTorch 2.0.0+cu118

Table 3. Experiment hyperparameters.

Hyperparameters	Value
Training epochs	200
Patience	20
Batch size	16
Initial learning rate	0.01
Optimizer	SGD
weight_decay	0.0005
momentum	0.937
lr_scheduler_type	cosine
warmup_epochs	3.0
warmup_momentum	0.8
warmup_bias_lr	0.1
input size	136 × 1020
data augmentation	flip/resize
NMS threshold	0.7
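Table 3 lists a cosine learning-rate schedule with an initial rate of 0.01 over 200 epochs. The sketch below shows one common form of cosine annealing under those two values. The final learning rate (`lr_final`) is not reported in the table and is a hypothetical choice here, and the warmup phase (3 epochs in Table 3) is deliberately left out, so this should be read as an illustration of the schedule's shape and not as the exact training configuration.

```python
import math


def cosine_lr(epoch: float, total_epochs: int = 200,
              lr0: float = 0.01, lr_final: float = 0.0001) -> float:
    """Cosine-annealed learning rate.

    lr0 and total_epochs follow Table 3; lr_final is an assumed value.
    Warmup is not modelled.
    """
    progress = min(max(epoch / total_epochs, 0.0), 1.0)  # clamp to [0, 1]
    return lr_final + 0.5 * (lr0 - lr_final) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 50, 100, 150, 200):
        print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.6f}")
```

The rate starts at `lr0`, passes through the midpoint of `lr0` and `lr_final` halfway through training, and decays smoothly to `lr_final` at the last epoch.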

Table 4. Comparison results of different models for ablation experiments (The best data for that column is shown in bold).

MobileNetv3	BiFPN	SimAM	Precision (%)	Recall (%)	F1	mAP50 (%)	mAP50:95 (%)	GFLOPs	Parameters (M)
			97.8	98.7	0.98	98.8	74.5	8.1	11.47
✓			95.2	98.5	0.97	98.7	73.4	5.7	4.85
	✓		97.1	99.0	0.98	98.9	75.0	8.1	11.46
		✓	98.4	99.2	0.99	98.9	75.1	8.1	11.46
✓	✓		96.9	96.4	0.96	98.1	74.8	1.9	2.20
✓	✓	✓	98.2	99.1	0.99	98.9	75.8	2.0	2.43
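As a worked check of two quantities derived from Table 4, the sketch below computes the F1 score from precision and recall and the relative reduction in parameters and GFLOPs between the baseline (first row) and the full model (last row). The formulas are the standard ones; the function names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` is smaller than `baseline`."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Values copied from the first and last rows of Table 4.
    baseline = {"precision": 0.978, "recall": 0.987, "params_m": 11.47, "gflops": 8.1}
    full = {"precision": 0.982, "recall": 0.991, "params_m": 2.43, "gflops": 2.0}

    print(f"baseline F1: {f1_score(baseline['precision'], baseline['recall']):.2f}")
    print(f"full-model F1: {f1_score(full['precision'], full['recall']):.2f}")
    print(f"parameter reduction: {100 * relative_reduction(baseline['params_m'], full['params_m']):.2f}%")
    print(f"GFLOPs reduction: {100 * relative_reduction(baseline['gflops'], full['gflops']):.1f}%")
```

The parameter reduction computed this way agrees with the 78.81% figure reported in the abstract.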

Table 5. Mean Average Precision at IoU threshold 0.5:0.95 (mAP@0.5:0.95) performance comparison for various ink droplet morphology categories (the best data for that column is shown in bold).

MobileNetv3	BiFPN	SimAM	Ball-Shaped	Satellite	Stick-Together	Trailing-Tail
			0.749	0.731	0.731	0.769
✓			0.736	0.722	0.732	0.745
	✓		0.762	0.738	0.736	0.764
		✓	0.760	0.743	0.740	0.761
✓	✓		0.759	0.736	0.734	0.763
✓	✓	✓	0.762	0.740	0.759	0.772
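The per-class values in Table 5 can be related to the overall mAP50:95 column of Table 4 by taking an unweighted mean over the four droplet classes. The sketch below assumes that this is how the overall figure is aggregated, which the rows checked here are consistent with; the dictionary keys and function name are hypothetical.

```python
from statistics import fmean


def mean_ap(per_class_ap: dict) -> float:
    """Unweighted mean of per-class average precision values."""
    return fmean(per_class_ap.values())


if __name__ == "__main__":
    # Per-class mAP@0.5:0.95 copied from the first and last rows of Table 5.
    baseline = {"ball": 0.749, "satellite": 0.731, "stick": 0.731, "tail": 0.769}
    full_model = {"ball": 0.762, "satellite": 0.740, "stick": 0.759, "tail": 0.772}

    print(f"baseline mAP50:95 = {100 * mean_ap(baseline):.1f}%")
    print(f"full-model mAP50:95 = {100 * mean_ap(full_model):.1f}%")
```

Comparing the two rows class by class shows that the largest gain of the full model is on stick-together droplets (0.731 to 0.759).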

Table 6. Detailed detector configurations for classification-based baselines.

Model	Backbone	Neck	Detection Head
ResNet	ResNet-50	FPN	RetinaNet Head
MobileViT-xxs	MobileViT-xxs	Simple FPN	FCOS Head
EfficientNet	EfficientNet-B0	BiFPN	EfficientDet Head

Table 7. Comparison of the performance among various network models (the best data for that column is shown in bold).

Model	Precision (%)	Recall (%)	F1	mAP50 (%)	mAP50:95 (%)	GFLOPs	Parameters (M)
YOLOv6	97.8	98.0	0.98	98.5	73.9	11.8	16.15
YOLOv8n	97.8	98.7	0.98	98.8	74.5	8.1	11.47
YOLOv9c	98.5	98.6	0.98	98.9	75.2	102.3	96.5
ResNet	98.7	98.1	0.98	98.7	75.1	150.8	187.40
MobileViT-xxs	87.3	96.7	0.92	94.6	70.3	5.3	4.51
EfficientNet	94.2	97.8	0.96	98.1	73.5	5.6	7.28
+GhostNetv2	95.4	98.5	0.97	98.8	73.8	7.9	14.44
+ShuffleNetv2	86.5	96.4	0.92	94.6	69.7	5.0	6.53
+MobileNetv3	95.2	98.5	0.97	98.7	73.4	5.7	4.85
MBSim-YOLO	98.2	99.1	0.99	98.9	75.8	2.0	2.43

Table 8. CPU inference time breakdown of different models (the best data for each column is shown in bold).

Model	Preprocess (ms)	Inference (ms)	Postprocess (ms)
YOLOv6	1.1	11.0	2.2
YOLOv8n	1.94	19.51	4.73
YOLOv9c	2	30	6
ResNet	1.2	36.7	1.5
MobileViT-xxs	1.3	21.9	1.8
EfficientNet	1.0	11.9	1.6
+GhostNetv2	1.1	25.8	1.6
+ShuffleNetv2	1.03	17.09	1.82
+MobileNetv3	1.0	13.15	2.87
MBSim-YOLO	1.0	8.51	1.49
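The stage timings in Table 8 can be combined into an end-to-end latency and an approximate throughput. The sketch below assumes that the three stages run sequentially on one image at a time, so that latency is their sum and throughput is its reciprocal; the resulting figures are derived here for illustration and may differ from the measured values plotted in Figure 9.

```python
def total_latency_ms(stages) -> float:
    """Sum of preprocess, inference, and postprocess times in milliseconds."""
    return sum(stages)


def throughput_fps(latency_ms: float) -> float:
    """Frames per second, assuming one image is processed at a time."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # (preprocess, inference, postprocess) in ms, copied from Table 8.
    timings = {
        "YOLOv8n": (1.94, 19.51, 4.73),
        "MBSim-YOLO": (1.0, 8.51, 1.49),
    }
    for name, stages in timings.items():
        latency = total_latency_ms(stages)
        print(f"{name}: {latency:.2f} ms per image, about {throughput_fps(latency):.1f} FPS")
```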

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
