EBiDNet: A Character Detection Algorithm for LCD Interfaces Based on an Improved DBNet Framework

Wang, Kun; Wu, Yinchuan; Yan, Zhengguo

doi:10.3390/sym17091443


by Kun Wang, Yinchuan Wu * and Zhengguo Yan

Engineering Research Center of Energy Equipment Intelligent & Visualize Detect Technology, Universities of Shaanxi Province, Xi’an Shiyou University, Xi’an 710065, China

* Author to whom correspondence should be addressed.

Symmetry 2025, 17(9), 1443; https://doi.org/10.3390/sym17091443

Submission received: 14 July 2025 / Revised: 21 August 2025 / Accepted: 28 August 2025 / Published: 3 September 2025

(This article belongs to the Special Issue Symmetry and Its Applications in Computer Vision)


Abstract

Characters on liquid crystal display (LCD) interfaces often appear densely arranged, with complex image backgrounds and significant variations in target appearance, posing considerable challenges for visual detection. To improve the accuracy and robustness of character detection, this paper proposes an enhanced character detection algorithm based on the DBNet framework, named EBiDNet (EfficientNetV2 and BiFPN Enhanced DBNet). This algorithm integrates machine vision with deep learning techniques and introduces the following architectural optimizations. It employs EfficientNetV2-S, a lightweight, high-performance backbone network, to enhance feature extraction capability. Meanwhile, a bidirectional feature pyramid network (BiFPN) is introduced. Its distinctive symmetric design ensures balanced feature propagation in both top-down and bottom-up directions, thereby enabling more efficient multiscale contextual information fusion. Experimental results demonstrate that, compared with the original DBNet, the proposed EBiDNet achieves a 9.13% increase in precision and a 14.17% improvement in F1-score, while reducing the number of parameters by 17.96%. In summary, the proposed framework maintains lightweight design while achieving high accuracy and strong robustness under complex conditions.

Keywords: character detection; DBNet; machine learning; LCD interface; process control equipment

1. Introduction

Liquid Crystal Display (LCD) interfaces serve as critical human–machine interaction units in process control equipment. They perform core functions such as displaying system status, providing parameter feedback, and visualizing control information, significantly contributing to the operational precision, response speed, and stability of the equipment [1]. These interfaces are widely employed across diverse fields including industrial automation [2], power systems [3], petrochemical plants [4], and medical instruments. Their display performance directly impacts the reliability and safety of the equipment. In practical applications, LCD interfaces predominantly convey critical information—such as equipment operating status, operational commands, and alarm prompts—in textual form. Consequently, the integrity and clarity of these characters are paramount for ensuring accurate and timely operations.

During industrial production, devices like electric actuators often undergo LCD character defect inspection before leaving the factory. Traditional manual visual inspection, while intuitive, suffers from low efficiency, high labor intensity, and difficulty in scaling to meet mass production demands. With the rapid advancement of deep learning technologies, an increasing body of research is exploring automated algorithms to replace manual inspection, aiming for breakthroughs in both detection accuracy and processing speed [5]. Within the automated inspection workflow, character detection constitutes a crucial preliminary step. It is responsible for accurately localizing character regions within images, thereby providing reliable foundational data for subsequent recognition and reference-image-based defect verification.

Although Optical Character Recognition (OCR) technology has achieved significant success in recognizing natural scene text and printed documents, its application to industrial LCD character detection still faces several challenges. First, LCD characters often exhibit low contrast and fine strokes and are susceptible to blurring. Second, fonts lack standardization and may be deformed or skewed. Third, character arrangements are frequently irregular, with significant scale variations and complex backgrounds. These factors not only increase the difficulty of character localization but also impose higher demands on the precision and robustness of detection models.

Current deep learning-based character detection techniques are primarily categorized into two major paradigms: regression-based methods and segmentation-based methods [6]. The former predominantly evolved from generic object detection models. For instance, TextBoxes [7] implements detection leveraging the SSD framework, while CTPN achieves sequential character localization by integrating Fast R-CNN [8]. These methods directly regress character bounding box coordinates, rendering them particularly suitable for horizontally aligned, regularly arranged characters [9]. To enhance robustness for multioriented character detection, the EAST [10] model introduced improvements in angle handling and inference speed, demonstrating promising results. However, regression-based approaches still exhibit limitations in precisely delineating curved or deformed characters, struggling to accurately fit complex character boundaries [11]. To address this, researchers have proposed segmentation-based character detection methods. These techniques generate character probability maps via pixel-level classification and subsequently derive refined character boundaries through post-processing steps [12]. Representative algorithms include PixelLink [13], DBNet [14], and FCENet [15]. Among these, DBNet enhances both detection accuracy and inference efficiency by introducing a learnable binarization module. FCENet, conversely, employs Fourier descriptors to model arbitrarily shaped characters, significantly strengthening its capability to detect complex curved characters.

Collectively, segmentation-based detection methods are increasingly becoming the predominant approach for natural scene character detection due to their superior adaptability. This is particularly evident in application scenarios characterized by complex character morphology and severe background interference, where they demonstrate outstanding performance. However, for LCD interface character detection in industrial environments, significant practical challenges persist. These challenges include high density of character arrangements, pronounced interference from display defects, and substantial scale variations of target characters.

Therefore, it is imperative to design a character detection model that excels in both accuracy and robustness to meet the dual requirements of detection precision and real-time performance in industrial applications. Addressing the characteristics of industrial LCD interface character detection, this paper proposes improvements based on the classic DBNet framework, primarily encompassing the following three aspects:

(1): Introducing a Bidirectional Feature Pyramid Network (BiFPN) structure to achieve comprehensive multiscale feature fusion, thereby enhancing the detection accuracy and stability for characters of varying sizes;
(2): Constructing and expanding a dedicated LCD character dataset tailored for industrial scenarios, which better matches the distribution characteristics of high-density, fine-grained characters and strengthens the model’s generalization capability under complex backgrounds and varying lighting conditions;
(3): Substituting the original ResNet50 backbone with the more efficient EfficientNetV2-S, which preserves detection accuracy while markedly enhancing feature extraction efficiency and inference speed, thereby achieving a favorable balance between precision and computational efficiency.

This approach effectively reduces computational overhead while ensuring detection performance, enhancing the model’s real-time capability and deployment feasibility in industrial environments.

2. Related Works

In recent years, with the continuous advancement of deep learning technologies, their application research within industrial engineering scenarios has deepened significantly [16]. Confronted with the dual challenges of detection accuracy and real-time performance in complex environments, researchers commonly adopt specific strategies to enhance models. These include optimizing network architectures, incorporating attention mechanisms, designing targeted loss functions, or replacing/augmenting critical modules. Such strategies improve both detection performance and environmental adaptability.

Within the domain of character detection, several studies have achieved notable progress in enhancing the traditional DBNet architecture. For instance, Guan et al. [17] proposed employing EfficientNetV2-S as the backbone network, integrated with a MultiScale DropKey Attention (MSKA) mechanism and an Efficient Multiscale Attention (EMA) module, designed to address perspective distortion and complex backgrounds, respectively. Wei et al. [18] augmented the detection head with an ROI Align bilinear interpolation mechanism, thereby enhancing the model’s perception of character edge information; this approach achieved an accuracy of 82.7% in detecting power equipment nameplates. Yao et al. [19] introduced DAMNet, an extension of DBNet that incorporates a multiscale attention mechanism, an auxiliary Otsu branch, and an adaptive feature fusion structure, coupled with a dual-loss strategy, improving the accuracy and robustness of container number detection in complex environments. Li et al. [20] proposed the iSFF-DBNet model, fusing attention mechanisms with a bilateral upsampling module to enhance the adaptability of both the Feature Pyramid Network (FPN) and the binarization module. Zheng et al. [21] developed a DBNet-based detection model optimized with Coordinate Attention (CA), Convolutional Block Attention Module (CBAM), and Hierarchical Feature Aggregation (HFA) modules for the recognition network, boosting character localization and global feature extraction capabilities and thereby enabling high-precision, high-efficiency extraction of weld seam information. Huang et al. [22] enhanced DBNet’s differentiable binarization network to improve text region detection. They introduced an Efficient Channel Attention (ECA) module to mitigate feature pyramid conflicts and modified the convolutional layer structure to delay downsampling, thereby reducing semantic feature loss. These modifications collectively improved detection accuracy and robustness.

Although these improvements have yielded satisfactory detection results across various scenarios, industrial liquid crystal display interfaces, which are characterized by dense characters, significant interference, and complex structural elements, still lack customized solutions capable of effectively balancing detection accuracy and real-time performance. The network design proposed in this study is detailed in the following sections.

3. Methods

3.1. DBNet Architecture

As illustrated in Figure 1, the overall architecture of DBNet can be divided into three main stages: feature extraction, feature fusion, and prediction decoding [23]. First, the input image is processed by the backbone network (ResNet) to extract multiscale features, denoted C1~C5, whose spatial resolutions decrease successively to 1/2, 1/4, 1/8, 1/16, and 1/32 of the input image resolution, as shown in the figure. These multiscale features are fed into the Feature Pyramid Network (FPN) for feature fusion. Through upsampling operations (UP×2, UP×4, etc.) and lateral connections (involving weighted fusion across layers), unified-scale fused feature maps P1~P3 are formed. These fused feature maps retain high-level semantic information while enhancing spatial resolution. Finally, the fused feature maps are concatenated and processed through convolutional layers to generate a unified fused feature output [24].

During the prediction decoding stage, this unified feature output is used to simultaneously generate a Probability Map and a Threshold Map [25]. The Probability Map characterizes the confidence level of each pixel belonging to a text region. Conversely, the Threshold Map adaptively provides a reference binarization threshold for each pixel. Ultimately, the Approximate Binary Map is calculated through the joint contribution of the Probability Map and the Threshold Map, and this process enables high precision segmentation of text regions.

3.2. EBiDNet—Improved DBNet Network Structure

To reduce false negatives and false positives when detecting characters on the LCD interfaces of process control equipment, a character detection model based on an improved DBNet is proposed. The network structure of the model is illustrated in Figure 2. The primary enhancements of this study are as follows:

(1): The lightweight network EfficientNetV2-S, which offers a better balance between model accuracy and computational efficiency, is introduced as the backbone network to replace the original ResNet. This substitution reduces both the model parameters and computational complexity while also enhancing feature extraction capability.
(2): The Bidirectional Feature Pyramid Network (BiFPN) is incorporated into the neck network to fuse multiscale features through bidirectional cross scale connections.

3.3. Backbone

To enhance the computational efficiency and feature representation capability of the detection model, this study introduces the lightweight backbone network EfficientNetV2 into an improved DBNet framework, replacing the original ResNet50. Compared to ResNet50, EfficientNetV2 offers significant advantages in computational efficiency: it employs a Compound Scaling strategy combined with Fused-MB Conv modules [26]. This approach effectively reduces the parameter size and computational complexity while maintaining or even surpassing the original detection accuracy, significantly decreasing the time and resource consumption required for model training and inference.

In terms of feature representation capability, EfficientNetV2’s hierarchical structure efficiently captures multiscale information. Shallow features retain fine spatial details, while deep features possess rich semantic abstraction capabilities, achieving a good balance between local information and global semantics [27]. It significantly reduces false positives and false negatives for characters that are blurred, low contrast, or deformed, thereby improving the robustness and accuracy of detection.

In this study, EfficientNetV2-S is used as the backbone for feature extraction. The EfficientNetV2 family includes S, M, L, and XL variants, which differ in scale and computational cost. Among them, EfficientNetV2-S provides a good trade-off between model performance and computational efficiency. As shown in Table 1, its architecture comprises 8 distinct stages. Key specifications include:

(1): The core operational modules within each stage consist of a Conv3 × 3 convolutional layer, Fused-MB Conv modules, and MB Conv modules. The table details the stride operation, output channel count, and the number of times each module is repeated within its stage.
(2): Within the Fused-MB Conv module, the notation 1 or 4 denotes the expansion ratio, while k3 × 3 specifies a convolutional kernel size of 3 × 3.
(3): When the expansion ratio equals 1 (expansion ratio = 1), the main branch contains only a 3 × 3 convolutional layer. When the expansion ratio differs from 1 (expansion ratio ≠ 1), a 1 × 1 convolutional layer follows the initial 3 × 3 convolution in the main branch.

Figure 3 illustrates the overall architecture of EfficientNetV2. The model employs a stacked stage structure for progressive feature extraction. The initial Stem module performs primary feature transformation. Stages 1 to 3 utilize Fused-MB Conv modules, optimizing fusion and enhancement of low level features. Stages 4 to 6 adopt MB Conv modules to deepen feature representations and enrich semantic information.

The Fused-MB Conv module first extracts local features via a 3 × 3 standard convolution. It then incorporates the Squeeze-and-Excitation (SE) attention mechanism to recalibrate channel weights. Finally, a 1 × 1 convolution is employed to integrate the features. The structure of the SE attention mechanism is illustrated in Figure 4.

The SE module primarily comprises three stages: Squeeze, Excitation, and Recalibration [28]. Its structure is depicted in Figure 4, and its computational procedure is outlined as follows:

(1): Squeeze Stage

Given an input feature map $F \in R^{H \times W \times C}$, global average pooling is applied to perform spatial compression on each channel. This process yields a channel-wise descriptor $x_{c}$ for each channel, as expressed by Equation (1):

x_{c} = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} u_{c} (i, j)

(1)

where $u_{c} (i, j)$ denotes the pixel value at spatial coordinates $(i, j)$ of the $c$-th channel of $F$, and the descriptors of all $C$ channels form the vector $x \in R^{C}$.

(2): Excitation Stage

The compressed vector $x$ is fed into a two-layer fully connected network. The output of the first layer passes through a ReLU activation function, and the output of the second layer passes through a Sigmoid function that maps the values to the range (0, 1), yielding the channel weights:

\mathrm{ReLU} (z) = \max (0, z)

(2)

\mathrm{sigmoid} (z) = \frac{1}{1 + e^{- z}}, \quad \mathrm{sigmoid} (z) \in (0, 1)

(3)

The overall process for calculating the channel weights and applying them to the input feature map is:

F^{'} = \frac{1}{1 + e^{- W_{2} (\mathrm{ReLU} (W_{1} (x)))}} \otimes F

(4)

where $W_{1} (\cdot)$ and $W_{2} (\cdot)$ represent the two fully connected layers.

(3): Recalibration Stage

The obtained channel weights $s \in R^{C}$ are used to perform a channel-wise multiplication (weighting) with the original input feature map $F$. This outputs the recalibrated feature map $F^{'}$, in which salient channels are enhanced and less salient channels are suppressed. This process improves the feature representational capability of the model.
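The three SE stages described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the bias terms, the reduction ratio `r`, the random weights, and the tensor sizes are all assumptions chosen only to make the example runnable.

```python
import numpy as np

def se_block(F, W1, b1, W2, b2):
    """Squeeze-and-Excitation on a feature map F of shape (H, W, C)."""
    # Squeeze: global average pooling over the spatial dimensions -> x in R^C
    x = F.mean(axis=(0, 1))
    # Excitation: FC -> ReLU -> FC -> sigmoid gives channel weights s in (0, 1)
    hidden = np.maximum(0.0, W1 @ x + b1)
    s = 1.0 / (1.0 + np.exp(-(W2 @ hidden + b2)))
    # Recalibration: scale every channel of F by its weight (broadcast over H, W)
    return F * s, s

rng = np.random.default_rng(0)
H, W, C, r = 4, 4, 8, 4            # r is a hypothetical channel reduction ratio
F = rng.standard_normal((H, W, C))
W1 = rng.standard_normal((C // r, C))   # first fully connected layer (assumed shape)
b1 = np.zeros(C // r)
W2 = rng.standard_normal((C, C // r))   # second fully connected layer (assumed shape)
b2 = np.zeros(C)
F_out, s = se_block(F, W1, b1, W2, b2)
```

The output `F_out` has the same shape as `F`; only the relative magnitude of each channel changes, which is the sense in which salient channels are enhanced and others suppressed.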

The MB Conv module enhances efficiency by adopting depthwise separable convolution. Specifically, it first performs an n × n depthwise convolution on each individual channel to capture spatial information. Then, a 1 × 1 pointwise convolution is applied to integrate inter channel features and adjust feature dimensions. With this design, the module maintains strong representational ability while markedly decreasing parameter count and computational overhead, as shown in Figure 5.
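To make the parameter saving concrete, the following sketch compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 × 1 pointwise convolution. The kernel size and channel counts are illustrative values, not taken from Table 1, and bias terms are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128   # illustrative sizes only
standard = conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(standard, separable, round(separable / standard, 3))  # prints 73728 8768 0.119
```

The ratio equals 1/c_out + 1/k², so for 3 × 3 kernels the separable form needs roughly one ninth of the weights once the output channel count is large.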

The two modules collaborate synergistically at different stages, effectively enabling the model to strike a balance between representational capability and computational efficiency throughout the feature extraction and transformation processes.

3.4. Neck

To enhance the adaptability and robustness of the character detection algorithm during multiscale feature fusion [29], this paper introduces improvements and optimizations to the feature fusion module within the original DBNet architecture. As illustrated in Figure 6, traditional Feature Pyramid Networks (FPN) [30] typically adopt a unidirectional bottom-up feature propagation path: the original multiscale feature layers (P1–P5) are sequentially propagated via up sampling operations to their corresponding intermediate feature layers (H1–H5). However, this unidirectional fusion strategy is susceptible to the effects of operations like repeated down sampling during propagation. This susceptibility leads to the progressive degradation of spatial localization information, which significantly impairs the detection accuracy for fine-grained text instances, particularly those at small scales.

To address the aforementioned limitations, this paper replaces the traditional Feature Pyramid Network (FPN) with the Bidirectional Feature Pyramid Network (BiFPN) [31]. As illustrated in Figure 7, BiFPN enhances the efficiency of information exchange and the quality of fusion between multiscale features by introducing symmetrical bidirectional fusion pathways (top-down and bottom-up) and a learnable feature weighting mechanism. This symmetrical architectural design not only strengthens feature complementarity but also effectively maintains the balance and consistency of multiscale information.

Core Architecture Principle: The fundamental concept of this architecture is to achieve bidirectional information interaction and fusion across multiple scales, thereby enhancing the network’s perception capability for targets of varying sizes. In Figure 7, the leftmost 1 × 1 convolutional module serves as the feature entry point, performing channel dimension adjustment on the input raw feature maps. Here, P1 represents higher level semantic features, while P4 emphasizes lower level, fine-grained textural information. The fusion process comprises two distinct stages:

(1): Bottom-up path (red arrows)

Propagation Logic: Starting from low-level, fine-grained features (e.g., the base features extracted by the P4 branch), features propagate upward level-by-level along the path indicated by the red arrows to the P3, P2, and P1 branch layers. During this upward propagation, features effectively carry and transmit rich edge information and textural details. This process ensures the integrity and continuity of fine-grained features within the multiscale feature pyramid, providing a solid foundation for subsequent enhancement of high-level semantic information.

Intermediate Feature Generation (Nodes Q1–Q3): During propagation, features converge at the green ‘C’ modules for cross-branch fusion. The resulting intermediate fused features generated at nodes Q1–Q3 simultaneously exhibit fine-grained representations of low-level details and preliminary integration of multiscale information. This effectively enhances the richness and discriminative power of the feature representations, laying both a theoretical and practical foundation for subsequent semantic feature enhancement and object detection.

(2): Top-down path (blue arrows)

Propagation Logic: Starting from high-level semantic features (e.g., the abstract representation from the fused Q1), features propagate downward along the path indicated by the blue arrows to the Q2, Q3, and P series multiscale branches. During this downward propagation, semantic features rich in category information and global structure are effectively transmitted to lower-level features. This augments their semantic representation capacity and discriminative power.

Final Fused Feature Output (Nodes H1–H4): During the downward flow of semantic information, features undergo cross-scale fusion and convolutional refinement via the green ‘C’ modules. This results in the generation of multiscale fused features at the output nodes H1–H4. These final features integrate the fine-grained textural information propagated upward from the bottom-up pathway with the high-level semantic information propagated downward from the top-down pathway. This achieves an efficient consolidation of local details and global semantics, significantly boosting the expressive richness and discriminative performance of the features.

Each fusion node integrates Module C (green blocks), which processes multiple input features through two sequential components:

(1): Fast Normalized Fusion

This component adaptively assigns learnable weights to feature maps from different sources. It employs normalization to achieve stable and effective weighted feature fusion, enhancing both fusion stability and representational capacity.

Weighted Feature Fusion: For input feature maps $X_{1}, X_{2}, \dots, X_{N}$, dynamically weighted fusion is achieved through learnable fusion weights $w_{1}, w_{2}, \dots, w_{N}$, enhancing the adaptability and discriminative power of feature fusion. The computational procedure is as follows:

(a): Non-negative Rectification via ReLU:

\hat{w}_{i} = \max (0, w_{i})

(5)

where $w_{i}$ denotes the learnable fusion weight for the i-th input feature. The ReLU activation ensures non-negativity, guaranteeing numerical stability during fusion.

(b): Weight Normalization:

α_{i} = \frac{\hat{w}_{i}}{\sum_{j = 1}^{N} \hat{w}_{j} + ε}

(6)

Here, $α_{i}$ represents the normalized weighting coefficient. Because ε is small, the coefficients satisfy $\sum_{i} α_{i} \approx 1$ whenever at least one weight is positive.

(c): Weighted Fusion Output:

Y = \sum_{i = 1}^{N} α_{i} \cdot X_{i}

(7)

Equivalently:

Y = \sum_{i = 1}^{N} \left( \frac{\max (0, w_{i})}{\sum_{j = 1}^{N} \max (0, w_{j}) + ε} \right) \cdot X_{i}

(8)

where $ε$ is a small constant (e.g., 10⁻⁴) preventing division by zero.

(2): Depthwise Separable Convolution

This operation replaces standard convolution, significantly reducing model parameters and computational costs while maintaining strong feature extraction capabilities. See Figure 5 for the specific structure.
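The fast normalized fusion step of Module C, as given in Equations (5)–(8), can be sketched as follows. The sketch assumes the input feature maps have already been resized to a common shape, and it uses fixed example weights in place of parameters that would be learned during training. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature maps with ReLU-rectified, normalized weights."""
    w_hat = np.maximum(0.0, np.asarray(weights, dtype=float))  # Eq. (5)
    alpha = w_hat / (w_hat.sum() + eps)                        # Eq. (6)
    stacked = np.stack(features, axis=0)                       # shape (N, ...)
    fused = np.tensordot(alpha, stacked, axes=1)               # Eq. (7)
    return fused, alpha

# Three toy "feature maps"; the third has a negative weight and is zeroed out.
X1 = np.ones((2, 2))
X2 = np.full((2, 2), 3.0)
X3 = np.full((2, 2), 100.0)
Y, alpha = fast_normalized_fusion([X1, X2, X3], weights=[1.0, 1.0, -0.5])
```

Here the rectification removes the third input entirely, and the remaining two inputs are averaged almost equally, so every entry of `Y` is close to 2.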

3.5. Head

3.5.1. Head Network Architecture

As illustrated in Figure 8, the probability map (P) and threshold map (T) generated by the Neck layer are integrated to produce the approximate binary map (B). Here, P denotes the confidence score of a pixel belonging to a text region, while T dynamically allocates an adaptive threshold to each pixel [32]. The integration of these two maps enables B to precisely delineate character boundaries and strengthen responses to fine-grained edges. During training, P and B share identical supervision labels to enhance detection performance, whereas T learns optimal thresholds via regression loss, thereby achieving adaptive boundary modeling.

3.5.2. Principle of Differentiable Binarization

In segmentation-based text detection methods, converting the probability map into a binary map typically involves applying a fixed threshold during the post-processing step. The standard binarization function can be expressed as:

B_{i, j} = \begin{cases} 1, & P_{i, j} \geq t \\ 0, & \text{otherwise} \end{cases}

(9)

where P represents the input probability map, $(i, j)$ denotes the coordinate, t is the threshold, and $B_{i, j}$ is the output binarized map. However, the standard binarization method is non-differentiable, thus preventing end-to-end training of the network. To address this limitation, the Differentiable Binarization (DB) algorithm approximates the step function used in standard binarization with the following differentiable function:

\hat{B}_{i, j} = \frac{1}{1 + e^{- k (P_{i, j} - T_{i, j})}}

(10)

Here, $\hat{B}$ represents the approximated binarized map, P is the input probability map, T denotes the learned threshold map, and k is the amplification factor.
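The contrast between Equations (9) and (10) can be illustrated per pixel with a short sketch. The amplification factor k = 50 follows the training setup reported in this paper; the fixed threshold of 0.3 in `hard_binarize` and the sample values are illustrative assumptions.

```python
import math

def hard_binarize(p, t=0.3):
    """Standard binarization, Eq. (9); t = 0.3 is an illustrative threshold."""
    return 1 if p >= t else 0

def differentiable_binarize(p, t, k=50.0):
    """Approximate binarization, Eq. (10), for one pixel."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))

def db_gradient_wrt_p(p, t, k=50.0):
    """Derivative of Eq. (10) with respect to p: k * B * (1 - B)."""
    b = differentiable_binarize(p, t, k)
    return k * b * (1.0 - b)

inside = differentiable_binarize(0.5, 0.3)     # well above the threshold, close to 1
outside = differentiable_binarize(0.1, 0.3)    # well below the threshold, close to 0
boundary = differentiable_binarize(0.3, 0.3)   # exactly at the threshold, 0.5
```

Unlike the step function, the approximation has a non-zero gradient everywhere, and the gradient is largest (k/4) where P equals T, so training concentrates on pixels near the predicted boundary.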

3.5.3. Adaptive Threshold

During training, both the probability map P and the approximated binarized map B are supervised using the same ground truth labels. To simultaneously enhance the model’s discriminative capability within text regions and improve boundary localization accuracy, we apply a controlled shrinking operation to each annotated text polygon. The offset D is computed as follows:

D = \frac{A (1 - r^{2})}{L}

(11)

where L denotes the perimeter of the annotated polygon, A represents its area, and r is the shrink ratio, empirically set to r = 0.4.
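As a worked example of Equation (11), the sketch below computes the offset D for a single annotated polygon, using the shoelace formula for the area. The rectangle coordinates are hypothetical and the helper names are illustrative only.

```python
import math

def polygon_area(points):
    """Area of a simple polygon [(x, y), ...] via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(points):
    """Sum of the edge lengths of the closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def shrink_offset(points, r=0.4):
    """Offset D = A * (1 - r^2) / L from Eq. (11)."""
    return polygon_area(points) * (1.0 - r ** 2) / polygon_perimeter(points)

box = [(0, 0), (100, 0), (100, 20), (0, 20)]   # hypothetical 100 x 20 pixel text box
D = shrink_offset(box)
print(round(D, 2))   # prints 7.0
```

For this box, A = 2000 and L = 240, so D = 2000 × 0.84 / 240 = 7 pixels.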

3.5.4. Loss Calculation

The composite loss function is defined as:

L = L_{s} + α L_{b} + β L_{t}

(12)

where $L_{s}$ denotes the loss on the shrunk (probability) map, $L_{b}$ represents the loss on the approximate binary map, and $L_{t}$ is the threshold map loss. The weighting coefficients $α$ and $β$ are empirically set to 1 and 10, respectively. Both $L_{s}$ and $L_{b}$ employ the Binary Cross-Entropy (BCE) loss:

L_{b} = L_{s} = - \sum_{i \in S_{l}} \left[ y_{i} \log x_{i} + (1 - y_{i}) \log (1 - x_{i}) \right]

(13)

where $y_{i}$ is the ground truth label and $x_{i}$ the predicted value of pixel $i$, and $S_{l}$ is the sampled data subset, with a ratio of positive to negative samples of 1:3.

The loss term $L_{t}$ employs the Mean Absolute Error (MAE) loss, defined as:

L_{t} = \sum_{i \in R_{d}} |y_{i}^{*} - x_{i}^{*}|

(14)

Here, $R_{d}$ represents the set of all pixels within the region $G_{d}$ obtained by expanding the annotated bounding box using offset D, $y^{*}$ denotes the computed ground truth threshold map label, and $x^{*}$ corresponds to the model’s predicted value.
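The composite loss of Equations (12)–(14) can be sketched on flattened pixel lists as follows. The sketch assumes that the sampling of $S_{l}$ and the restriction to $R_{d}$ have already been applied to the inputs, writes the BCE term with the negative sign needed for a quantity that is minimized, and clamps predictions with a small `eps` (an added assumption) so the logarithm stays finite.

```python
import math

def bce_sum(preds, labels, eps=1e-7):
    """Summed binary cross-entropy over the given pixels."""
    total = 0.0
    for x, y in zip(preds, labels):
        x = min(max(x, eps), 1.0 - eps)   # clamp to keep log finite
        total -= y * math.log(x) + (1 - y) * math.log(1.0 - x)
    return total

def l1_sum(preds, labels):
    """Summed absolute error over the given pixels, as in Eq. (14)."""
    return sum(abs(y - x) for x, y in zip(preds, labels))

def total_loss(prob, binary, thresh, gt_shrunk, gt_thresh, alpha=1.0, beta=10.0):
    """L = L_s + alpha * L_b + beta * L_t, with alpha = 1 and beta = 10."""
    l_s = bce_sum(prob, gt_shrunk)     # probability map loss
    l_b = bce_sum(binary, gt_shrunk)   # approximate binary map loss
    l_t = l1_sum(thresh, gt_thresh)    # threshold map loss
    return l_s + alpha * l_b + beta * l_t

# Two-pixel toy example with hypothetical predictions and labels.
loss = total_loss(prob=[0.5, 0.5], binary=[0.5, 0.5], thresh=[0.3, 0.7],
                  gt_shrunk=[1, 0], gt_thresh=[0.3, 0.7])
```

In this toy case the threshold map is predicted exactly, so the loss reduces to the two BCE terms, each equal to 2 ln 2.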

4. Experiments and Results

4.1. Dataset

The dataset employed in this study was self-collected by the researchers. It primarily encompasses LCD interfaces from various types of process control equipment, with a primary focus on the liquid crystal displays of electric actuators, as illustrated in Figure 9.

4.1.1. Construction of the Dataset Acquisition Platform

This data acquisition platform targets industrial applications, as illustrated in Figure 10. It is centered on an electric actuator as the control subject and uses a Machine Vision Camera as the sensory core. The platform integrates computer-based acquisition control to establish an integrated hardware–software intelligent acquisition system. The system enables precise capture and transmission of equipment status information, providing foundational data support for industrial automation control and equipment operation and maintenance management.

(1): Electric Actuator: As a critical industrial control device, it provides precise actuation for valves, dampers, and related mechanisms. Its embedded sensors deliver real-time feedback on status parameters including valve opening position, motor rotation speed, and current. Serving as the primary data acquisition target, it furnishes fundamental execution-end data for industrial process control.
(2): Machine Vision Camera: Equipped with high-definition imaging modules and intelligent algorithms, it captures and analyzes visual data from the electric actuator’s display interface, physical appearance, and operational status. This visual information is converted into digital signals, enabling non-contact, visual data acquisition. This approach overcomes the limitation of conventional sensors, which typically capture only single-parameter measurements.
(3): PC: It manages the synchronized operation of the actuator and camera, ensures precise timing of data acquisition, and handles the reception, storage, and preliminary processing of data, thereby laying the groundwork for subsequent analysis.

4.1.2. Data Collection and Partitioning

During the data acquisition phase, the collection was designed to cover diverse application scenarios, font types, and display content variations, including different background environments, display resolutions, character arrangements, and graphical symbol combinations. Figure 11 shows representative samples from the dataset. As a result, the dataset is intended to be balanced and representative in terms of both sample diversity and class distribution. Building upon this foundation, the dataset was randomly partitioned into training, validation, and test sets using an 8:1:1 ratio. Given the diversity achieved during collection, random partitioning is expected to approximately preserve the proportional distribution of classes, scene conditions, and display characteristics across all subsets. This reduces the risk of data distribution bias and supports the validity and generalizability of the experimental results.
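A random 8:1:1 partition of this kind can be sketched as below. The file names, the fixed seed, and the function name are hypothetical; the paper does not state how the shuffle was implemented.

```python
import random

def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items reproducibly and split them into train/val/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to the test set
    return train, val, test

names = [f"img_{i:04d}.png" for i in range(1000)]   # hypothetical file names
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))   # prints 800 100 100
```

Fixing the seed keeps the three subsets identical across runs, which matters when comparing models trained at different times.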

4.1.3. Training Augmentation Strategies

To further enhance the model’s generalization capability and robustness, multiple augmentation strategies tailored to the specific characteristics of character detection tasks were introduced during the training phase. These strategies aim to expand the feature space of the samples and improve the model’s adaptability to complex inputs. The specific augmentations include:

(1): Horizontal Flipping: Applied with a probability of 0.5 to increase diversity in character orientation along the horizontal axis.
(2): Random Rotation: Performed within the range [−10°, 10°] to enhance the model’s ability to handle slightly skewed characters.
(3): Random Scaling: Executed within the scaling factor range [0.5, 3] to improve detection performance for characters of varying sizes.
(4): Random Cropping: Images are randomly cropped to an output size of 640 × 640 while preserving the aspect ratio, thereby strengthening the model’s recognition capability for local features and partially occluded characters.

These augmentation operations collectively function through spatial geometric transformations and scale perturbations. They significantly boost the model’s detection robustness under conditions involving varying orientations, different scales, and complex backgrounds.
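The parameter ranges listed above can be sketched as a sampling routine. This is only an illustration under stated assumptions: uniform sampling within each range, the helper names `sample_augmentation` and `flip_boxes_horizontally`, and the axis-aligned box format are all hypothetical, and the image transforms themselves are omitted.

```python
import random


def sample_augmentation(rng):
    """Draw one set of augmentation parameters from the ranges listed in the text.

    Uniform sampling is an assumption; the text only states the ranges.
    """
    return {
        "flip": rng.random() < 0.5,             # horizontal flip with probability 0.5
        "angle_deg": rng.uniform(-10.0, 10.0),  # random rotation in [-10, 10] degrees
        "scale": rng.uniform(0.5, 3.0),         # random scaling factor in [0.5, 3]
        "crop_size": (640, 640),                # output size of the random crop
    }


def flip_boxes_horizontally(boxes, image_width):
    """Mirror boxes (x_min, y_min, x_max, y_max) about the vertical centre line."""
    return [
        (image_width - x_max, y_min, image_width - x_min, y_max)
        for (x_min, y_min, x_max, y_max) in boxes
    ]


rng = random.Random(42)
params = sample_augmentation(rng)
boxes = [(10, 20, 110, 60)]  # hypothetical character box in a 640-pixel-wide image
flipped = flip_boxes_horizontally(boxes, image_width=640)
print(params)
print(flipped)
```

Whatever geometric transform is applied to an image must also be applied to its annotation boxes, which is why the flip is shown on coordinates rather than pixels.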

4.2. Training Details

The experiments are conducted in a Windows 10 environment using an RTX4060 GPU, and the project is developed using PyCharm 2021. The detailed experimental environment configuration is presented in Table 2.

EBiDNet is trained until it converges on the training set, and its performance is then evaluated on the test set. During training, the Adaptive Moment Estimation (Adam) optimizer is employed, with the exponential decay rates of the first- and second-moment estimates set to 0.9 and 0.999, respectively, and the learning rate fixed at 0.0001. The amplification factor k of the Differentiable Binarization is empirically set to 50. All training images are resized to 640 × 640, while test images are resized to 960 × 960. The batch size is set to 8, and the total number of training epochs is 100.
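To illustrate the role of the amplification factor k, the sketch below evaluates the standard differentiable binarization function from DBNet, B = 1 / (1 + exp(−k(P − T))), for a single pixel. The dictionary keys in `TRAIN_CONFIG` and the sample probability and threshold values are assumptions made for illustration; only the numeric hyperparameters come from the text.

```python
import math

# Hyperparameters reported in the text; the key names are hypothetical.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "beta1": 0.9,
    "beta2": 0.999,
    "learning_rate": 1e-4,
    "db_k": 50,
    "train_size": (640, 640),
    "test_size": (960, 960),
    "batch_size": 8,
    "epochs": 100,
}


def differentiable_binarization(p, t, k=TRAIN_CONFIG["db_k"]):
    """Approximate binary value for probability p and threshold t (both in [0, 1])."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


# With k = 50, a small margin around the threshold already saturates the output.
for p in (0.2, 0.3, 0.4):
    print(p, round(differentiable_binarization(p, t=0.3), 4))
```

A larger k makes the function closer to a hard step while keeping it differentiable, which is what allows the binarization to be trained end to end.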

4.3. Evaluation Indicators

Model performance is evaluated using four metrics: Precision, Recall, F1-score, and Frames Per Second (FPS). The essential confusion matrix components are defined as: TP: Positive instances correctly predicted as positive; FP: Negative instances incorrectly predicted as positive; FN: Positive instances incorrectly predicted as negative. The corresponding evaluation metrics are defined by these formulae:

(1): Precision: Refers to the ratio of the number of correctly identified positive samples to the total number of samples predicted as positive, as expressed in the following formula:

\mathrm{precision} = \frac{TP}{TP + FP} \times 100\%

(15)

(2): Recall: Refers to the ratio of the number of correctly identified positive samples to the total number of actual positive samples, as expressed in the following formula:

\mathrm{recall} = \frac{TP}{TP + FN} \times 100\%

(16)

(3): F1-Score: Refers to the harmonic mean of Precision and Recall, used to measure the balanced performance of a classification model between these two metrics, as expressed in the following formula:

F1 = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}

(17)

(4): FPS (Frames Per Second): Refers to the number of image frames a model can process per second. In this experiment, FPS is calculated from the total time required for the model to process 105 images, where t_i denotes the processing time of the i-th image, as expressed by the formula below:

\mathrm{FPS} = \frac{105}{\sum_{i=1}^{105} t_{i}}

(18)

Together, these four metrics provide a suitable framework for assessing the character detection model in this study.
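The four metrics can be computed directly from Equations (15)–(18). The sketch below is illustrative: the TP, FP, and FN counts and the per-image times are hypothetical values, not results from the paper, and the function names are introduced here.

```python
def precision_recall_f1(tp, fp, fn):
    """Return precision, recall, and F1-score in percent, per Equations (15)-(17)."""
    precision = tp / (tp + fp) * 100.0
    recall = tp / (tp + fn) * 100.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


def frames_per_second(times_s):
    """Return the number of images divided by the total processing time, per Equation (18)."""
    return len(times_s) / sum(times_s)


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed characters.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
# Hypothetical timings: 105 images at 0.05 s each.
fps = frames_per_second([0.05] * 105)
print(round(p, 2), round(r, 2), round(f1, 2), round(fps, 2))
```

The sketch assumes non-zero denominators; a production evaluation script would need to handle the case in which no detections or no ground-truth characters are present.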

4.4. Ablation Experiment

To evaluate the individual contributions of the proposed improvements to overall model performance, a series of systematic ablation studies were designed and conducted. Under identical training settings, various model configurations were compared in terms of precision, recall, F1-score, number of parameters, and FPS.

As shown in Table 3, each structural improvement enhanced performance to a different degree, validating the effectiveness of the proposed modules. With ResNet50 as the backbone network, introducing BiFPN increased precision, recall, and F1-score by 2.05, 7.07, and 4.89 percentage points, respectively, while the parameter count remained almost unchanged (24.55 M versus 24.09 M). Inference speed also improved from 18.87 FPS to 20.88 FPS, indicating that BiFPN fuses features more efficiently. When the FPN structure was retained and the backbone network was replaced with EfficientNetV2S, the F1-score rose to 88.28%, compared with 77.77% for the baseline DBNet, alongside a 17.4% reduction in model parameters, although inference speed dropped to 17.02 FPS. This demonstrates that a lightweight backbone network can reduce model complexity without compromising accuracy.

By integrating EfficientNetV2S and BiFPN, we formed the complete improved model, referred to as Ours. Compared with the original DBNet, Ours achieved the highest precision, recall, and F1-score together with the smallest parameter count among all configurations: the F1-score improved by 14.17 percentage points (from 77.77% to 91.94%), the parameter count decreased by 17.96% (from 24.55 M to 20.14 M), and the inference speed decreased only slightly, by 0.26 FPS (from 18.87 to 18.61). These results demonstrate that EfficientNetV2S and BiFPN are complementary: together they strengthen feature representation while keeping inference speed close to that of the baseline.
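The headline figures can be checked from the Table 3 values. The short sketch below does this arithmetic and separates the absolute gain in F1-score (in percentage points) from the relative change in parameter count (in percent); the variable names are introduced here for illustration.

```python
# Values copied from Table 3 (baseline DBNet versus the full model).
baseline = {"f1": 77.77, "params_m": 24.55, "fps": 18.87}
ours = {"f1": 91.94, "params_m": 20.14, "fps": 18.61}

# Absolute difference between two percentages: percentage points.
f1_gain_pp = ours["f1"] - baseline["f1"]

# Relative change with respect to the baseline: percent.
f1_gain_rel_pct = f1_gain_pp / baseline["f1"] * 100.0
param_reduction_pct = (baseline["params_m"] - ours["params_m"]) / baseline["params_m"] * 100.0

# Absolute change in inference speed, in frames per second.
fps_change = ours["fps"] - baseline["fps"]

print(f"F1 gain: {f1_gain_pp:.2f} points ({f1_gain_rel_pct:.2f}% relative)")
print(f"Parameter reduction: {param_reduction_pct:.2f}%")
print(f"FPS change: {fps_change:.2f}")
```

This reproduces the 14.17-point F1 gain, the 17.96% parameter reduction, and the 0.26 FPS decrease quoted above.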

To further assess the performance improvement of the proposed model in character detection tasks, three LCD interface images were randomly selected from the test set as validation samples. Detection was performed using the enhanced EBiDNet (green bounding boxes) and the original DBNet (red bounding boxes), with the results shown in Figure 12. It can be observed that the original DBNet occasionally produces bounding boxes that encompass multiple characters (first group in Figure 12a) or generates boxes that are excessively large and fail to accurately follow the character contours (second group in Figure 12a). In contrast, EBiDNet can more precisely isolate individual characters and achieve accurate localization even in complex backgrounds, demonstrating superior robustness and adaptability. These findings further confirm the effectiveness and reliability of the proposed approach in practical application scenarios.

4.5. Comparison Experiments

As shown in Table 4, EBiDNet achieves the best results on all three core metrics, with precision, recall, and F1-score reaching 93.63%, 90.32%, and 91.94%, respectively. Compared with FCE, the strongest of the other models, EBiDNet improves F1-score by 2.16 percentage points (from 89.78% to 91.94%), with higher precision (93.63% vs. 90.08%) and slightly higher recall (90.32% vs. 89.49%), indicating that it raises precision without sacrificing its ability to capture relevant characters. Compared with the other state-of-the-art (SOTA) models, namely EAST, DB++, DRRG, and CT, EBiDNet shows larger gains, with recall at least 7.66 percentage points higher and F1-score at least 5.36 percentage points higher, which supports its enhanced robustness and generalization capability, particularly for images with complex backgrounds or severe interference.

To comprehensively evaluate the performance advantages of the proposed model, it was compared with several representative improved character detection models in the relevant field. The experimental results are shown in Table 5.

As shown in Table 5, EBiDNet achieves relatively high precision and stable recall compared with most of the listed methods, with an F1-score of 91.94%, demonstrating strong overall detection capability. Compared with DDP-YOLOv8, EBiDNet improves precision by 7.33 percentage points, significantly reducing the false positive rate. This improvement is mainly attributed to the introduction of the BiFPN module into the DBNet framework, which enhances multiscale feature fusion and enables more accurate localization of character edges and shapes. However, due to the lightweight nature of the architecture, its responsiveness to low-contrast and small-scale characters is slightly inferior to that of DDP-YOLOv8, resulting in a recall that is 6.68 percentage points lower.

Compared with the pixel-level interpolation and pooling optimization method labeled F1 in Table 5 [18], EBiDNet achieves notable improvements in both precision and recall, indicating that the cross-layer feature interaction of BiFPN offers stronger generalization capability under complex backgrounds. Both iSFF-DBNet and DB-EAC exhibit insufficient recall in challenging scenarios, whereas EBiDNet, benefiting from multiscale feature fusion and edge-information enhancement, outperforms these methods in both metrics.

Relative to DBNet-CST, EBiDNet improves precision by 1.43 percentage points, recall by 9.22 percentage points, and F1-score by 5.64 percentage points, suggesting that the bidirectional weighted fusion of BiFPN effectively mitigates DBNet-CST’s missed detections for extremely weak character features. Compared with the multi-granularity attention-based method labeled F2 in Table 5 [38], EBiDNet achieves 6.12 percentage points higher recall but 5.19 percentage points lower precision, reflecting that EBiDNet’s false-positive suppression in specific scenarios such as sparse inkjet characters is slightly inferior to that of the multi-granularity attention mechanism. EDNet, which combines an EfficientNetV2-S backbone with multiscale key attention (MSKA) and efficient multiscale attention (EMA) modules, achieves the highest precision and recall under perspective distortion and complex backgrounds, so EBiDNet trails it by a notable margin on all three metrics.

In summary, EBiDNet strikes a favorable balance between precision and recall relative to most of the compared methods, though its performance on extremely low-contrast characters, small-scale characters, and certain sparse-pattern scenarios still warrants further improvement.

5. Conclusions

This paper proposes EBiDNet, a high-precision character detection model designed for LCD interfaces in process control devices. Built on an optimized DBNet framework, the model employs EfficientNetV2-S as the backbone, which enhances feature extraction capability while reducing parameter count and computational cost. A BiFPN module is incorporated to capture critical character regions and multiscale contextual information more accurately through bidirectional cross-scale fusion and channel attention mechanisms. Experimental results demonstrate that:

(1): Superior detection performance: EBiDNet achieves an F1-score of 91.94% on the original dataset, improving by 14.17 percentage points over DBNet while reducing the parameter count by 17.96%, confirming the synergistic benefits of EfficientNetV2-S and BiFPN;
(2): Enhanced robustness and adaptability: EBiDNet demonstrates strong robustness across diverse and complex industrial scenarios, effectively handling a wide range of liquid crystal display character interfaces;
(3): Comparative advantages: EBiDNet demonstrates significant improvements in both detection accuracy and the integrity of character regions compared with mainstream methods such as DBNet, EAST, and FCE. However, relative to some other advanced improved algorithms, the proposed method still exhibits certain limitations. Future work could explore the incorporation of channel and spatial attention mechanisms, adversarial data augmentation, and deformable convolutions to further enhance character detection performance in complex scenarios.

Author Contributions

Conceptualization, K.W. and Y.W.; Data curation, K.W. and Z.Y.; Formal analysis, K.W.; Funding acquisition, Y.W. and Z.Y.; Investigation, Y.W. and K.W.; Methodology, Y.W. and K.W.; Resources, Y.W.; Supervision, Y.W. and Z.Y.; Validation, K.W. and Y.W.; Writing—original draft, K.W.; Writing—review and critical editing, Y.W. and Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shaanxi Province Qin Chuangyuan “Scientist + Engineer” Team Building Project (Grant No. 2023KXJ-162).

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Chen, S.F.; Lai, C.C. The Defect Classification of TFT-LCD Array Photolithography Process via Using Back-Propagation Neural Network. Appl. Mech. Mater. 2013, 378, 340–345.
2. Shah, R.; Doss, A.S.A.; Lakshmaiya, N. Advancements in AI-Enhanced Collaborative Robotics: Towards Safer, Smarter, and Human-Centric Industrial Automation. Results Eng. 2025, 27, 105704.
3. Xue, C.; Bai, X. Probabilistic Carbon Emission Flow Calculation of Power System with Latin Hypercube Sampling. Energy Rep. 2025, 14, 751–765.
4. El Zomor, M.A.; Ahmed, M.H.; Ahmed, F.S.; Elhelaly, M.A. Failure Analysis of Bolts in Deluge Valve Bonnet in Cooling Tower System in Petrochemical Plant. Sci. Rep. 2025, 15, 14133.
5. Peng, W.X. Research on Industrial Inspection Technology Based on Computer Vision. Mod. Manuf. Technol. Equip. 2023, 59, 112–114.
6. Zhao, X.; Kargoll, B.; Omidalizarandi, M.; Xu, X.; Alkhatib, H. Model Selection for Parametric Surfaces Approximating 3D Point Clouds for Deformation Analysis. Remote Sens. 2018, 10, 634.
7. Liao, M.; Shi, B.; Bai, X.; Wang, X.; Liu, W. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
8. Zhang, S.; Duan, A.; Sun, Y. A Text-Detecting Method Based on Improved CTPN. J. Phys. Conf. Ser. 2023, 2517, 012014.
9. Ma, X.; Tian, Y.H.; Zhao, W. A Review of Neural Network-Based Machine Translation Research. Comput. Eng. Appl. 2025. online ahead of print. Available online: https://link.cnki.net/urlid/11.2127.TP.20250522.1548.014 (accessed on 22 May 2025).
10. Zhou, X.; Yao, C.; Wen, H.; Wang, Y.; Zhou, S.; He, W.; Liang, J. EAST: An Efficient and Accurate Scene Text Detector. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2642–2651.
11. Wang, R.; Yang, C.Y. Transmission Cable Model Identification Based on Dual-Stage Correction. Meas. Control Technol. 2025, 44, 26–34.
12. Feng, L.; Chen, Y.; Zhou, T.; Hu, F.; Yi, Z. Review of Human Lung and Lung Lesion Regions Segmentation Methods Based on CT Images. J. Image Graph. 2022, 27, 722–749.
13. Deng, D.; Liu, H.; Li, X.; Cai, D. PixelLink: Detecting Scene Text via Instance Segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
14. Liao, M.; Wan, Z.; Yao, C.; Chen, K.; Bai, X. Real-Time Scene Text Detection with Differentiable Binarization. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11474–11481.
15. Zhu, Y.; Chen, J.; Liang, L.; Kuang, Z.; Jin, L.; Zhang, W. Fourier Contour Embedding for Arbitrary-Shaped Text Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 3122–3130.
16. Liu, L.L.; Liu, G.M.; Qi, B.Y.; Deng, X.S.; Xue, D.Z.; Qian, S.S. Efficient Inference Techniques of Large Models in Real-World Applications: A Comprehensive Survey. Comput. Sci. 2025. online ahead of print. Available online: https://link.cnki.net/urlid/50.1075.TP.20250703.1601.002 (accessed on 3 July 2025).
17. Guan, S.; Niu, Z.; Kong, M.; Wang, S.; Hua, H. EDPNet (Efficient DB and PARSeq Network): A Robust Framework for Online Digital Meter Detection and Recognition Under Challenging Scenarios. Sensors 2025, 25, 2603.
18. Wei, W.; Long, N.; Tian, Y.; Kang, B.; Wang, D.L.; Zhao, W.B. Text Detection Method for Power Equipment Nameplates Based on Improved DBNet. High Volt. Eng. 2023, 49 (Suppl. S1), 63–67.
19. Yao, L.; Tang, C.; Wan, Y. Advanced Text Detection of Container Numbers via Dual-Branch Adaptive Multiscale Network. Appl. Sci. 2025, 15, 1492.
20. Li, Z.X.; Zhou, Y.T. iSFF-DBNet: An improved text detection algorithm in e-commerce images. Comput. Eng. Sci. 2023, 45, 2008–2017.
21. Zheng, Q.; Zhang, Y. Text Detection and Recognition for X-Ray Weld Seam Images. Appl. Sci. 2024, 14, 2422.
22. Huang, B.; Bai, A.; Wu, Y.; Yang, C.; Sun, H. DB-EAC and LSTR: DBnet based seal text detection and Lightweight Seal Text Recognition. PLoS ONE 2024, 19, e0301862.
23. Mai, R.; Wang, J. UM-YOLOv10: Underwater Object Detection Algorithm for Marine Environment Based on YOLOv10 Model. Fishes 2025, 10, 173.
24. Ding, Y.; Han, B.; Jiang, H.; Hu, H.; Xue, L.; Weng, J.; Tang, Z.; Liu, Y. Application of Improved YOLOv8 Image Model in Urban Manhole Cover Defect Management and Detection: Case Study. Sensors 2025, 25, 4144.
25. Rossi, M.J.; Vervoort, R.W. Enhancing Inundation Mapping with Geomorphological Segmentation: Filling in Gaps in Spectral Observations. Sci. Total Environ. 2025, 997, 180180.
26. Xie, W.; Cui, Y.R. Identification of Maize Leaf Diseases Based on Improved EfficientNetV2 Model. Jiangsu Agric. Sci. 2025, 53, 207–215.
27. Yin, L.; Wang, N.; Li, J. Electricity Terminal Multi-Label Recognition with a “One-Versus-All” Rejection Recognition Algorithm Based on Adaptive Distillation Increment Learning and Attention MobileNetV2 Network for Non-Invasive Load Monitoring. Appl. Energy 2025, 382, 125307.
28. Mao, Y.; Tu, L.; Xu, Z.; Jiang, Y.; Zheng, M. Combinatorial Spider-Hunting Strategy to Design Multilayer Skin-Like Pressure-Stretch Sensors with Precise Dual-Signal Self-Decoupled and Smart Object Recognition Ability. Res. Astron. Astrophys. 2025, 25, 1147.
29. Zhuang, J.; Chen, W.; Huang, X.; Yan, Y. Band Selection Algorithm Based on Multi-Feature and Affinity Propagation Clustering. Remote Sens. 2025, 17, 193.
30. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
31. Guo, Y.; Zhan, W.; Zhang, Z.; Zhang, Y.; Guo, H. FRPNet: A Lightweight Multi-Altitude Field Rice Panicle Detection and Counting Network Based on Unmanned Aerial Vehicle Images. Agronomy 2025, 15, 1396.
32. Yu, X.; Lin, S.J. Research on Improved Text Detection Algorithm for Prosecutorial Scenarios Based on DBNet. SmartTech Innov. 2024, 30, 7.
33. Liao, M.; Zou, Z.; Wan, Z.; Yao, C.; Bai, X. Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 919–931.
34. Zhang, S.X.; Zhu, X.; Hou, J.B.; Liu, C.; Yang, C.; Wang, H.; Yin, X.C. Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection. arXiv 2020, arXiv:2003.07493.
35. Sheng, T.; Chen, J.; Lian, Z. CentripetalText: An Efficient Text Instance Representation for Scene Text Detection. arXiv 2022, arXiv:2107.05945.
36. Li, X.T.; Deng, Y.H. DDP-YOLOv8 model for battery character defect detection. J. Electron. Meas. Instrum. 2025, online ahead of print. 1–10. Available online: https://link.cnki.net/urlid/11.2488.TN.20250512.1511.004 (accessed on 13 May 2025).
37. Deng, W.C.; Yu, X.C.; Zhu, J.B.; Ma, Q.S.; Chen, Y.; Ye, C.; Zhang, C.Z.; Ge, C.Y. Text detection model based on DBNet-CST. Commun. Inf. Technol. 2025, 1, 100–103.
38. Cai, W.T.; Zhao, H.; Wang, H.; Zhang, H. An Improved Inkjet Character Recognition Algorithm Based on Multi-Granularity Attention. Manuf. Autom. 2024, 46, 148–155.

Figure 1. DBNet model.

Figure 2. Overall structure of EBiDNet.

Figure 3. Backbone network structure.

Figure 4. SE attention module structure.

Figure 5. MB Conv module structure.

Figure 6. FPN structure diagram.

Figure 7. Structural diagram of the BiFPN described in this paper.

Figure 8. DBhead network structure.

Figure 9. Appearance of electric actuator.

Figure 10. Dataset Acquisition Environment.

Figure 11. Dataset sample.

Figure 12. Comparison of detection results between DBNet and EBiDNet.

Table 1. Main network structure parameters.

Stage	Operator	Stride	Channel	Layers
0	Conv 3 × 3	2	24	24
1	Fused-MB Conv1, k3 × 3	1	24	24
2	Fused-MB Conv4, k3 × 3	2	48	48
3	Fused-MB Conv4, k3 × 3	2	64	64
4	MB Conv4, k3 × 3, SE0.25	2	128	128
5	MB Conv6, k3 × 3, SE0.25	1	160	160
6	MB Conv6, k3 × 3, SE0.25	2	256	256
7	Conv 3 × 3 & Pooling & FC	-	1280	1280

Table 2. Training Environment and Related Parameters.

Name	Version
Operating System	Windows 10 Professional
CPU	Intel Core i5-13400F
GPU	RTX-4060
Python	3.10.16
Paddle	2.6
CUDA	11.8

Table 3. Ablation experiment.

Model	Precision	Recall	F1-Score	Params/M	FPS
DBNet	84.50%	72.04%	77.77%	24.55	18.87
DBNet + BiFPN	86.55%	79.11%	82.66%	24.09	20.88
DBNet + EfficientNetV2S	92.43%	84.48%	88.28%	20.29	17.02
DBNet + EfficientNetV2S + BiFPN (Ours)	93.63%	90.32%	91.94%	20.14	18.61

Table 4. Comparison with Other Advanced Methods.

Model	Precision	Recall	F1-Score
EAST	84.41%	69.17%	76.02%
FCE	90.08%	89.49%	89.78%
DB++ [33]	90.89%	82.66%	86.58%
DRRG [34]	89.92%	80.91%	85.18%
CT [35]	88.68%	81.70%	85.05%
EBiDNet (Ours)	93.63%	90.32%	91.94%

Table 5. Comparative Performance of Representative Improved Character Detection Methods.

Model	Precision	Recall	F1-Score
EDNet [17]	100.00%	99.59%	99.79%
F1 [18]	84.70%	79.20%	82.20%
iSFF-DBNet [20]	82.60%	68.10%	74.60%
DB-EAC [24]	90.29%	85.17%	87.65%
DDP-YOLOv8 [36]	86.30%	97.00%	91.30%
DBNet-CST [37]	92.20%	81.10%	86.30%
F2 [38]	98.82%	84.20%	90.92%
EBiDNet (Ours)	93.63%	90.32%	91.94%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

