Entropy Teacher: Entropy-Guided Pseudo Label Mining for Semi-Supervised Small Object Detection in Panoramic Dental X-Rays

Zhu, Junchao; Gao, Nan

doi:10.3390/electronics14132612

Open AccessArticle

Entropy Teacher: Entropy-Guided Pseudo Label Mining for Semi-Supervised Small Object Detection in Panoramic Dental X-Rays

by

Junchao Zhu

and

Nan Gao

^*

College of Computer Science, Zhejiang University of Technology, Hangzhou 310014, China

^*

Author to whom correspondence should be addressed.

Electronics 2025, 14(13), 2612; https://doi.org/10.3390/electronics14132612 (registering DOI)

Submission received: 28 April 2025 / Revised: 13 June 2025 / Accepted: 23 June 2025 / Published: 27 June 2025

(This article belongs to the Section Artificial Intelligence)

Download

Browse Figures

Versions Notes

Abstract

Small-scale object detection remains a significant challenge in semi-supervised object detection (SSOD), particularly in panoramic dental X-rays. Due to the small lesion size, low contrast, and complex anatomical background, conventional teacher models often fail to extract accurate lesion features, leading to noisy pseudo labels and suboptimal detection performance. Additionally, most existing SSOD methods rely on high-confidence thresholds to select pseudo labels, which may mistakenly discard valuable predictions with low scores but accurate localization—especially for small-scale targets. To address these challenges, we propose Entropy Teacher, a novel SSOD framework specifically designed for small-scale dental disease detection. Our method introduces an Entropy-Guided Feature Pyramid that integrates entropy-guided representations to enhance fine-grained structural learning. Moreover, we develop a low-confidence pseudo-label mining (LCPLM) strategy with a class-adaptive thresholding mechanism to effectively recover high-quality pseudo labels below conventional confidence thresholds. Extensive experiments on the Dental Disease Dataset and ChestX-Det demonstrate that Entropy Teacher achieves state-of-the-art performance, surpassing the baseline Unbiased Teacher by +3.8

A P_{50}

and +4.5

A P_{S}

. These results confirm the effectiveness of entropy-guided representations and low-confidence mining in improving small-scale lesion detection under limited supervision.

Keywords:

semi-supervised learning; small-scale dental disease detection; entropy map; low-confidence pseudo-label mining

1. Introduction

Object detection in medical imaging plays a vital role in computer-aided diagnosis, enabling automated localization of abnormal regions [1]. In particular, panoramic dental X-rays (also known as orthopantomograms) are widely used in routine dental assessments due to their ability to capture comprehensive anatomical structures in a single scan. Accurate detection of dental diseases, such as caries, apical shadows, and high/low-density lesions, is crucial for early diagnosis and treatment planning. However, detecting these lesions remains challenging due to their small size, low contrast, and high similarity to surrounding tissues. Moreover, collecting large-scale, pixel-level annotated datasets in the medical domain is both time-consuming and labor-intensive [2,3,4,5], which severely limits the applicability of fully supervised detection frameworks [6,7,8,9] in real-world scenarios.

Semi-supervised object detection [10,11,12,13] (SSOD) has recently emerged as a promising paradigm to alleviate the reliance on extensive annotations by leveraging large amounts of unlabeled data. Most SSOD methods adopt a teacher–student framework [10,14], where the teacher model generates pseudo labels to supervise the student model on unlabeled images. Although this approach has shown impressive results on natural images, it faces two major limitations when applied to small-scale lesion detection in medical images.

Firstly, limitations of conventional feature pyramids for small-scale lesion detection. While Feature Pyramid Networks [15] (FPNs) are widely used in object detection for multi-scale representation, they often fail to capture the subtle appearance and boundary features of small-scale lesions in panoramic dental X-rays. This is due to the low contrast and high similarity between lesions and surrounding anatomical structures. As shown in Figure 1 [16], many lesions are tiny and blend into the background, leading to inaccurate localization and noisy pseudo labels. In contrast, entropy maps have demonstrated strong potential in highlighting regions with rich structural information. By quantifying local uncertainty and complexity, they enhance texture and boundary details, making them particularly effective for small object detection in low-contrast medical images [17,18]. As shown in Figure 2, entropy-enhanced features offer clearer discrimination between lesions and background, making them a promising alternative to conventional FPN enhancements. In Figure 2, we visualize the attention heatmaps before and after integrating entropy maps with the original view. Before incorporating entropy information, the model tends to focus on irrelevant anatomical structures, such as the outer edges of the panoramic image or the central areas of teeth, which are generally unrelated to most dental diseases. However, after fusing entropy-based representations, the model shifts its attention toward more diagnostically relevant regions, such as the tops, sides, and roots of teeth—areas where caries and apical shadows commonly occur. In addition, as shown in Table 1, we validate the effectiveness of entropy maps on the Dental Disease Dataset. It can be observed that integrating entropy-based features improves the overall detection accuracy

A P_{50}

by 2.7 and enhances the performance on small-scale lesions

A P_{S}

by 2.9. These demonstrate the effectiveness of entropy maps in guiding the model toward informative lesion features and reducing interference from background structures.

Secondly, the over-reliance on fixed high-threshold pseudo-label filtering limits the effectiveness of semi-supervised training for small-scale lesions. Most existing SSOD frameworks discard predictions with classification scores below a predefined threshold (e.g., 0.9). However, small lesions—due to their low contrast and subtle visual features—often yield low confidence scores even when correctly localized. As a result, many potentially valuable pseudo labels are mistakenly excluded, leading to insufficient supervision for small objects. Moreover, low confidence does not necessarily imply poor localization quality. In many cases, predictions with relatively low scores still correspond to accurate bounding boxes [19]. To address this, we propose a low-confidence pseudo-label mining (LCPLM) strategy combined with a class-adaptive threshold mechanism. This strategy adaptively identifies reliable pseudo labels below the traditional confidence threshold by evaluating the Entropy-Guided feature pyramid. It enables the model to retain high-quality supervisory signals from hard-to-detect lesions that would otherwise be discarded, ultimately improving detection performance on small-scale targets.

To address these challenges, we propose a novel SSOD framework, Entropy Teacher, specifically designed for small-scale dental disease detection in panoramic X-rays. Our approach introduces an Entropy-Guided Feature Pyramid that integrates entropy-based representations to enhance feature learning for subtle lesions. In addition, we develop a low-confidence pseudo-label mining (LCPLM) strategy with a class-adaptive thresholding mechanism, enabling effective utilization of high-quality pseudo labels with low confidence scores. Our main contributions are summarized as follows:

We propose Entropy Teacher, a novel SSOD framework that combines entropy-guided feature fusion and low-confidence pseudo-label mining to improve the detection of small-scale lesions in panoramic dental X-rays.
We design an Entropy-Information Aggregation (EIA) module to extract entropy-based features and fuse them with original image features, leading to enhanced fine-grained representations and more accurate localization of small-scale lesions.
We introduce an LCPLM strategy with class-adaptive thresholding, which identifies and leverages reliable pseudo labels with low classification scores, thereby improving supervision for hard-to-detect lesions.

2. Related Works

2.1. Semi-Supervised Object Detection

In recent years, SSOD has gained significant attention as an effective approach to reducing reliance on large-scale labeled data. STAC [20] was the first to introduce pseudolabeling with strong data augmentation and consistency training in SSOD. However, this method employs a two-stage pseudo-label generation process, where pseudo-labels cannot be dynamically updated during training, thereby limiting performance improvement. Subsequently, various improvements based on the Mean Teacher framework have been proposed. Refs. [10,11,12,13,19,21,22,23] leverage the exponential moving average (EMA) mechanism to iteratively update pseudo-labels in the teacher network, enabling end-to-end pseudo-label generation.

For instance, Unbiased Teacher [10] adopts a confidence thresholdbased pseudo-labeling strategy to mitigate confirmation bias in SSOD. Soft Teacher [12] assigns classification confidence scores as classification loss weights to suppress the negative impact of missing potential objects in pseudo-labeling. PseCo [24] enhances pseudo-label reliability through a perturbation-consistency pseudolabeling strategy, effectively reducing noise and improving SSOD performance. Although these methods have achieved promising results in natural scene object detection, they struggle to address the complex multi-scale lesion distribution in panoramic dental X-rays. In particular, pseudo-label noise has a significant impact on detecting small lesion regions. Therefore, our research focuses on enhancing feature fusion with entropy image and mining low confidence but high-quality pseudo labels to enhance small-scale dental disease detection.

2.2. Semi-Supervised Learning in Medical Imaging

Due to the difficulty of obtaining precisely annotated medical data, semi-supervised learning has been widely adopted in the medical domain. For instance, Das et al. [2] introduced a Scale-Invariant Module and a Pseudo-Label Optimizer specifically designed to enhance disease detection accuracy in chest X-ray images. Saga et al. [25] utilized YOLOv6 and DeepLab v3+ for facial detection to assist in the treatment of speech disorders in children. Roy et al. [26] developed a semi-supervised human activity recognition system based on various clustering strategies.

Unlike these existing applications of semi-supervised learning in the medical field, our work focuses on detecting complex small-scale diseases in panoramic X-rays. Our goal is to improve the detection performance of small-scale lesions through a semi-supervised object detection framework, thereby assisting clinicians in making more effective diagnoses.

3. Materials and Methods

In the problem of semi-supervised object detection, a model is trained with a labeled

D_{L} = {(X_{i}^{S}, Y_{i}^{S}) |_{i = 1}^{N}}

and an unlabeled image set

D_{u} = {X_{j}^{u} |_{j = 1}^{M}}

, where N and M are the numbers of labeled and unlabeled data.

X_{i}^{S}

represent the i-th labeled image;

Y_{i}^{S}

denote the label of the i-th labeled image, including the bounding box and class information;

X_{j}^{u}

represent the j-th unlabeled image. The goal is to make use of both labeled and unlabeled data to learn the object detection model. Built upon the pseudo labeling framework, our method follows a score filtering mechanism [10] to generate pseudo labels

Y^{u}

for unlabeled image

X^{u}

.

An overview of our approach is illustrated in Figure 3. We propose Entropy Teacher to mitigate pseudo-label noise and enable accurate detection of small-scale dental diseases in panoramic X-rays. Specifically, we introduce an Entropy-Guided Feature Pyramid, which incorporates an EIA module to fuse entropy map features and enhance representations in low-contrast regions. In addition, we adopt an LCPLM strategy to identify high-quality pseudo labels with low confidence scores. To further improve recall, a class-adaptive threshold strategy is employed, which dynamically adjusts pseudo-label selection based on the confidence distribution of each category. Although our framework is model-agnostic, we adopt Faster R-CNN [6] with FPN [15] to ensure consistency with previous SSOD works in medical imaging.

3.1. Basic Pseudo-Labeling Framework

Following standard practice in SSOD, we adopt a pseudo-label-based teacher–student framework for training. Specifically, each training batch is composed of both labeled and unlabeled data, sampled at a ratio of 1:4. The labeled data are used to supervise the student model via standard supervised loss with ground-truth annotations, while the unlabeled data rely on pseudo labels generated by the teacher model.

Since unlabeled data lack annotations, the teacher model provides supervision by generating pseudo labels. Its weights are updated through the exponential moving average (EMA) of the student’s parameters. For each unlabeled sample, the teacher model generates pseudo labels on a weakly augmented view of the image, which are then used to supervise the student model on the strongly augmented view. The student model is updated using both the labeled and unlabeled samples. The overall training objective is defined as follows:

L = L_{s} + α \cdot L_{u}

(1)

where

L_{s}

and

L_{u}

denote supervised loss of labeled images and unsupervised loss of unlabeled images respectively,

α

controls contribution of unsupervised loss.

In this framework, the teacher model parameters

θ^{'}

are updated by the student model parameters using EMA, as follows:

θ_{t}^{'} = m θ_{t - 1}^{'} + (1 - m) θ_{t}

(2)

where t represents the current time step, and the momentum decay is defined as 0.999.

3.2. Entropy-Guided Feature Pyramid

In panoramic dental X-rays, small-scale lesions often exhibit low contrast and weak semantic cues, making them difficult to localize using traditional visual features alone. To address this issue, we introduce an EIA module that integrates structural information from entropy maps with semantic features extracted from the original image. Entropy maps are capable of capturing local uncertainty and texture complexity, thereby highlighting regions that are rich in structural detail. By aggregating these representations with standard visual features, our EIA module enhances the model’s ability to focus on diagnostically relevant regions. The module is lightweight, plug-and-play, and trained in an end-to-end manner alongside the base detector. As shown in Appendix A, we calculate the local entropy map of the image by sliding the window. In addition to the provided code, we have now cited the original publication by Robert Haralick [27], who first introduced entropy features in the context of image analysis.

Given an unlabeled image

X^{u}

, we introduce its corresponding entropy map, which shares the same spatial resolution as

X^{u}

. Using the feature encoder

f (θ)

, we extract multi-scale feature pyramids

{P_{i}^{+}}

from the original image and

{P_{i}^{E}}

from the entropy map, where each

P_{i}^{+} \in R^{256 \times H \times W}

and

P_{i}^{E} \in R^{1 \times H \times W}

.

To achieve this, inspired by [28], we first compute a spatial attention map

S A

based on the entropy features by EIA. Specifically, we invert the normalized entropy features

(1 - P_{i}^{E}) \in R^{1 \times H \times W}

to emphasize stable, low-entropy regions that are typically more reliable for detection. This inversion reduces the impact of noisy, high-entropy background areas while amplifying well-structured lesion regions.

We then use a two-layer convolutional subnetwork to model spatial dependencies and adaptively learn attention weights. The architecture consists of a 1 × 1 convolution that expands the feature to an intermediate dimension 16, followed by a ReLU activation, and then a second 1 × 1 convolution that projects back to a single-channel attention map. Concretely,

S A = σ (W_{S A}^{2} \cdot δ (W_{S A}^{1} \cdot (1 - P_{i}^{E}) + b_{S A}^{1}) + b_{S A}^{2}),

where

W_{S A}^{1} \in R^{16 \times 1 \times 1 \times 1}

projects the input to 16 channels,

W_{S A}^{2} \in R^{1 \times 16 \times 1 \times 1}

reduces it back to a single channel,

δ (\cdot)

denotes the ReLU function, and

σ (\cdot)

is the Sigmoid function that normalizes the attention map to [0, 1]. As a result,

S A \in R^{1 \times H \times W}

.

We apply this attention map to fuse the original and entropy features in a soft attention manner:

P_{i}^{\times} = S A \cdot P_{i}^{+} + (1 - S A) \cdot P_{i}^{E^{'}},

where

P_{i}^{E^{'}} \in R^{C \times H \times W}

is obtained by expanding

P_{i}^{E}

through channel-wise replication and a 1 × 1 convolution for channel alignment.

This fusion strategy allows the model to dynamically prioritize structural cues from the entropy map in uncertain or ambiguous regions, while retaining semantic richness from the original view in well-understood areas. The final fused features

P_{i}^{\times}

and the original features

P_{i}^{+}

are used to generate predictions

{\hat{y}}^{\times}

and

{\hat{y}}^{+}

, respectively.

Notably, the EIA module is lightweight, requiring only two pointwise convolution layers and a few activation functions, resulting in negligible additional computation and parameters.

3.3. Low-Confidence Pseudo-Label Mining Strategy

Semi-supervised object detection frameworks typically filter pseudo labels using a fixed high-confidence threshold

τ_{h}

(e.g., 0.9). However, this approach is suboptimal for small-scale lesion detection, where subtle features and low contrast often result in low classification scores—even for accurately localized objects. To address this, we adopt a two-threshold scheme: a fixed lower threshold

τ_{l} = 0.5

and a class-adaptive upper threshold

τ_{h}

(introduced in Section 3.3.1). Based on these thresholds, pseudo labels are categorized as follows:

High-confidence pseudo labels ( $s \geq τ_{h}$ ): directly used to supervise the student model.
Weak pseudo labels ( $τ_{l} \leq s < τ_{h}$ ): potentially valuable but uncertain, evaluated by our mining strategy.
Low-confidence pseudo labels ( $s < τ_{l}$ ): discarded as unreliable.

We then propose a two-stage strategy to leverage both confident and weak pseudo labels: (1) a class-adaptive thresholding mechanism to compute

τ_{h}

for each class, and (2) a Quality-Aware Mining strategy to select high-quality weak pseudo labels based on cross-view consistency. This design improves both recall and robustness in detecting small lesions.

3.3.1. Class-Adaptive Threshold for Pseudo-Label Filtering

Instead of using a fixed threshold across all classes, we dynamically adjust the upper threshold

τ_{h}

based on the average classification confidence of each class. Specifically, given the set of foreground

N_{c}^{f g}

and background predictions

N^{b g}

for a class c, wo define

τ_{h}

as follows:

τ_{h} = {(\frac{\frac{1}{N_{c}^{f g}} \sum_{i = 1}^{N_{c}^{f g}} S_{i}^{f g}}{\frac{1}{N^{b g}} \sum_{j = 1}^{N^{b g}} S_{j}^{b g}})}^{γ}

(3)

where

S_{i}^{f g}

and

S_{j}^{b g}

are the classification scores of the i-th foreground and j-th background prediction boxes, respectively, and

γ

is a hyperparameter controlling the threshold scaling (we set

γ = 0.05

). The intuition behind this design is to avoid setting an overly low threshold at the early training stages, as this may introduce a large number of noisy pseudo labels. By applying a power function with a small exponent (e.g.,

γ = 0.05

), even low foreground-to-background confidence ratios produce relatively high thresholds, which leads to conservative pseudo-label selection at the beginning. As training progresses and the model becomes more confident in foreground predictions, the ratio increases, and the threshold grows more slowly. This allows more pseudo labels to be accepted without significantly increasing the risk of noise, thus improving recall while maintaining quality.

3.3.2. Quality-Aware Mining of Low-Confidence Pseudo Labels

To recover reliable pseudo labels from low-confidence predictions, we define a lower threshold

τ_{l}

(empirically set to 0.5) and consider all candidate boxes within the interval

τ_{l} \leq s < τ_{h}

For each candidate, we assess its quality based on the consistency of classification scores across two views: the original view and the entropy-guided fused view.

Given proposal sets

{C_{i}^{\times}}_{i = 1}^{M}

from the entropy-fused features and

{C_{j}^{+}}_{j = 1}^{N}

from the original features, both having IoU > 0.7 with the candidate, we define the improvement score

Δ q

as

Δ q = \frac{1}{M} \sum_{i = 1}^{M} C_{i}^{\times} - \frac{1}{N} \sum_{j = 1}^{N} C_{j}^{+}

(4)

A higher

Δ q

indicates that the prediction has significantly benefited from entropy-based enhancement, suggesting its underlying reliability. If

Δ q

exceeds a promotion threshold

δ

, the candidate is selected as a valid pseudo label and incorporated into the training set. This mechanism enables the model to retain informative but uncertain predictions, improving recall and robustness—especially for small-scale lesions. Owing to the enhanced feature representation enabled by the entropy-guided feature pyramid, we believe that the observed improvement in the score of

Δ q

reflects a genuine pseudo label quality gain.

3.4. Loss Function of Entropy Teacher

In our framework, the final pseudo labels generated by the teacher model are used to supervise the student detector on unlabeled data. Specifically, the student model makes predictions using two separate feature branches: one from the original augmented image view and another from the entropy-guided feature fusion. Both branches are supervised using the same set of pseudo labels. The overall unsupervised loss is formulated as

L_{u} = λ_{1} L_{u}^{1} (y^{1}, \hat{y}) + λ_{2} L_{u}^{2} (y^{2}, \hat{y})

(5)

where

y^{1}

and

y^{2}

denote the predictions from the original-view feature pyramid and the entropy-fused feature pyramid, respectively, and

\hat{y}

represents the pseudo labels produced by the teacher model. These pseudo labels include both high-confidence predictions (above the class-adaptive threshold

τ_{h}

) and additional low-confidence pseudo labels selected through our LCPLM strategy.

λ_{1}

and

λ_{2}

are weighting coefficients that balance the two loss terms.

For labeled data, the standard supervised loss

L_{s}

is computed using the ground-truth annotations. The final training objective combines both supervised and unsupervised components:

L_{total} = L_{s} + α \cdot L_{u}

(6)

where

α

is a hyperparameter that controls the relative contribution of the unsupervised loss. In all experiments, we follow the baseline setting [10] for the design of detection losses, including classification and bounding box regression terms.

As shown on the right side of Figure 3, to ensure fair comparison during inference, all auxiliary components—such as entropy maps and the EIA module—are discarded, and predictions are made using only the original image view.

4. Results

4.1. Dataset and Evaluation Metrics

To evaluate the effectiveness and generalization capability of our method, we tested it on two datasets: Dental Disease Dataset [16] and ChestX-Det dataset [29].

Dental Disease Dataset: This comprises 245 annotated panoramic X-rays from patients with dental diseases and 1000 unlabeled panoramic X-rays. These images contain a total of 1157 annotated disease instances, categorized into four types: low-density shadow, high-density shadow, root tip shadow, and obstructed teeth. Following the definition in MS COCO [30], we classify lesions based on their pixel area: small-scale diseases are those with an area smaller than

32 \times 32

pixels, medium-scale diseases fall within the range of

32 \times 32

to

96 \times 96

pixels, and large-scale diseases exceed

96 \times 96

pixels. The dataset statistics are presented in Table 2, showing that the majority of instances belong to the small-scale category. For the Dental Disease Dataset, due to the difficulty of annotating small-scale diseases and the resulting scarcity of labeled data, we used all available labeled images for training.

ChestX-Det Dataset: This dataset consists of 3578 chest X-ray images, annotated by three board-certified radiologists with 13 common disease or abnormality categories. For the ChestX-Det dataset, we randomly sampled 10%, 30%, and 50% of the images as labeled data, with the remaining images used as unlabeled data accordingly. Following standard practice, we performed five-fold cross-validation for each experimental setting and reported the average results.

The evaluation follows the standard metrics of the PASCAL VOC dataset [31], including

A P_{50}

,

A P_{S}

,

A P_{M}

,

A P_{L}

and

A R

. Specifically,

A P_{50}

represents the mean average precision across all categories at an IoU threshold of 0.5, while

A P_{S}

,

A P_{M}

and

A P_{L}

correspond to the average precision for small-, medium-, and large-scale diseases at the same IoU threshold. Higher values indicate better detection performance. The definition of

A P

and

A R

is as follows:

P = \frac{T P}{T P + F P}

(7)

R = \frac{T P}{T P + F N}

(8)

A P = \int_{0}^{1} P (R) d R, A R = \frac{1}{N} \sum_{i = 1}^{N} R_{i}

(9)

where

T P

denotes the number of detected bounding boxes with an IoU greater than the specified threshold, while

F P

represents the number of detected bounding boxes with an IoU below the threshold.

F N

refers to the number of missed ground-truth boxes.

4.2. Implementation Details

Following the mainstream choices in semi-supervised object detection, we adopt Faster R-CNN [6] with a ResNet-50 [32] backbone and a FPN [15] as our detection model. Our proposed method is built upon the Unbiased Teacher framework [10], and we reuse its data augmentations and hyperparameters without any additional modifications to ensure a fair comparison. The entire framework is implemented in Python 3.8 using PyTorch 1.12.1, and the detector is developed based on the MMDetection toolbox (version 2.28.2). We employ standard libraries such as NumPy 1.23.5, OpenCV 4.5.5, and scikit-image 0.20.0 for image processing and entropy map computation. All models are trained on two NVIDIA RTX 4090 GPUs under Ubuntu 20.04 with CUDA 11.7, using a base learning rate of 0.01. The training configurations for each dataset are as follows:

ChestX-Det Dataset: Under the partial-label settings, all models are trained for 180,000 iterations, with each iteration using one labeled image and four unlabeled images per GPU. The learning rate is decayed by a factor of 0.1 at 120K and 160K iterations.

Dental Disease Dataset: Due to the scarcity of labeled data, models are trained for 20,000 iterations with a constant learning rate, using one labeled image and four unlabeled images per GPU at each iteration.

4.3. Effectiveness of Our Approach

We evaluate the effectiveness of our proposed Entropy Teacher framework on two challenging medical datasets: the Dental Disease Dataset and ChestX-Det. As shown in Table 3 and Table 4, our method consistently outperforms both supervised and recent state-of-the-art semi-supervised object detection approaches under various settings.

Table 3 presents the detailed comparison results on the Dental Disease Dataset. Our Entropy Teacher achieves the highest performance across almost all metrics, with a significant improvement in both precision and recall for small-scale lesions. Specifically, our method reaches 55.9

A P_{50}

, surpassing the strongest baseline MixTeacher (54.4) by 1.5 points, and achieves the best

A P_{S}

of 45.3, indicating superior performance on small objects—a known challenge in dental disease detection. This notable improvement can be attributed to our low-confidence pseudo-label mining strategy, which effectively recovers valuable small object proposals that are often discarded by high-threshold filters. Additionally, our class-adaptive thresholding mechanism dynamically adjusts confidence thresholds for different categories based on the model’s current learning state, further improving the inclusion of high-quality pseudo-labels for small-scale diseases. In terms of recall, our model achieves the highest

A R_{S}

of 68.1 and

A R_{L}

of 80.5, showing its robustness across scales. This indicates that the Entropy-Guided Feature Pyramid design allows the model to better aggregate spatial and semantic cues, benefiting both small and large lesion localization.

As shown in Table 4, under low-label regimes (10%, 30%, and 50% labeled data), our Entropy Teacher consistently outperforms all competing methods. With only 10% labeled data, it achieves 22.2

A P_{50}

, outperforming the previous best MixTeacher (21.8) and Unbiased Teacher (20.6). When the amount of labeled data increases to 50%, our method still leads with 37.9

A P_{50}

, achieving +0.6 absolute gain over Unbiased Teacher and +1.5 over MixTeacher. These results validate the generalization capability of our method to different medical imaging domains. The integration of Entropy-Guided information (e.g., combining original and enhanced views) improves feature robustness, especially under limited supervision. Moreover, the low-confidence mining strategy enables the model to discover more reliable training signals, which is particularly beneficial when high-quality annotations are scarce.

4.4. Ablation Study

In this section, we conducted experiments on the Dental Disease Dataset to analyze and validate our method in detail.

4.4.1. Investigation of Designed Components

We conduct a step-by-step ablation study to validate the effectiveness of each component in our framework, as shown in Table 5. Starting with a fully supervised model trained solely on labeled data, the

A P_{50}

reaches 39.6. After introducing unlabeled data and integrating the augmented view feature pyramid

P^{+}

, performance improves significantly to 51.5

A P_{50}

and 39.5

A P_{S}

, demonstrating the benefits of cross-view feature enrichment. Next, the addition of entropy modality features

P^{E}

—even with a simple addition-based fusion—brings further gains of 2.0 in

A P_{50}

and 2.2 in

A P_{S}

. This confirms the utility of entropy-guided features for capturing subtle cues of small-scale dental lesions. Building upon this, incorporating the EIA module for adaptive fusion leads to a further increase in both

A P_{50}

and

A P_{S}

(54.2 and 42.4, respectively), highlighting EIA’s effectiveness in enhancing the model’s sensitivity to small-scale features. Finally, introducing the low-confidence pseudo-label mining (LCPLM) strategy achieves the largest performance boost, reaching 55.9

A P_{50}

and 45.3

A P_{S}

. The synergy between the Entropy-Guided Feature Pyramid and LCPLM allows the model to identify informative yet uncertain pseudo-labels, thus enriching supervision for small object regions and improving overall detection accuracy.

There exists a certain degree of class imbalance in the Dental Disease Dataset; for instance, there are only three instances of small-scale obstructed teeth. Our method partially addresses this issue through two key mechanisms. First, the class-adaptive thresholding (CAT) module adjusts confidence thresholds based on the score distributions of each class, thereby avoiding the use of uniform and overly stringent thresholds for rare categories. Second, the LCPLM strategy enhances supervision for underrepresented classes by selectively mining reliable low-confidence predictions, which provides additional training signals for categories with limited samples, such as obstructed teeth.

4.4.2. Generalization to Other Semi-Supervised Frameworks

Detecting small-scale diseases remains a major challenge in semi-supervised detection. To evaluate the generalization capability of our proposed Entropy-Guided Feature Pyramid (EGFP), we integrate it into three representative frameworks: Unbiased Teacher, Soft Teacher, and PseCo. The results are shown in Table 6. For Unbiased Teacher and Soft Teacher, which originally adopt only a unimodal augmented-view feature pyramid, we directly replace their feature extraction module with our EGFP. In contrast, PseCo already incorporates multi-view scale-invariant learning; thus, we retain its original design and augment it by adding the entropy-based modality as an additional input. Experimental results show that integrating the proposed EGFP leads to consistent improvements across all three frameworks. Notably, Soft Teacher benefits the most, with a significant increase in

A P_{S}

, suggesting that the combination of Soft Teacher’s region-level supervision with entropy-aware modality yields stronger guidance for small-scale object learning.

4.4.3. Impact of Pseudo-Label Confidence Threshold $τ_{h}$ on Detection Performance

In the LCPLM strategy, we use the classification score threshold

τ_{h}

to filter pseudo labels and select low-confidence candidates. This makes the setting of

τ_{h}

particularly critical, as it directly influences which pseudo labels are preserved or discarded. To investigate its impact, we conduct an ablation study comparing fixed thresholds with our proposed class-adaptive threshold (CAT), as shown in Table 7. Results show that among the fixed thresholds,

τ_{h} = 0.85

achieves the best performance, outperforming both 0.8 and 0.9. However, CAT surpasses all fixed values, achieving the highest AP across all scales, especially on small objects (45.3

A P_{S}

), validating the effectiveness of dynamically adjusting the threshold according to class-wise confidence distributions.

4.4.4. Impact of Lower Threshold $τ_{l}$ in LCPLM Strategy

To assess the impact of the lower threshold

τ_{l}

in our LCPLM strategy, we conduct an ablation study using different

τ_{l}

values. As shown in Table 8, setting

τ_{l} = 0.5

yields the best performance across all metrics, achieving 55.9 in

A P_{50}

and 45.3 in

A P_{S}

. Increasing the threshold to 0.6 and 0.7 results in consistent performance drops, especially on small-scale lesions. These results indicate that a lower

τ_{l}

allows the model to explore more potentially valuable low-confidence pseudo labels, which is essential for detecting subtle and small lesions in panoramic X-rays. Overly conservative thresholds (e.g.,

τ_{l} = 0.7

) may miss important training signals, leading to inferior performance.

4.4.5. Influence of Promotion Threshold $Δ q$ in LCPLM

To further analyze the behavior of our LCPLM strategy, we conduct an ablation study on the promotion threshold

Δ q

, which determines the minimum required score improvement from entropy-guided features for a low-confidence pseudo-label to be accepted. As shown in Table 9, when

Δ q = 0.0

, no additional pseudo-label selection based on score improvement is performed, resulting in 4.15 average boxes per image and an

A P_{S}

of 42.4. Setting

Δ q = 0.1

introduces a moderate quality constraint, increasing the average number of pseudo labels to 4.81 and significantly boosting

A P_{S}

to 45.3—the highest among all settings. In contrast, when

Δ q = 0.2

, the constraint becomes too strict, reducing the average pseudo labels and slightly degrading performance across all scales. These results confirm that a moderate

Δ q

strikes a good balance between precision and recall in pseudo-label selection, especially benefiting the detection of small-scale lesions.

4.5. Qualitative Visualization

Figure 4 compares the visualization results of Soft Teacher, PseCo, and our method on the Dental Disease Dataset. These results demonstrate the impressive capability of our Entropy Teacher, particularly in identifying low-contrast small objects—such as the two lesion regions indicated by white arrows in the first row—which were missed by previous methods like Soft Teacher and PseCo. Purple represents diseases that are difficult to detect.

5. Discussion

Despite recent advances in semi-supervised object detection (SSOD), existing frameworks still face considerable challenges in detecting small-scale dental lesions in panoramic X-rays due to low contrast, high background complexity, and limited annotations. Methods such as Unbiased Teacher [10], Soft Teacher [12], and PseCo [24] have attempted to address these issues by enhancing consistency regularization or improving pseudo-label reliability. However, they rely heavily on high-confidence thresholds, which often discard valuable pseudo labels associated with small and low-contrast lesions. Additionally, their unimodal feature extraction fails to sufficiently capture the fine-grained features needed for accurate detection in complex dental imagery.

To overcome these challenges, we propose Entropy Teacher, a novel SSOD framework tailored to small-scale dental disease detection. Our method introduces an Entropy-Guided Feature Pyramid by integrating entropy-guided representations via an EIA module, which enhances the model’s capacity to focus on lesion areas and improves fine-grained feature extraction. In addition, we design an LCPLM strategy with a class-adaptive threshold mechanism, enabling the model to recover reliable low-confidence pseudo labels that would otherwise be discarded.

Quantitative results on the Dental Disease Dataset demonstrate that Entropy Teacher outperforms existing SSOD approaches. Specifically, it achieves 55.9

A P_{50}

and 45.3

A P_{S}

, improving upon the strongest baseline MixTeacher by +1.5

A P_{50}

and +4.1

A P_{S}

, respectively. The model also achieves 68.1

A R_{S}

and 80.5

A R_{L}

, confirming its robustness across different lesion scales. These improvements are attributed to the model’s ability to dynamically fuse entropy-guided features and mine informative but uncertain labels, thereby enriching the training signal for small lesions.

Compared to other frameworks, Entropy Teacher provides a unique combination of modality fusion and adaptive label mining. For instance, while PseCo incorporates perturbation consistency, it lacks entropy-based representation enhancement. Unbiased Teacher and Soft Teacher rely solely on confidence-based filtering, which limits supervision for small and ambiguous targets. Our ablation studies show that adding the entropy branch improves

A P_{50}

by +2.7, and the LCPLM strategy contributes another +1.7 gain. The synergy between EIA and LCPLM is especially effective in boosting detection recall and precision for small-scale lesions, a critical need in dental diagnostics.

We also validate the generalizability of the Entropy-Guided Feature Pyramid by integrating it into other SSOD frameworks, including Unbiased Teacher, Soft Teacher, and PseCo. All variants exhibit consistent performance gains, particularly on APS. Notably, integrating EGFP into Soft Teacher yields a +4.3

A P_{S}

increase, underscoring the universal applicability of our feature enhancement strategy.

Despite its strong performance, Entropy Teacher has certain limitations. First, although entropy maps enhance local information extraction, they are computed externally and may introduce slight overhead during training. Second, while we designed the EIA module to be lightweight, the dual-branch feature pyramid still increases memory usage slightly during training (though it is removed during inference). Third, our current method has only been tested on dental and chest X-rays. Its effectiveness across other modalities (e.g., CT or MRI) or anatomical regions remains to be validated.

Compared with recent learned uncertainty estimators (e.g., Monte Carlo Dropout), entropy maps offer a simpler yet effective alternative. While learned methods provide fine-grained uncertainty modeling, they require additional model complexity and training cost. In contrast, entropy maps are lightweight and rely solely on input image statistics. This is particularly suitable for panoramic X-rays, which exhibit consistent anatomical structures and well-defined boundaries. Therefore, entropy-based representations can effectively highlight lesion-relevant regions without relying on heavy probabilistic modeling.

Furthermore, the proposed Entropy Teacher framework holds promising clinical value. By alleviating the need for extensive manual annotation and improving detection sensitivity for subtle and small-scale lesions, it can assist dental professionals in achieving earlier and more accurate diagnoses. This has the potential to reduce missed diagnoses, improve treatment planning, and ultimately enhance patient outcomes. Future work may focus on integrating learned uncertainty maps instead of pre-computed entropy, further optimizing memory usage via lightweight backbone networks, and expanding the framework to other domains such as endoscopic imagery or histopathology. Moreover, exploring self-distillation or temporal consistency in longitudinal data could further enhance label reliability for small target detection.

6. Conclusions

This study proposes the Entropy Teacher framework, a novel semi-supervised approach tailored for small-scale dental disease detection in panoramic X-rays. By integrating an Entropy-Guided Feature Pyramid with entropy-guided representations and a low-confidence pseudo-label mining strategy, our method significantly enhances detection accuracy for subtle, low-contrast lesions. The experimental results demonstrated the following:

The Entropy Teacher framework achieved state-of-the-art performance on both the Dental Disease Dataset and ChestX-Det. On the Dental Disease Dataset, it attained 55.9 $A P_{50}$ and 45.3 $A P_{S}$ , outperforming the baseline Unbiased Teacher by +3.8 $A P_{50}$ and +4.5 $A P_{S}$ , respectively. This highlights its superior capability in detecting small-scale lesions under complex anatomical backgrounds.
The proposed Entropy-Guided Feature Pyramid, which fuses entropy maps with raw image features through an EIA module, improved small-scale detection precision by +3.9 $A P_{S}$ compared to the baseline. This demonstrates the effectiveness of entropy-guided attention in enhancing fine-grained feature representation.
The LCPLM strategy with class-adaptive thresholds dynamically recovered high-quality pseudo labels below conventional confidence thresholds, increasing the average number of retained small-lesion proposals by 15.8% and boosting recall $A R_{S}$ to 68.1. This addresses the critical issue of insufficient supervision for small-scale targets.
Ablation studies validated the generalizability of our framework. Integrating the Entropy-Guided Feature Pyramid into existing semi-supervised frameworks (e.g., Soft Teacher, PseCo) improved their performance by +2.1 – 4.3 $A P_{S}$ , demonstrating its compatibility and robustness across diverse architectures.

The Entropy Teacher framework holds significant clinical value for dental diagnostics. By improving detection sensitivity for small-scale lesions (e.g., low-density shadows, root tip shadows), it reduces missed diagnoses and enables earlier intervention, which is critical for preventing disease progression. For instance, accurate localization of early-stage periapical lesions or subtle caries can guide timely treatment, minimizing invasive procedures and improving patient outcomes. Additionally, its semi-supervised design alleviates reliance on costly expert annotations, making it scalable for deployment in resource-constrained healthcare systems. In future work, we intend to explore the applicability of entropy-guided pseudo-label mining to other medical imaging modalities, such as CT and MRI. These modalities exhibit different noise profiles and structural features, and validating the generalizability of our entropy-based strategy in such contexts will further confirm its robustness and clinical relevance.

Author Contributions

Conceptualization, J.Z. and N.G.; methodology, J.Z. and N.G.; software, J.Z.; validation, J.Z.; formal analysis, N.G.; investigation, N.G.; resources, N.G.; data curation, J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, N.G.; visualization, J.Z.; supervision, N.G.; project administration, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of the Future Science and Technology City Branch of Hangzhou Stomatology Hospital. The data has already been used in previous work [16]. We applied for an informed consent waiver. Reason for waiver: The data used in this study is derived from patients’ panoramic dental X-rays, which were taken during their hospital visits. The X-ray images have been fully anonymized, with all personal identifiers removed, ensuring that there are no direct or indirect identifiers of the patients’ identities. Furthermore, the study does not involve any clinical trials.

Informed Consent Statement

Patient consent was waived as all panoramic X-ray pictures were taken by the patient’s hospital in the clinic, all of which had information about the patient deleted, including his name, residence, age, ID number, etc., ensuring that there were no identifiers of the patient’s identity from the X-ray images. Furthermore, the study did not involve any clinical experiments.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author on request due to privacy issues.

Acknowledgments

Thank you very much to Junchao Zhu and Nan Gao for their contributions to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Local Entropy Calculation Code

[Python Code: Local Entropy Calculation for Image]

import numpy as np

import cv2

from skimage.measure import shannon_entropy

from skimage.util import view_as_windows

def calculate_entropy(image):

hist, _ = np.histogram(image.flatten(),

bins=256, range=[0, 256])

hist = hist / hist.sum()

hist = hist[hist > 0]

entropy = -np.sum(hist ∗ np.log2(hist))

return entropy

def local_entropy (image, window_size = 8):

if image.shape[0] == 3:

image = cv2.cvtColor(image.transpose(1, 2, 0),

cv2.COLOR_RGB2GRAY)

windows = view_as_windows(image, (window_size, window_size))

entropy_map = np.zeros(windows.shape[:2])

for i in range (windows.shape [0]):

for j in range (windows.shape [1]):

window = windows [i, j]

entropy_map [i, j] = shannon_entropy (window)

return entropy_map

References

Herrera, D.M.; Haworth, B.M.; Keynan, Y. Review of evidence for using chest X-rays for active tuberculosis screening in long-term care in Canada. Front. Public Health 2020, 8, 16. [Google Scholar] [CrossRef] [PubMed]
Das, A.; Gorade, V.; Kumar, K.; Chakraborty, S.; Mahapatra, D.; Roy, S. Confidence-Guided Semi-supervised Learning for Generalized Lesion Localization in X-Ray Images. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2024; Springer: Cham, Switzerland, 2024; pp. 242–252. [Google Scholar]
Singh, A.; Gorade, V.; Mishra, D. OPTIML: Dense semantic invariance using optimal transport for self-supervised medical image representation. arXiv 2024, arXiv:2404.11868. [Google Scholar]
Chen, B.; Fu, S.; Liu, Y.; Pan, J.; Lu, G.; Zhang, Z. CariesXrays: Enhancing caries detection in hospital-scale panoramic dental X-rays via feature pyramid contrastive learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 21940–21948. [Google Scholar]
Wang, X.; Guo, J.; Zhang, P.; Chen, Q.; Zhang, Z.; Cao, Y.; Fu, X.; Liu, B. A deep learning framework with pruning ROI proposal for dental caries detection in panoramic X-ray images. In Proceedings of the International Conference on Neural Information Processing, Changsha, China, 20–23 November 2023; pp. 524–536. [Google Scholar]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
Liu, Y.; Ma, C.; He, Z.; Kuo, C.; Chen, K.; Zhang, P.; Wu, B.; Kira, Z.; Vajda, P. Unbiased Teacher for Semi-Supervised Object Detection. arXiv 2021, arXiv:2102.09480. [Google Scholar]
Sajid, S.; Aziz, Z.; Urmonov, O.; Kim, H. Improving Object Detection Accuracy with Self-Training Based on Bi-Directional Pseudo Label Recovery. Electronics 2024, 13, 2230. [Google Scholar] [CrossRef]
Xu, M.; Zhang, Z.; Hu, H.; Wang, J.; Wang, L.; Wei, F.; Bai, X.; Liu, Z. End-to-end semi-supervised object detection with soft teacher. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 3060–3069. [Google Scholar]
Zhou, Q.; Yu, C.; Wang, Z.; Qian, Q.; Li, H. Instant-Teaching: An end-to-end semi-supervised object detection framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4081–4090. [Google Scholar]
Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
Lin, T.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
Gao, N.; Zhu, J.C.; Chen, P.; Tang, J.J.; Jiang, F.; Liang, R.H. ACPNet: Enhancing Small-Scale Dieases Detection in Panoramic X-rays. In Proceedings of the 2024 IEEE International Conference on Bioinformatics and Biomedicine, Lisboa, Portugal, 3–6 December 2024. [Google Scholar]
Lee, K.; Lee, S. A New Framework for Measuring 2D and 3D Visual Information in Terms of Entropy. IEEE Trans. Circuits Syst. Video Technol. 2016, 26, 2015–2027. [Google Scholar] [CrossRef]
Shen, C.; Shen, L.; Li, M.; Yu, M. EPL-UFLSID: Efficient Pseudo Labels-Driven Underwater Forward-Looking Sonar Images Object Detection. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024. [Google Scholar]
Liu, L.; Zhang, B.; Zhang, J.; Zhang, W.; Gan, Z.; Tian, G.; Zhu, W.; Wang, Y.; Wang, C. MixTeacher: Mining promising labels with mixed scale teacher for semi-supervised object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
Sohn, K.; Zhang, Z.; Li, C.; Zhang, H.; Lee, C.; Pfister, T. A simple semi-supervised learning framework for object detection. arXiv 2020, arXiv:2005.04757. [Google Scholar]
Shehzadi, T.; Hashmi, A.; Stricker, D.; Afzal, Z. Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Eattle, WA, USA, 16–22 June 2024; pp. 5840–5850. [Google Scholar] [CrossRef]
Guo, Q.; Mu, Y.; Chen, J.Y.; Wang, T.Q.; Yu, Y.Z.; Luo, P. Scale-equivalent distillation for semi-supervised object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
Yang, Q.; Wei, X.; Wang, B.; Hua, X.; Zhang, L. Interactive self-training with mean teachers for semi-supervised object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5941–5950. [Google Scholar]
Li, G.; Li, X.; Wang, Y.; Wu, Y.; Liang, D.; Zhang, S. Pseco: Pseudo labeling and consistency training for semi-supervised object detection. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 457–472. [Google Scholar]
Sage, A.; Badura, P. Detection and Segmentation of Mouth Region in Stereo Stream Using YOLOv6 and DeepLab v3+ Models for Computer-Aided Speech Diagnosis in Children. Appl. Sci. 2024, 14, 7146. [Google Scholar] [CrossRef]
Roy, A.; Dutta, H.; Bhuyan, A.K.; Biswas, S. On-Device Semi-Supervised Activity Detection: A New Privacy-Aware Personalized Health Monitoring Approach. Sensors 2024, 24, 4444. [Google Scholar] [CrossRef] [PubMed]
Haralick, R.M.; Dinstein, I.H.; Shanmugam, K. Textural Features for Image Classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621. [Google Scholar] [CrossRef]
Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
Liu, J.; Lian, J.; Yu, Y. ChestX-Det10: Chest X-ray Dataset on Detection of Thoracic Abnormalities. arXiv 2020, arXiv:2006.10550v3. [Google Scholar]
Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
Liu, C.; Zhang, W.; Lin, X.; Zhang, W.; Tan, X.; Han, J.; Li, X.; Ding, E.; Wang, J. Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
Zhou, H.; Ge, Z.; Liu, S.; Mao, W.; Li, Z.; Yu, H.; Sun, J. Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection. In Computer Vision—ECCV 2022; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
Chen, B.; Chen, W.; Yang, S.; Xuan, Y.; Song, J.; Xie, D.; Pu, S.; Song, M.; Zhuang, Y. Label Matching Semi-Supervised Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]

Figure 1. The visualizations of the four categories of dental diseases—low-density shadow, high-density shadow, root tip shadow, and obstructed teeth—show that their lesions vary in shape, location, and appearance.

Figure 2. Visualization of attention maps before and after entropy-guided fusion. The left figure (a) shows the attention heatmap without entropy guidance, where the model focuses on irrelevant anatomical structures such as image borders and central teeth areas. The right figure (b) shows the attention heatmap after integrating entropy maps, where the model concentrates on diagnostically meaningful regions, such as tooth tips, sides, and roots—commonly associated with caries and apical shadows. This highlights the effectiveness of entropy maps in enhancing attention to small-scale lesions.

Figure 3. For unlabeled data, the model first employs the feature extraction network

f (θ)

—comprising a ResNet backbone and an FPN—to construct feature pyramids

{P_{i}^{+}}

from the augmented views and

{P_{i}^{E}}

from the corresponding entropy maps. These two pyramids are then dynamically fused through the EIA module to generate an Entropy-Guided Feature Pyramid

{P_{i}^{\times}}

. Subsequently, the model makes independent predictions using the R-CNN head on both

{P_{i}^{+}}

and

{P_{i}^{\times}}

. The red box represents the predicted result. In addition, we employ LCPLM strategy to further identify high-quality yet low-confidence pseudo labels. The final pseudo labels used for supervising the student model are filtered by a class-adaptive threshold. The teacher model’s parameters are updated via the EMA of the student’s weights. During inference, only the augmented view is used for prediction.

Figure 3. For unlabeled data, the model first employs the feature extraction network

f (θ)

—comprising a ResNet backbone and an FPN—to construct feature pyramids

{P_{i}^{+}}

from the augmented views and

{P_{i}^{E}}

from the corresponding entropy maps. These two pyramids are then dynamically fused through the EIA module to generate an Entropy-Guided Feature Pyramid

{P_{i}^{\times}}

. Subsequently, the model makes independent predictions using the R-CNN head on both

{P_{i}^{+}}

and

{P_{i}^{\times}}

. The red box represents the predicted result. In addition, we employ LCPLM strategy to further identify high-quality yet low-confidence pseudo labels. The final pseudo labels used for supervising the student model are filtered by a class-adaptive threshold. The teacher model’s parameters are updated via the EMA of the student’s weights. During inference, only the augmented view is used for prediction.

Figure 4. Our model’s detection results on the dental disease dataset are compared with those of Soft Teacher and PseCo. The white arrows indicate small targets that are only detected by our model.

Table 1. Experimental results of original view and original + entropy on Dental Disease Dataset.

View	${AP}_{50}$	${AP}_{S}$
Original	51.5	39.5
Original + Entropy	54.2 (+2.7)	42.4 (+2.9)

Table 2. The distribution of small (≤32 × 32), medium (>32 × 32 and ≤96 × 96), and large (>96 × 96) diseases in the Dental Disease Dataset.

Dataset	Abnormalities	Small	Medium	Large
Dental Disease Dataset	high-density shadow	164	274	60
	low-density shadow	114	57	0
	root tip shadow	110	92	10
	obstructed teeth	3	177	96

Table 3. Comparison of different methods on the Dental Disease Dataset. Metrics include

A P_{50}

,

A P

, and

A R

across small, medium, and large objects.

Table 3. Comparison of different methods on the Dental Disease Dataset. Metrics include

A P_{50}

,

A P

, and

A R

across small, medium, and large objects.

Model	${AP}_{50}$	${AP}_{S}$	${AP}_{M}$	${AP}_{L}$	${AR}_{S}$	${AR}_{M}$	${AR}_{L}$
FCOS [9] (Supervised)	31.0	11.2	36.9	52.0	34.5	62.7	66.9
ARSL [33]	31.6	23.2	41.8	45.2	45.8	58.3	56.0
Dense Teacher [34]	44.3	32.6	47.9	53.6	51.0	60.2	69.4
Faster R-CNN [6] (Supervised)	41.2	25.7	42.3	70.8	52.0	63.0	76.9
Unbiased Teacher [10]	51.5	39.5	58.9	56.2	56.2	69.9	70.0
Soft Teacher [12]	52.2	39.0	57.3	57.4	60.0	75.7	72.0
MixTeacher [19]	54.4	41.2	59.5	62.4	62.8	74.1	78.5
LabelMatch [35]	52.6	41.1	53.3	70.4	59.8	68.1	75.4
PseCo [24]	53.3	40.8	60.7	61.1	64.2	71.9	74.0
Entropy Teacher (Ours)	55.9	45.3	59.4	70.7	68.1	75.2	80.5

Table 4. Comparison of different methods on ChestX-Det under 10%, 30%, and 50% labeled data. Metric is

A P_{50}

.

Table 4. Comparison of different methods on ChestX-Det under 10%, 30%, and 50% labeled data. Metric is

A P_{50}

.

Model	10%	30%	50%
FCOS [9] (Supervised)	8.6	13.1	17.1
ARSL [33]	7.9	12.1	17.5
Dense Teacher [34]	18.5	26.7	29.2
Faster R-CNN [6] (Supervised)	9.5	17.6	19.7
Unbiased Teacher [10]	20.6	30.0	37.3
Soft Teacher [12]	19.5	30.1	33.8
MixTeacher [19]	21.8	33.0	36.4
LabelMatch [35]	15.2	21.7	28.4
PseCo [24]	18.0	29.7	32.2
Entropy Teacher (Ours)	22.2	34.1	37.9

Table 5. An ablation analysis of our method, where LCPLM refers to low-confidence pseudo-label mining strategy. ✔ represents the use of this module.

FPN			LCPLM	${AP}_{50}$	${AP}_{S}$
$P^{+}$	$P^{E}$	EIA	LCPLM	${AP}_{50}$	${AP}_{S}$
				39.6	34.8
✔				51.5 (+11.9)	39.5 (+4.7)
✔	✔			53.5 (+13.9)	41.7 (+6.9)
✔	✔	✔		54.2 (+14.6)	42.4 (+7.6)
✔	✔	✔	✔	55.9 (+16.3)	45.3 (+10.5)

Table 6. Generalization study of our Entropy-Guided Feature Pyramid (EGFP) integrated into different semi-supervised frameworks.

Method	${AP}_{50}$	${AP}_{S}$	${AP}_{M}$	${AP}_{L}$
Unbiased Teacher	51.5	39.5	58.9	56.2
Unbiased Teacher + EGFP	54.2 (+2.7)	42.4 (+2.9)	58.1	69.8
Soft Teacher	52.2	39.0	57.3	57.4
Soft Teacher + EGFP	54.8 (+2.6)	43.3 (+4.3)	57.1	68.6
PseCo	53.3	40.8	60.7	61.1
PseCo + EGFP	54.1 (+0.8)	42.9 (+2.1)	59.1	71.1

Table 7. Ablation study on pseudo-label filtering thresholds

τ_{h}

. CAT denotes the proposed class-adaptive threshold.

Table 7. Ablation study on pseudo-label filtering thresholds

τ_{h}

. CAT denotes the proposed class-adaptive threshold.

$τ_{h}$	${AP}_{50}$	${AP}_{S}$	${AP}_{M}$	${AP}_{L}$
0.8	54.3	43.4	58.4	69.9
0.85	55.5	44.9	58.8	71.2
0.9	54.8	44.1	58.1	70.3
CAT	55.9	45.3	59.4	70.7

Table 8. Ablation study on lower threshold

τ_{l}

of LCPLM.

Table 8. Ablation study on lower threshold

τ_{l}

of LCPLM.

$τ_{l}$	${AP}_{50}$	${AP}_{S}$	${AP}_{M}$	${AP}_{L}$
0.5	55.9	45.3	59.4	70.7
0.6	54.3	44.9	58.4	70.2
0.7	53.5	42.9	57.5	69.3

Table 9. Ablation study on promotion threshold

Δ q

of LCPLM.

Table 9. Ablation study on promotion threshold

Δ q

of LCPLM.

$Δ q$	AvgBox	${AP}_{50}$	${AP}_{S}$	${AP}_{M}$	${AP}_{L}$
0.0	4.15	54.2	42.4	56.7	68.4
0.1	4.81	55.9	45.3	59.4	70.7
0.2	4.32	55.5	44.7	58.7	70.3

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhu, J.; Gao, N. Entropy Teacher: Entropy-Guided Pseudo Label Mining for Semi-Supervised Small Object Detection in Panoramic Dental X-Rays. Electronics 2025, 14, 2612. https://doi.org/10.3390/electronics14132612

AMA Style

Zhu J, Gao N. Entropy Teacher: Entropy-Guided Pseudo Label Mining for Semi-Supervised Small Object Detection in Panoramic Dental X-Rays. Electronics. 2025; 14(13):2612. https://doi.org/10.3390/electronics14132612

Chicago/Turabian Style

Zhu, Junchao, and Nan Gao. 2025. "Entropy Teacher: Entropy-Guided Pseudo Label Mining for Semi-Supervised Small Object Detection in Panoramic Dental X-Rays" Electronics 14, no. 13: 2612. https://doi.org/10.3390/electronics14132612

APA Style

Zhu, J., & Gao, N. (2025). Entropy Teacher: Entropy-Guided Pseudo Label Mining for Semi-Supervised Small Object Detection in Panoramic Dental X-Rays. Electronics, 14(13), 2612. https://doi.org/10.3390/electronics14132612

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Entropy Teacher: Entropy-Guided Pseudo Label Mining for Semi-Supervised Small Object Detection in Panoramic Dental X-Rays

Abstract

1. Introduction

2. Related Works

2.1. Semi-Supervised Object Detection

2.2. Semi-Supervised Learning in Medical Imaging

3. Materials and Methods

3.1. Basic Pseudo-Labeling Framework

3.2. Entropy-Guided Feature Pyramid

3.3. Low-Confidence Pseudo-Label Mining Strategy

3.3.1. Class-Adaptive Threshold for Pseudo-Label Filtering

3.3.2. Quality-Aware Mining of Low-Confidence Pseudo Labels

3.4. Loss Function of Entropy Teacher

4. Results

4.1. Dataset and Evaluation Metrics

4.2. Implementation Details

4.3. Effectiveness of Our Approach

4.4. Ablation Study

4.4.1. Investigation of Designed Components

4.4.2. Generalization to Other Semi-Supervised Frameworks

4.4.3. Impact of Pseudo-Label Confidence Threshold τ h on Detection Performance

4.4.4. Impact of Lower Threshold τ l in LCPLM Strategy

4.4.5. Influence of Promotion Threshold Δ q in LCPLM

4.5. Qualitative Visualization

5. Discussion

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Local Entropy Calculation Code

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

4.4.3. Impact of Pseudo-Label Confidence Threshold $τ_{h}$ on Detection Performance

4.4.4. Impact of Lower Threshold $τ_{l}$ in LCPLM Strategy

4.4.5. Influence of Promotion Threshold $Δ q$ in LCPLM