TDMP-Net constructs a 3D multi-source information space of “semantics–distance–entropy”. The three dimensions are extracted as follows. For the distance information, we first average the support set features output by the collaborative optimization feature extractor to obtain the class prototypes, and then compute the distance between each query sample and the class prototypes; these distances serve as the “distance information” of the 3D space. For the semantic information, we process the support set features through a feedforward neural network and then predict the class scores of the query set; these scores serve as the “semantic information” of the 3D space. For the entropy information, we compute the fused feature entropy of the distance and semantic information obtained above; this entropy value serves as the “entropy information” of the 3D space and quantifies the uncertainty of the feature distribution.
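The following is a minimal PyTorch sketch of this three-step extraction, assuming a softmax over negative Euclidean distances for the distance branch and a generic feedforward head for the semantic branch; the names `semantic_head` and the fusion weight `lam` are illustrative and not part of TDMP-Net's released code.

```python
import torch
import torch.nn.functional as F

def multi_source_info(support_feats, support_labels, query_feats,
                      semantic_head, n_way, lam=0.5):
    # Distance information: prototypes are per-class means of support
    # features; distances to prototypes become a distribution via softmax(-d).
    protos = torch.stack([support_feats[support_labels == c].mean(dim=0)
                          for c in range(n_way)])
    d_info = F.softmax(-torch.cdist(query_feats, protos), dim=1)  # (n_query, n_way)

    # Semantic information: class scores from the feedforward semantic head.
    s_info = F.softmax(semantic_head(query_feats), dim=1)         # (n_query, n_way)

    # Entropy information: Shannon entropy of the fused distribution,
    # quantifying each query's feature-distribution uncertainty.
    fused = lam * d_info + (1.0 - lam) * s_info
    e_info = -(fused * fused.clamp_min(1e-12).log()).sum(dim=1)   # (n_query,)
    return d_info, s_info, e_info
```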
4.3.2. Semantic Information
For class $j$, the semantic information is defined as
$$ s_q^j = \big[\, g_\phi\big( f_\theta(x_q) \big) \,\big]_j, \qquad j = 1, \dots, n, $$
where $g_\phi$ denotes the classifier function with parameters $\phi$ and $f_\theta(x_q)$ is the feature of the query sample produced by the collaborative optimization feature extractor. $\mathbf{s}_q$ represents the semantic information vector of the query sample $x_q$, where each element corresponds to the probability that the query belongs to one of the $n$ classes: $\mathbf{s}_q = (s_q^1, s_q^2, \dots, s_q^n)$.
To estimate the class center more accurately and reduce the interference of critical samples, we introduce a critical sample classification mechanism to estimate the critical score of samples. Specifically, for $n$-class query samples, the semantic enhancement module introduces a critical class decision: the first $n$ outputs represent the probabilities that the query sample belongs to each class, and the $(n{+}1)$-th output represents the probability that the query sample belongs to the critical class. In the threshold dynamic data augmentation module, we input the semantic information of the first $n$ classes, renormalized so that the $n$ probabilities sum to 1, i.e., $\sum_{j=1}^{n} s_q^j = 1$. In the multi-source information decision, we likewise input the $n$ class probabilities, renormalized so that $\sum_{j=1}^{n} s_q^j = 1$. The critical class mechanism effectively reduces the weight of critical samples in prototype estimation, thereby estimating the class center more accurately.
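A minimal sketch of this $(n{+}1)$-way split, assuming the critical class occupies the last output column; the helper name is ours, not the authors':

```python
import torch.nn.functional as F

def semantic_with_critical_class(logits, n_way):
    # logits: (n_query, n_way + 1); the last column is the critical class
    # that absorbs ambiguous samples so they carry less weight in
    # prototype estimation.
    full = F.softmax(logits, dim=1)              # sums to 1 over n_way + 1 classes
    critical_score = full[:, -1]                 # probability of the critical class
    # First n classes, renormalized to sum to 1, as fed to the threshold
    # dynamic data augmentation module and the multi-source decision.
    s_info = full[:, :n_way] / full[:, :n_way].sum(dim=1, keepdim=True)
    return s_info, critical_score
```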
4.3.3. Entropy Information
To fully quantify the uncertainty of query samples in the multi-source information space and provide support for subsequent module decisions, TDMP-Net designs two typical entropy calculation paradigms, each corresponding to different uncertainty management requirements.
Fused Probability Entropy (FPE). This paradigm first fuses the probability distributions of distance information and semantic information into a unified fused distribution, then calculates the entropy of this fused distribution. Its core is to quantify the overall uncertainty of the query sample in the comprehensive feature space and output a single entropy value to reflect the global ambiguity. The fused probability $p_q^j$ that the query sample $x_q$ belongs to class $j$ is defined as
$$ p_q^j = \lambda\, d_q^j + (1 - \lambda)\, s_q^j, $$
where $\lambda \in [0, 1]$ to ensure the validity of the probability distribution; $\lambda$ is a hyperparameter that can adjust the importance of distance information and semantic information according to task characteristics. Subsequently, the Shannon entropy of the fused probability vector $\mathbf{p}_q = (p_q^1, \dots, p_q^n)$ is calculated and taken as the FPE $H_{\mathrm{FPE}}(x_q)$, which is used to quantify the overall uncertainty of the query sample:
$$ H_{\mathrm{FPE}}(x_q) = -\sum_{j=1}^{n} p_q^j \log p_q^j, $$
where the physical meaning of $H_{\mathrm{FPE}}$ is clear: a higher value indicates stronger ambiguity in the sample’s class attribution in the distance–semantic joint space, while a lower value means the sample’s class attribution is more certain. This entropy is mainly applied in the threshold dynamic data augmentation module, where it serves as the core threshold criterion for screening samples. Specifically, if $H_{\mathrm{FPE}}(x_q)$ is low, the sample has high confidence in its class determination and is incorporated into the support set augmentation process. This helps correct the class prototype drift caused by insufficient support set samples or distribution bias.
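A compact sketch of the FPE computation under the definitions above; the threshold `tau` in the usage comment is an illustrative screening value, not a value specified by the paper:

```python
def fused_probability_entropy(d_info, s_info, lam=0.5, eps=1e-12):
    # Convex combination keeps the fused vector a valid distribution.
    fused = lam * d_info + (1.0 - lam) * s_info       # (n_query, n_way)
    return -(fused * fused.clamp_min(eps).log()).sum(dim=1)  # (n_query,)

# Screening for augmentation: low FPE = confident query, eligible to
# augment the support set.
# confident_mask = fused_probability_entropy(d_info, s_info) < tau
```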
Source Information Entropy (SIE). This paradigm first calculates the entropy of each information source independently, then quantifies the relative uncertainty difference between the sources through the ratio of the two entropies. Its core is to evaluate the reliability of each information source and guide the weight adjustment of multi-source features. First, the entropy of the distance information vector $\mathbf{d}_q$ (denoted as $H_d$) and the entropy of the semantic information vector $\mathbf{s}_q$ (denoted as $H_s$) are calculated separately:
$$ H_d = -\sum_{j=1}^{n} d_q^j \log d_q^j, \qquad H_s = -\sum_{j=1}^{n} s_q^j \log s_q^j. $$
This information is mainly used in the multi-source information decision (MSID), where it guides the dynamic adjustment of feature weights. For example, when $H_d > H_s$ (the distance information is the less reliable source), the MSID assigns a higher weight to the semantic information $\mathbf{s}_q$ to avoid classification errors caused by unreliable distance metrics; conversely, when $H_s > H_d$, the MSID increases the weight of the distance information $\mathbf{d}_q$ to leverage the stability of prototype-based classification.
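A sketch of the SIE computation; the softmax-over-negative-entropies rule in `msid_weights` is one plausible realization of the described weight adjustment, not necessarily the authors' exact formula:

```python
import torch

def source_information_entropy(d_info, s_info, eps=1e-12):
    # Per-source Shannon entropies and their ratio.
    h_d = -(d_info * d_info.clamp_min(eps).log()).sum(dim=1)  # distance entropy
    h_s = -(s_info * s_info.clamp_min(eps).log()).sum(dim=1)  # semantic entropy
    return h_d, h_s, h_d / h_s.clamp_min(eps)

def msid_weights(h_d, h_s):
    # The noisier (higher-entropy) source receives the smaller weight.
    w = torch.softmax(torch.stack([-h_d, -h_s]), dim=0)
    return w[0], w[1]   # weights for distance and semantic information
```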
The two entropy calculation methods are not mutually exclusive but form a complementary uncertainty management mechanism in TDMP-Net: the FPE provides a global uncertainty indicator for screening samples that need augmentation, solving the prototype drift problem; the SIE provides a local reliability metric for balancing multi-source feature weights, solving the metric constraint problem.
4.3.4. Threshold Dynamic Data Augmentation Module
To address the prototype drift problem, the threshold dynamic data augmentation module (TDDAM) is designed based on the transductive few-shot learning assumption. Using multi-source information, the module screens the query set for samples similar to the support set and then leverages these similar query samples to augment the support prototypes, ultimately improving the accuracy of class prototype estimation.
We retrieve the $R$ most similar samples from the query set $Q$ for each support sample in $S_c$, the support set of class $c$. The objective of this retrieval is defined as
$$ \tilde{Q}_c = \operatorname*{arg\,max}_{Q' \subseteq Q,\; |Q'| = R} \ \mathrm{sim}\big(Q', S_c\big), \qquad c \in \mathcal{Y}, $$
where $\mathcal{Y}$ denotes the set of class indices, and $\mathrm{sim}(Q, S_c)$ represents the similarity metric between the query set $Q$ and the support set $S_c$. The retrieved similar samples satisfy the following screening condition:
$$ T^{*} = \operatorname*{arg\,min}_{T \in \Pi} \ \langle C, T \rangle, $$
where $\Pi$ denotes the set of feasible transport plans, $d(\cdot, \cdot)$ is the distance metric between samples underlying the matching costs, $C$ is the cost matrix (defined below), $T$ is the transport plan, and $\langle C, T \rangle = \sum_{i,j} C_{ij} T_{ij}$ denotes the inner product of $C$ and $T$. The final set of retrieved similar samples is expressed as
$$ \tilde{Q}_c = \big\{\, \tilde{q}_1, \tilde{q}_2, \dots, \tilde{q}_R \,\big\}. $$
The cost matrix $C$ has elements $C_{ij}$ representing the matching cost between the $i$-th query sample $x_i^q$ and the $j$-th support sample $x_j^s$. Physically, this cost is the ratio of matching confidence to prediction uncertainty: by coupling the two dimensions of “confidence” and “prediction certainty”, it characterizes the matching reliability of sample pairs while avoiding the drawback that a single probability index is susceptible to noise interference. It is calculated as follows:
$$ C_{ij} = \frac{p_i^{\,c(j)}}{\displaystyle -\sum_{k=1}^{n} p_i^{k} \log p_i^{k}}, $$
where $p_i^{\,c(j)}$ is the fused probability that the query sample $x_i^q$ belongs to the class $c(j)$ of the support sample $x_j^s$. The denominator in the formula represents the entropy of the fused probability distribution $\mathbf{p}_i$ (formed by combining the semantic enhancement module probability $\mathbf{s}_i$ and the prototype branch probability $\mathbf{d}_i$). The negative sign ensures the entropy is non-negative, which conforms to the definition of information entropy.
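A sketch of this cost matrix under the fused distribution defined earlier; the input format of `support_labels` (one class index per support sample) is an assumption:

```python
def matching_cost_matrix(fused_probs, support_labels, eps=1e-12):
    # fused_probs: (m, n_way) fused distribution per query sample;
    # support_labels: (k,) class index of each support sample.
    entropy = -(fused_probs * fused_probs.clamp_min(eps).log()).sum(dim=1)  # (m,)
    confidence = fused_probs[:, support_labels]                              # (m, k)
    # Confidence-to-uncertainty ratio per (query, support) pair.
    return confidence / entropy.clamp_min(eps).unsqueeze(1)                  # (m, k)
```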
A threshold determination method based on statistical distribution is used to calculate $R$ (the number of similar samples to retrieve). The core idea of this method is to construct a threshold from the mean $\mu$ and standard deviation $\sigma$ of the sample data, count the number of samples that meet the threshold condition, and finally average this count and convert it to an integer as $R$:
$$ R = \left\lfloor \frac{1}{k} \sum_{j=1}^{k} \sum_{i=1}^{m} \mathbb{1}\big[\, C_{ij} > \mu + n\,\sigma \,\big] \right\rfloor, $$
where $m$ is the number of query samples, $n$ is a learnable parameter, $\mu$ is the mean of the cost matrix, calculated as
$$ \mu = \frac{1}{mk} \sum_{i=1}^{m} \sum_{j=1}^{k} C_{ij} $$
(where $k$ is the number of support samples), and $\sigma$ is the standard deviation of the cost matrix, calculated as
$$ \sigma = \sqrt{ \frac{1}{mk} \sum_{i=1}^{m} \sum_{j=1}^{k} \big( C_{ij} - \mu \big)^2 }\,; $$
$\mathbb{1}[\cdot]$ is an indicator function, defined as
$$ \mathbb{1}\big[\, C_{ij} > \mu + n\,\sigma \,\big] = \begin{cases} 1, & C_{ij} > \mu + n\,\sigma, \\ 0, & \text{otherwise}, \end{cases} $$
where $C_{ij}$ denotes the $(i, j)$-th element of the cost matrix $C$.
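A sketch of this statistical threshold rule; here `n_sigma` stands in for the learnable parameter $n$, treated as a plain float for simplicity:

```python
def retrieval_count(C, n_sigma):
    # Threshold mu + n*sigma from the cost-matrix statistics.
    mu = C.mean()
    sigma = ((C - mu) ** 2).mean().sqrt()
    # Count qualifying entries, average the count over the k support
    # samples (columns), and floor to an integer R.
    hits = (C > mu + n_sigma * sigma).sum()
    return int(hits.item()) // C.shape[1]
```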
Subsequently, an optimal transport plan is established between the support set $S_c$ and the query set $Q$ using the augmented information $\tilde{Q}_c$. This transport plan is denoted as $T^{*}$ and is calculated using the Sinkhorn algorithm [18]. Then, the augmented information $\tilde{Q}_c$ from the query set is adapted to the task through barycentric mapping, which minimizes the total transport cost. The augmented support set $\tilde{S}_c$ is calculated as follows.
First, the adapted representation $\hat{q}_i$ of each augmented sample $\tilde{q}_i \in \tilde{Q}_c$ is determined by minimizing the total transport cost:
$$ \hat{q}_i = \operatorname*{arg\,min}_{x} \ \sum_{j} T^{*}_{ij}\, c\big(x, x_j^s\big), $$
where $T^{*}_{ij}$ is an element of the optimal transport plan $T^{*}$, and $c(\cdot, \cdot)$ denotes the cost associated with the augmented sample set $\tilde{Q}_c$.
The solution to this minimization problem corresponds to the weighted average of the support samples, leading to the final augmented support set:
$$ \tilde{S}_c = \mathrm{diag}\big( T^{*} \mathbf{1}_{|S_c|} \big)^{-1}\, T^{*} S_c, $$
where $\mathrm{diag}(\cdot)$ denotes a diagonal matrix constructed from the input vector, $\mathbf{1}_{|S_c|}$ is a column vector of ones with length $|S_c|$ (the number of samples in $S_c$), the superscript $-1$ denotes matrix inversion, and the rows of $S_c$ here stack the support sample features.
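A sketch of these two steps, pairing a textbook entropic-regularized Sinkhorn solver (the regularization strength `reg` is illustrative) with the closed-form barycentric mapping $\mathrm{diag}(T^{*}\mathbf{1})^{-1} T^{*} S_c$:

```python
import torch

def sinkhorn(C, a, b, reg=0.1, n_iter=100):
    # Entropic-regularized OT between augmented queries (marginal a, (m,))
    # and support samples (marginal b, (k,)); C is the (m, k) cost matrix.
    K = torch.exp(-C / reg)
    u = torch.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    return u.unsqueeze(1) * K * v.unsqueeze(0)   # T* = diag(u) K diag(v)

def barycentric_augment(T, support_feats):
    # diag(T 1)^(-1) T S_c: each adapted sample is the row-normalized,
    # plan-weighted average of the support features.
    row_mass = T.sum(dim=1, keepdim=True)        # T * column of ones
    return (T @ support_feats) / row_mass.clamp_min(1e-12)
```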
After adapting the augmented information $\tilde{S}_c$, we combine it with the original support sample representations to compute the final class prototype for class $c$:
$$ P_c = \mathrm{avg}\big( S_c \cup \tilde{S}_c \big) = \frac{1}{|S_c| + |\tilde{S}_c|} \sum_{x \in S_c \cup \tilde{S}_c} x, $$
where $S_c \cup \tilde{S}_c$ denotes the union of the original support set $S_c$ and the augmented support set $\tilde{S}_c$, and $\mathrm{avg}(\cdot)$ denotes the operation of computing the average of the sample features in the set.
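A one-line realization of this union-then-average prototype update:

```python
import torch

def final_prototype(support_feats, augmented_feats):
    # Mean over the union of original and adapted augmented support features.
    return torch.cat([support_feats, augmented_feats], dim=0).mean(dim=0)
```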