3.3.1. Dynamic Decision Boundaries and Pseudo-Label Learning
The training strategy begins with the generation of robust pseudo-labels to guide the learning process when ground-truth annotations are unavailable. Because conventional fixed-threshold methods have limited robustness under varying operational conditions, this work proposes an instance-level dynamic decision mechanism based on multi-criterion fusion. First, class prototypes are constructed by clustering target features, while source anchors are derived from the classification weight matrix of the pre-trained source model. On this basis, a dynamic commonness score function is defined as shown in Equation (8).
In Equation (8), similarity is measured by cosine similarity, so the score incorporates both target-domain similarity and source-domain calibration. To overcome the limitations of fixed thresholds, a dynamic decision boundary that depends on both the instance and the category, as shown in Equation (9), is learned from the distribution of target-domain features.
where the boundary parameters are obtained by fitting a two-component Gaussian Mixture Model (GMM) to the magnitudes of all target feature vectors. Based on this dynamic decision boundary, the pseudo-label assignment rule is defined as shown in Equation (10).
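To make this step concrete, the following is a minimal sketch of how Equations (8)-(10) could be realised in code. Since the equations themselves are not reproduced above, the fused score, the use of the GMM fitted to feature magnitudes, and the per-class boundary are assumed forms, and the function and variable names (generate_pseudo_labels, alpha, etc.) are hypothetical.

```python
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture

def generate_pseudo_labels(feats, target_protos, source_anchors, alpha=0.5):
    """Hypothetical sketch of dynamic pseudo-label assignment (Eqs. 8-10).

    feats          : (N, d) target features of the current batch
    target_protos  : (C, d) class prototypes obtained by clustering target features
    source_anchors : (C, d) columns of the pre-trained classifier weight matrix
    Returns pseudo-labels in {0, ..., C-1}, with -1 marking "unknown".
    """
    feats = feats.detach()
    f = F.normalize(feats, dim=1)
    pt = F.normalize(target_protos.detach(), dim=1)
    ps = F.normalize(source_anchors.detach(), dim=1)

    # Commonness score (assumed form of Eq. 8): fuse target-domain similarity
    # with source-anchor calibration, both measured by cosine similarity.
    score = alpha * (f @ pt.T) + (1.0 - alpha) * (f @ ps.T)      # (N, C)
    best_score, best_cls = score.max(dim=1)

    # Dynamic boundary parameters (Eq. 9): fit a two-component GMM to the
    # magnitudes of the target features; the midpoint of the two component
    # means separates low-magnitude (likely unknown) from high-magnitude samples.
    mags = feats.norm(dim=1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(
        mags.cpu().numpy().reshape(-1, 1))
    mag_boundary = float(gmm.means_.mean())

    # Class-dependent score boundary: the mean fused score of the samples
    # currently assigned to each class (instance + category dependence).
    C = score.size(1)
    class_boundary = best_score.new_full((C,), best_score.mean().item())
    for k in range(C):
        hits = best_cls == k
        if hits.any():
            class_boundary[k] = best_score[hits].mean()

    # Assignment rule (assumed form of Eq. 10): keep the argmax class only if
    # the fused score and the feature magnitude both exceed their boundaries.
    keep = (best_score >= class_boundary[best_cls]) & (mags >= mag_boundary)
    return torch.where(keep, best_cls, torch.full_like(best_cls, -1))
```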
A weighted pseudo-label learning loss function is designed to facilitate model training with the generated pseudo-labels while mitigating the negative impact of label noise, as shown in Equation (11).
In Equation (11), the softmax probability that a target sample belongs to the k-th class is supervised by its one-hot encoded pseudo-label, and a weighting factor based on t-distribution confidence suppresses the influence of low-quality pseudo-labels.
In this weighting factor, ν represents the degrees of freedom of the t-distribution.
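As an illustration of Equation (11) and the t-distribution weighting, the sketch below assumes a Student's-t kernel over the distance between a sample's feature and the prototype of its pseudo-class; the kernel form, the helper name weighted_pseudo_label_loss, and the use of prototype distances are assumptions, since the exact expressions are not reproduced above.

```python
import torch
import torch.nn.functional as F

def weighted_pseudo_label_loss(logits, pseudo, feats, prototypes, nu=1.0):
    """Sketch of the weighted pseudo-label loss (Eq. 11).

    A Student's-t confidence weight (assumed form)
        w_i = (1 + d_i^2 / nu) ** (-(nu + 1) / 2)
    down-weights samples whose feature lies far from the prototype of their
    pseudo-class; nu is the degrees of freedom. Samples labelled "unknown"
    (pseudo = -1) are excluded.
    """
    known = pseudo >= 0
    if not known.any():
        return logits.new_zeros(())
    logits, pseudo, feats = logits[known], pseudo[known], feats[known]

    # Heavy-tailed t-distribution weight: likely-noisy (distant) samples are
    # attenuated rather than hard-rejected.
    d2 = ((feats - prototypes[pseudo]) ** 2).sum(dim=1)
    w = (1.0 + d2 / nu) ** (-(nu + 1.0) / 2.0)

    # Cross-entropy against the one-hot pseudo-labels, weighted per sample.
    ce = F.cross_entropy(logits, pseudo, reduction="none")
    return (w.detach() * ce).sum() / w.detach().sum().clamp_min(1e-8)
```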
3.3.3. Transfer Enhancement Method Based on Dynamic Semantic Calibration
A dynamic semantic calibration mechanism is proposed to mitigate negative transfer, in which source-domain private classes (SPCs) may interfere with the correct feature alignment of target-domain shared classes when source data are unavailable. The key innovation of this mechanism lies in its exclusive use of the pre-trained source model’s parameters and target-domain features to reconstruct the target feature distribution, which enhances the feature consistency of shared classes while suppressing interference from private classes.
First, a category weight evaluation system based on consensus within the target domain is established. Each column of the classifier weight matrix from the pre-trained source model is regarded as the feature-space prototype of the k-th source-domain class. To evaluate the likelihood of each source class being a shared class in the target domain, a dual-filtered weighting mechanism is designed. This mechanism considers only the target-domain samples in the current training batch that have been assigned pseudo-labels, thereby excluding samples labeled as “unknown”. The set of batch sample indices with pseudo-label k is defined as shown in Equation (14).
The initial class weight is computed by considering both the number of target samples assigned to the class and their average confidence.
where the predictive entropy of each sample is normalized by log C, the entropy of a uniform distribution over C classes. The weighting term assigns higher contributions to low-entropy samples. The underlying rationale is that a class with more pseudo-labeled samples and higher confidence in the target domain is more likely to be a shared class, whereas a class with fewer samples or generally lower confidence is more likely to be a source-private class or to be affected by unknown classes. To eliminate scale variations and enhance numerical stability, min-max normalization is applied to the initial weights, as shown in Equation (16).
where ε is a small positive constant included to prevent division by zero. The normalized weight quantifies the relative probability of each source class being identified as a shared class within the current target batch.
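The class weighting of Equations (14)-(16) could be implemented as in the sketch below. The exact combination of sample count and confidence in the initial weight is an assumed form, and class_commonness_weights and its arguments are hypothetical names.

```python
import math
import torch

def class_commonness_weights(probs, pseudo, num_classes, eps=1e-8):
    """Sketch of the dual-filtered class weighting (Eqs. 14-16).

    probs  : (B, C) softmax predictions for the current target batch
    pseudo : (B,) pseudo-labels, with -1 marking "unknown"
    Returns a (num_classes,) vector of min-max normalised shared-class weights.
    """
    B, C = probs.shape
    log_c = math.log(C)

    # Predictive entropy of each sample, normalised by the uniform entropy
    # log C, so that low-entropy (confident) samples contribute more.
    p = probs.clamp_min(eps)
    entropy = -(p * p.log()).sum(dim=1)
    confidence = 1.0 - entropy / log_c

    # Initial class weight (assumed form): share of batch samples assigned to
    # class k times their mean confidence; "unknown" samples never enter I_k.
    raw = probs.new_zeros(num_classes)
    for k in range(num_classes):
        idx = pseudo == k                       # index set I_k of Eq. (14)
        if idx.any():
            raw[k] = idx.float().mean() * confidence[idx].mean()

    # Min-max normalisation (Eq. 16); eps prevents division by zero.
    return (raw - raw.min()) / (raw.max() - raw.min() + eps)
```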
Subsequently, semantic-shift reconstruction based on the normalized weights is performed: the features of target-domain samples are dynamically reconstructed using the evaluated weights. The objective of this reconstruction is to adaptively calibrate the feature distribution in the absence of source data by pulling sample features closer to the prototypes of classes with high weights while pushing them away from prototypes of classes with low weights.
where each term is the vector pointing from the sample feature to the prototype of the k-th source class, scaled by a coefficient derived from the normalized class weight. For classes with high weights, this term pulls the feature strongly toward the corresponding prototype; for classes with low weights, the contribution is negligible. A global factor controls the overall strength of the reconstruction. In effect, the operation performs a dynamically weighted blending between the sample feature and each source-class prototype in the feature space, guided by the class weights. This encourages the model to cluster the features of potential shared classes in the target domain toward their corresponding source prototypes, while features of private or unknown classes remain unchanged or are repelled. As a result, the spatial consistency of shared-class features is enhanced, and the negative impact of private classes is effectively suppressed.
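The reconstruction step can be written compactly as below, assuming the additive shift form z' = z + β·Σ_k w_k (c_k − z); the name semantic_shift_reconstruction and the default value of β are illustrative.

```python
import torch

def semantic_shift_reconstruction(feats, source_protos, class_w, beta=0.3):
    """Sketch of the dynamic semantic-shift reconstruction.

    feats         : (N, d) target features
    source_protos : (C, d) source-class prototypes (classifier weight columns)
    class_w       : (C,) normalised shared-class weights from Eq. (16)
    beta          : global strength of the reconstruction (assumed value)
    """
    # Vectors pointing from each sample feature to every class prototype.
    diff = source_protos.unsqueeze(0) - feats.unsqueeze(1)      # (N, C, d)

    # Weighted blend: high-weight (likely shared) classes pull the feature
    # toward their prototype; low-weight (likely private) classes barely move it.
    shift = (class_w.view(1, -1, 1) * diff).sum(dim=1)          # (N, d)
    return feats + beta * shift
```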
Finally, a transfer enhancement loss function incorporating both reconstruction reliability and information richness is constructed. The reconstructed feature is fed into the classifier to obtain its prediction distribution. The loss function consists of two key components:
Reconstructed Feature Classification Loss: a standard cross-entropy loss that supervises the classification of the reconstructed features using their currently assigned pseudo-labels. To mitigate the inherent noise in these pseudo-labels, the loss is computed only for samples whose pseudo-label is not “unknown”:
The objective is to maintain or enhance the discriminative capability of the reconstructed features on shared-class tasks, enabling convergence to more robust solutions even in the presence of noisy pseudo-labels.
Mutual Information Regularization Loss: to enhance the semantic consistency between the reconstructed feature representation and its predictive distribution during optimization, and to increase prediction confidence, an approximation of the mutual information loss based on the Jensen–Shannon (JS) divergence is introduced:
where the reference distribution is a K-dimensional uniform distribution. Maximizing the mutual information between the reconstructed feature and its prediction is approximated by maximizing the JS divergence between the predictive distribution and the uniform distribution, which drives the predictive distribution away from the uniform distribution toward a more peaked form and thereby enhances the model’s confidence in its predictions.
The final transfer enhancement loss is formulated as a weighted combination of the two losses above, where a regularization coefficient controls the strength of the mutual information term.
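Putting the two components together, a sketch of the transfer enhancement loss could look as follows. The JS-divergence surrogate, the helper name transfer_enhancement_loss, and the default regularisation strength gamma are assumptions, since the paper’s exact formulas and default value are not reproduced above.

```python
import torch
import torch.nn.functional as F

def transfer_enhancement_loss(logits_rec, pseudo, gamma=0.1):
    """Sketch of the transfer enhancement loss on reconstructed features.

    logits_rec : (N, K) classifier outputs for the reconstructed features
    pseudo     : (N,) pseudo-labels, with -1 marking "unknown"
    gamma      : regularisation strength of the mutual-information term (assumed)
    """
    probs = logits_rec.softmax(dim=1)
    K = probs.size(1)

    # (i) Reconstructed-feature classification loss: cross-entropy restricted
    # to samples whose pseudo-label is not "unknown".
    known = pseudo >= 0
    ce = (F.cross_entropy(logits_rec[known], pseudo[known])
          if known.any() else logits_rec.new_zeros(()))

    # (ii) Mutual-information surrogate: the negative JS divergence between the
    # predictive distribution and a K-dimensional uniform distribution, so that
    # minimising the loss pushes predictions away from uniform (more peaked).
    uniform = torch.full_like(probs, 1.0 / K)
    m = 0.5 * (probs + uniform)

    def kl(p, q):
        p, q = p.clamp_min(1e-8), q.clamp_min(1e-8)
        return (p * (p.log() - q.log())).sum(dim=1)

    js = 0.5 * kl(probs, m) + 0.5 * kl(uniform, m)
    mi_reg = -js.mean()

    return ce + gamma * mi_reg
```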
This study presents a progressive optimization strategy designed to address the core challenges of target-domain training: the inaccessibility of source data, the complete absence of labels in the target domain, and the potential presence of unknown fault classes. The primary objective is to gradually extract diagnostic information from the target domain while minimizing interference, through self-supervised learning and knowledge transfer mechanisms.
First, pseudo-labels robust to label noise are generated based on the principle of feature space disentanglement. Then, neighborhood-expanded contrastive learning is applied to enhance intra-class feature consistency. Finally, dynamic calibration of cross-domain feature distribution is performed to suppress negative transfer effects caused by interference from source-private classes. The total loss function for training the target domain model is defined as shown in Equation (21). The specific training process is shown in Algorithm 1.
where the total objective combines the pseudo-label learning loss, the neighborhood contrastive loss, and the transfer enhancement loss, and λ denotes the weighting coefficient of the contrastive loss. The hyperparameter λ is set to 0.2; a sensitivity analysis of λ is presented in Section 4.5.3.
Algorithm 1: The proposed FD-SFUniDA method
Input: Pre-trained source model gs, hs; unlabelled target-domain data; max epoch Imax; batch size bn; trade-off hyperparameter λ
Output: Target model gt, ht after updating its parameters with target data
Initialize gt ← gs, ht ← hs
Perform SVD of the classifier weights
For epoch = 1 to Imax do
    For iter = 1 to N do
        Generate pseudo-labels
        Compute the pseudo-label learning loss
        Construct positive/negative pairs
        Compute the neighborhood-extension contrastive loss
        Reconstruct features using semantic calibration
        Compute the transfer enhancement loss
        Combine the losses according to Equation (21)
        Update the parameters of gt and ht
    End for
End for
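The loop of Algorithm 1 can be summarised by the following sketch, which reuses the helper functions sketched above. The neighbourhood-extension contrastive loss belongs to the contrastive-learning step (not reproduced in this section) and is passed in as a hypothetical callable nc_loss_fn; the combination L = L_pl + λ·L_nc + L_te is an assumed form of Equation (21), consistent with λ weighting the contrastive term.

```python
import torch

def train_target_model(g_t, h_t, target_loader, optimizer, target_protos,
                       source_anchors, nc_loss_fn, max_epoch, lam=0.2):
    """Minimal sketch of the target-domain training loop of Algorithm 1."""
    for epoch in range(max_epoch):
        for x_t in target_loader:
            feats = g_t(x_t)                      # target features
            logits = h_t(feats)
            probs = logits.softmax(dim=1)

            # Step 1: dynamic pseudo-labels and weighted pseudo-label loss.
            pseudo = generate_pseudo_labels(feats, target_protos, source_anchors)
            loss_pl = weighted_pseudo_label_loss(logits, pseudo, feats, target_protos)

            # Step 2: neighbourhood-extension contrastive loss (assumed to be
            # implemented elsewhere and supplied as nc_loss_fn).
            loss_nc = nc_loss_fn(feats, pseudo)

            # Step 3: semantic calibration and transfer enhancement loss.
            w = class_commonness_weights(probs.detach(), pseudo, probs.size(1))
            feats_rec = semantic_shift_reconstruction(feats, source_anchors, w)
            loss_te = transfer_enhancement_loss(h_t(feats_rec), pseudo)

            # Combine the losses (assumed form of Equation 21).
            loss = loss_pl + lam * loss_nc + loss_te
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return g_t, h_t
```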