Article

Generative Learning from Semantically Confused Label Distribution via Auto-Encoding Variational Bayes

1 Zhongshan Power Supply Bureau, China Southern Power Grid Co., Ltd., Zhongshan 528400, China
2 Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
3 Department of Computing, The Hong Kong Polytechnic University, Hong Kong 999077, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(13), 2736; https://doi.org/10.3390/electronics14132736
Submission received: 27 May 2025 / Revised: 26 June 2025 / Accepted: 4 July 2025 / Published: 7 July 2025
(This article belongs to the Special Issue Neural Networks: From Software to Hardware)

Abstract

Label Distribution Learning (LDL) has emerged as a powerful paradigm for addressing label ambiguity, offering a more nuanced quantification of the instance–label relationship than traditional single-label and multi-label learning approaches. This paper focuses on the challenge of noisy label distributions, which is ubiquitous in real-world applications due to annotator subjectivity, algorithmic biases, and experimental errors. Existing LDL algorithms for this problem often model a noisy label distribution as a linear combination of the true label distribution and a random label distribution, an oversimplification that fails to capture how noisy label distributions are generated in practice. This paper therefore introduces the assumption that the noise in label distributions primarily arises from semantic confusion between labels and proposes a novel generative label distribution learning algorithm that models the confusion-based generation process of both the feature data and the noisy label distribution data. The proposed model is inferred using variational methods, and its effectiveness is demonstrated through extensive experiments across various real-world datasets, showcasing its superiority in handling noisy label distributions.

Graphical Abstract

1. Introduction

Label Distribution Learning (LDL) [1] is emerging as a potent paradigm for mitigating label ambiguity. In the context of LDL, the training instance is annotated with a label distribution vector, where each component signifies the degree to which each label describes the instance. Distinct from conventional single-label learning and multi-label learning, in which only the presence or absence of label relevance is indicated, LDL offers a more detailed quantification of the instance–label relationship. This nuanced approach enhances the applicability of LDL across various domains, including age estimation [2,3,4,5,6], affective analysis [7,8,9,10,11], and rating prediction [12,13,14,15].
Current works in LDL predominantly emphasize enhancing generalization performance. For instance, some research efforts have been directed towards developing more robust LDL models that effectively represent the mapping from features to label distributions [4,13,16]. Additionally, other studies have focused on devising regularized loss functions that encapsulate prior knowledge about the mapping from features to label distributions [17,18,19]. While these algorithms have demonstrated commendable performance across various datasets, they predominantly rely on high-quality training label distributions. However, the quality of training label distributions in many real-world applications is not assured. Human-annotated label distributions are particularly vulnerable to factors such as annotator subjectivity and expertise levels. Moreover, label distributions produced by non-human systems such as label smoothing algorithms and experimental apparatuses are prone to the limitations of algorithmic assumptions, data distribution biases, and intrinsic errors of the experimental apparatus. These combined factors pose significant challenges in guaranteeing the quality of training label distributions, potentially leading to the unreliability of existing LDL algorithms. Although numerous efforts have been made to tackle this problem, they typically rely on the straightforward assumption that a noisy label distribution is a linear combination of the true label distribution and a random label distribution [20]. However, this simplified assumption falls short of accurately capturing the complex and nuanced processes of generating noisy label distributions in practical scenarios.
Therefore, in this paper we investigate the mechanisms underlying the generation of noisy label distributions and propose a novel generative label distribution learning algorithm. Specifically, we first propose that the noise in label distributions primarily arises from the semantic confusion between labels. Taking the image recognition task as an example, it is evident that cats and dogs can be accurately distinguished, whereas horses and donkeys are prone to being confused. Unlike the linear noise assumption, label noise based on semantic confusion more precisely captures the underlying noise generation mechanism between labels and is better able to model the complex uncertainties present in human-annotated label distributions. To address this noise generation mechanism, we construct a probabilistic graphical model to fundamentally model this process. The generation and inference processes of the proposed model are illustrated in Figure 1. During the generation process, the latent true label distribution is obtained by normalizing a normal vector through the softmax function. Meanwhile, we assume that the probabilities of semantic confusion between labels adhere to a Dirichlet distribution, allowing the complex dependencies between labels to be captured. Subsequently, in order to utilize the mutual confusion among labels, we assume that the value of each label in the noisy label distribution is affected by the weighted combination of the values of other labels, where the weights are the probabilities of other labels being misclassified as that label. Furthermore, the feature vector for each instance is assumed to be generated from a normal distribution, the parameters of which are determined by the nonlinear transformation of the true label distribution, thereby establishing the association between features and label distributions. In the inference process, we derive the posterior distribution of the true label distribution by variational methods and derive a differentiable variational lower bound through the aforementioned generation process. Finally, we conduct extensive experiments on a wide range of real-world datasets to validate the effectiveness and superiority of the proposed algorithm.

2. Related Work

To predict the label distributions for unseen instances, the primary approach involves learning a mapping from feature vectors to label distributions based on the feature vectors of training instances and their corresponding label distributions (a process termed LDL, i.e., label distribution learning) and subsequently utilizing this mapping to output the label distribution for test instances. The crucial challenge of LDL lies in the design of LDL algorithms with robust generalization performance. Existing research on this issue predominantly focuses on two aspects, namely, model representation and loss functions.
Studies on model representation have primarily concentrated on the manner in which the mapping from feature vectors to label distributions is represented. To date, the findings about this issue can be categorized into three distinct classes. The first class comprises maximum entropy LDL models [1]. Maximum entropy LDL models are derived from the principle of maximum entropy [21]. Formally, these models typically perform exponential normalization on the linear or nonlinear mapping of feature vectors to produce the label distribution. The second class is prototype-based models, wherein the mapping from features to label distributions is constructed through existing examples in the training set. For example, the AAkNN algorithm [1] extends the traditional k-nearest neighbors algorithm to accommodate the task of LDL. The GMML-kLDL algorithm [22] introduces a distance metric that captures label correlation and classification information, which is then used to identify the nearest neighbors of a test instance and leverage their label distribution information to predict the label distribution of the test instance. Furthermore, the LDL-LCR algorithm [23] seeks several nearest neighbors of a test instance within the training set and reconstructs the feature vector of the test instance based on these neighbors, ultimately combining the reconstruction coefficients with the neighbors’ label distributions to derive the label distribution of the test instance. The LDSVR algorithm [13] utilizes kernel techniques to assess the correlation between test and training instances, then combines these correlations as weights to generate the label distribution for the test instance. The third class is ensemble models, which aim to directly extend existing ensemble learning models to meet the requirements of LDL. Specifically, the LDLogitBoost algorithm [16] extends the LogitBoost algorithm [24] to LDL by introducing weighted regressors, resulting in a boosting algorithm for LDL. The LDLFs algorithm [3,4] applies the differentiable decision tree model to LDL with the goal of enhancing the model’s ability to fit the distribution form. In addition, the ENN-LDL algorithm [25] integrates ensemble neural networks into LDL through an ensemble architecture designed to improve the model’s capacity to represent the highly correlated label distributions. Similarly, the StructRF algorithm [26] introduces the structured random forest model into label distribution learning tasks, which aims to enhance the representation of highly correlated label distributions.
Studies on loss functions can be roughly divided into three categories. The first class focuses on mining and utilizing the label correlation or sample correlation within label distributions. For example, the LDLLC algorithm [27] constructs a distance metric to calculate the label correlation based on the training label distributions, then uses the distance between any two labels to regulate the distance between the corresponding column vectors in the parameter matrix. This ensures that the learning process of the label distribution model maintains the label correlation observed in the training label distributions. The LALOT algorithm [28] treats the process of mining label correlation as a metric learning problem, employing the optimal transport distance to capture the geometric information of the underlying label space. The LDL-SCL algorithm [29] introduces the local label correlation assumption, positing that label correlation within label distributions varies across different sample clusters; accordingly, it constructs a local correlation vector as an additional feature for each sample to capture this characteristic. The EDL-LRL algorithm [9], also based on local label correlation, employs a low-rank structure on local samples to mine local label correlation. The LDL-LCLR algorithm [30] simultaneously learns global and local label correlations, mining them through low-rank approximation and sample clustering, respectively. The LDLSF algorithm [31] proposes a new label correlation assumption in the field of LDL, i.e., that correlated labels should have similar corresponding components in the output label distribution. The LDL-LDM algorithm [17] utilizes global and local label correlation in a data-driven manner: it first learns the manifold structure of the label distributions and then encourages the model output to lie on this manifold. The LDL-LRR [18] and LDL-DPA [19] algorithms propose regularizing the learning process by using the label rankings within the label distributions. The second class explores how to improve the generalization ability of models through label embedding. For instance, the MSLP algorithm [32] achieves label embedding for label distribution learning via multi-scale locality preservation. The BC-LDL algorithm [33] integrates the label distribution information into the binary coding process to produce high-quality binary codes. The DBC-LDL algorithm [34] adopts an efficient discrete coding framework to learn the binary coding of instances. The third class designs loss functions for different variants of the LDL task. For example, the IncomLDL algorithm [35] assumes that the label distribution matrix is of low rank and proposes an optimization objective based on trace norm minimization to address incomplete LDL scenarios where the training label distributions have missing values. The IncomLDL-LR algorithm [36] assumes that the label distribution of each example can be linearly reconstructed from its neighbors; as such, it recovers the missing label distribution information through the label distributions of nearest-neighbor examples identified from feature vectors. The GRME [37] and IncomLDL-LCD [38] algorithms recover the missing label distribution values by mining the correlation between labels. GLDL [39] constructs a graph-structured LDL framework that explicitly models instance relations and label relations.
In addition, some LDL studies have focused on weakly supervised LDL; for example, certain LDL algorithms are specifically designed to address noise in LDL [20,40,41]. It should be noted that although the GCIA algorithm [41] also utilizes a confusion matrix to describe the noise generation process, which is similar to our work, the two approaches differ in two significant ways. First, GCIA implicitly captures the dependency between the ground-truth label distribution and the sample features using a latent clustering variable. In contrast, our method directly conditions the sample feature variable on the ground-truth label distribution. By modeling this dependency explicitly, our method aims to better exploit the relationship between features and labels. Second, GCIA treats the confusion matrix as a learnable parameter; however, this approach does not account for the inherent uncertainty of the confusion matrix in real-world scenarios. In contrast, our method models the confusion matrix as a set of Dirichlet random vectors to more effectively capture the uncertainty and variability of the semantic confusion. Finally, several LDL algorithms have proposed learning label distributions from more accessible label forms such as logical labels [42,43,44], ternary labels [45], or label rankings [46,47,48].

3. Methodology

The commonly used mathematical notation is shown in Table 1. We address training datasets that appear as data pairs $\{(\boldsymbol{x}_n, \boldsymbol{y}_n)\}_{n=1}^{N}$. Our goal is to learn a label distribution predictor $f: \mathcal{X}^D \rightarrow \Delta^M$ (i.e., a mapping from the feature space to the label distribution space) using the training dataset $\{(\boldsymbol{x}_n, \boldsymbol{y}_n)\}_{n=1}^{N}$. The generation process of the observations can be formalized as follows.
  • Generate a sample of the latent logits $\boldsymbol{u} \in \mathbb{R}^{M}$ from a standard normal prior:
    $\boldsymbol{u} \sim \mathrm{Norm}\left(\boldsymbol{u} \mid \boldsymbol{0}, \mathrm{diag}(\boldsymbol{1})\right).$
  • Generate a sample of the confusion vector of each label $\boldsymbol{\omega}_m \in \Delta^{M}$ from a Dirichlet prior:
    $\{\boldsymbol{\omega}_m\}_{m=1}^{M} \sim \prod_{m=1}^{M} \mathrm{Diri}\left(\boldsymbol{\omega}_m \mid [\alpha\,\mathbb{I}(t=m)+1]_{t=1}^{M}\right)$
    where the t-th element of $\boldsymbol{\omega}_m$ (i.e., $\omega_{mt}$) denotes the probability of the t-th label being mis-annotated as the m-th label.
  • Generate a sample of the noisy label distribution from a Dirichlet distribution conditioned on the latent logits and the confusion vectors:
    $\boldsymbol{z} \mid \boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \sim \mathrm{Diri}\left(\boldsymbol{z} \mid \left[\langle \boldsymbol{\omega}_m, \exp(\boldsymbol{u}) \rangle\right]_{m=1}^{M}\right).$
  • Generate observations of the feature variables:
    $\boldsymbol{x} \mid \boldsymbol{u} \sim \mathrm{Norm}\left(\boldsymbol{x} \mid H(\boldsymbol{u}; \Theta_{\mathrm{mean}}^{(p)}), \mathrm{diag}(\lambda^{-1}\boldsymbol{1})\right)$
    where $\lambda$ is the precision of the normal distribution for generating the feature vector.
The joint probability distribution of the complete data can be decomposed as follows:
$p\left(\boldsymbol{z}, \boldsymbol{x}, \boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M}\right) = p(\boldsymbol{x} \mid \boldsymbol{u})\, p\left(\boldsymbol{z} \mid \boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M}\right) p(\boldsymbol{u})\, p\left(\{\boldsymbol{\omega}_m\}_{m=1}^{M}\right).$
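To make the generative assumptions above concrete, the following NumPy sketch samples one instance from this process. It is a minimal illustration rather than the authors' implementation: the network `mlp_mean_p` is a hypothetical stand-in for $H(\cdot\,; \Theta_{\mathrm{mean}}^{(p)})$, and the values of $M$, $D$, $\alpha$, and $\lambda$ are chosen only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 6, 243          # numbers of labels and features (SJAFFE-like sizes, for illustration)
alpha, lam = 6.0, 0.1  # Dirichlet concentration bonus and feature precision (illustrative values)

def mlp_mean_p(u, D):
    # Stand-in for H(u; Theta_mean^(p)): any nonlinear map from R^M to R^D.
    W1, W2 = rng.normal(size=(M, 64)), rng.normal(size=(64, D))
    return np.tanh(u @ W1) @ W2

# 1) latent logits u ~ Norm(0, I); softmax(u) plays the role of the latent true label distribution
u = rng.normal(size=M)

# 2) confusion vector of each label: omega_m ~ Diri([alpha * I(t = m) + 1]_t)
omega = np.stack([rng.dirichlet(alpha * (np.arange(M) == m) + 1.0) for m in range(M)])

# 3) noisy label distribution: z ~ Diri([<omega_m, exp(u)>]_{m=1..M})
concentration = omega @ np.exp(u)
z = rng.dirichlet(concentration)

# 4) feature vector: x ~ Norm(H(u), lam^{-1} I)
x = rng.normal(loc=mlp_mean_p(u, D), scale=np.sqrt(1.0 / lam))

print(z.round(3), x.shape)
```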

4. Variational Inference

We aim to infer the posterior distribution of the latent variables for practical decision-making. However, due to the complicated dependencies among the variables, obtaining an exact solution for the posterior distribution of the latent variables is challenging. Consequently, we employ a parameterized variational distribution $q(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z})$ to approximate the true posterior distribution. Formally, we aim to minimize the Kullback–Leibler (KL) divergence between the variational posterior and the true posterior [49]:
$\operatorname*{arg\,min}_{q(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z}) \in \mathcal{Q}} D_{\mathrm{KL}}\left(q(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z}) \,\Big\|\, p(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z})\right)$
where $D_{\mathrm{KL}}$ denotes the KL divergence and $\mathcal{Q}$ denotes the variational posterior family. Because $p(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z})$ is intractable, we rewrite Equation (6) as Equation (7):
$D_{\mathrm{KL}}\left(q(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z}) \,\Big\|\, p(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z})\right) = \log p(\boldsymbol{x}, \boldsymbol{z}) - \int q(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z}) \log \frac{p\left(\boldsymbol{z}, \boldsymbol{x}, \boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M}\right)}{q(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z})}\, d\boldsymbol{u}\, d\boldsymbol{\omega}_1 \cdots d\boldsymbol{\omega}_M = \log p(\boldsymbol{x}, \boldsymbol{z}) - \left( \mathbb{E}_{q(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z})} \log p\left(\boldsymbol{x}, \boldsymbol{z} \mid \boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M}\right) - D_{\mathrm{KL}}\left(q\left(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z}\right) \,\Big\|\, p\left(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M}\right)\right) \right).$
Because $\log p(\boldsymbol{x}, \boldsymbol{z})$ is a constant that is independent of the inference process, the minimization in Equation (6) is equivalent to the maximization in Equation (8):
$\operatorname*{arg\,max}_{q(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z}) \in \mathcal{Q}} \mathbb{E}_{q(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z})} \log p\left(\boldsymbol{x}, \boldsymbol{z} \mid \boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M}\right) - D_{\mathrm{KL}}\left(q\left(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z}\right) \,\Big\|\, p\left(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M}\right)\right).$
We denote the first and second terms in Equation (8) as $\mathcal{J}_{\mathrm{rec}}$ and $\mathcal{J}_{\mathrm{pri}}$, respectively. Adhering to the mean-field principle, we assume that the variational posterior factorizes as in Equation (9):
$q(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z}) = q(\boldsymbol{u} \mid \boldsymbol{x})\, q(\{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z}).$
It should be noted that, due to the absence of noisy label information during the prediction phase, we assume that $\boldsymbol{u}$ is variationally conditioned on $\boldsymbol{x}$ to facilitate the prediction process. Because $\boldsymbol{u}$ adheres to a normal prior, we also employ a normal distribution to model the variational posterior of $\boldsymbol{u}$:
$q(\boldsymbol{u} \mid \boldsymbol{x}) = \mathrm{Norm}\left(\boldsymbol{u} \mid H(\boldsymbol{x}; \Theta_{\mathrm{mean}}^{(q)}),\, \mathrm{SFP} \circ H(\boldsymbol{x}; \Theta_{\mathrm{std}}^{(q)})\right)$
where $\mathrm{SFP}(\cdot)$ is the softplus function, i.e., $\mathrm{SFP}(\cdot) = \log(1 + \exp(\cdot))$. To prevent the gradient information of the optimization objective from being removed by the randomness, we can sample from $q(\boldsymbol{u} \mid \boldsymbol{x})$ by Equation (11):
$\boldsymbol{u} = H(\boldsymbol{x}; \Theta_{\mathrm{mean}}^{(q)}) + \mathrm{SFP} \circ H(\boldsymbol{x}; \Theta_{\mathrm{std}}^{(q)}) \cdot \boldsymbol{\epsilon} := R_{\mathrm{Norm}}(\boldsymbol{\epsilon}), \quad \boldsymbol{\epsilon} \sim \mathrm{Norm}(\boldsymbol{\epsilon} \mid \boldsymbol{0}, \mathrm{diag}(\boldsymbol{1})).$
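As a concrete illustration of Equations (10) and (11), the following PyTorch sketch draws a reparameterized sample of $\boldsymbol{u}$. The encoders `enc_mean` and `enc_std` are hypothetical stand-ins for $H(\boldsymbol{x}; \Theta_{\mathrm{mean}}^{(q)})$ and $H(\boldsymbol{x}; \Theta_{\mathrm{std}}^{(q)})$, the dimensions are illustrative, and the prediction rule in the last lines (softmax of the posterior mean) is one plausible reading of the text rather than a quotation of the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D, M = 243, 6  # illustrative feature/label dimensions

# Hypothetical encoders standing in for H(x; Theta_mean^(q)) and H(x; Theta_std^(q)).
enc_mean = nn.Sequential(nn.Linear(D, 64), nn.Tanh(), nn.Linear(64, M))
enc_std  = nn.Sequential(nn.Linear(D, 64), nn.Tanh(), nn.Linear(64, M))

def sample_u(x):
    """Reparameterized draw u = R_Norm(eps) from Equation (11)."""
    mean = enc_mean(x)
    std = F.softplus(enc_std(x))   # SFP keeps the scale positive
    eps = torch.randn_like(std)    # eps ~ Norm(0, I)
    return mean + std * eps        # gradients flow through mean and std

x = torch.randn(32, D)             # a dummy mini-batch of feature vectors
u = sample_u(x)                    # latent logits

# One plausible prediction rule at test time (assumed here): softmax of the posterior mean of u.
pred = torch.softmax(enc_mean(x), dim=-1)
print(u.shape, pred.sum(dim=-1))
```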
Similarly, we employ a Dirichlet distribution to model the variational posterior of $\{\boldsymbol{\omega}_m\}_{m=1}^{M}$, as $\boldsymbol{\omega}_m$ adheres to a Dirichlet prior:
$q(\{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z}) = \prod_{m=1}^{M} \mathrm{Diri}\left(\boldsymbol{\omega}_m \mid \mathrm{SFP} \circ H([\boldsymbol{x}, \boldsymbol{z}]; \Theta_{\omega_m}^{(q)})\right).$
As suggested by [50], the gradient-preserved sampling for $\boldsymbol{\omega}_m$ can be approximately formalized as follows:
$\boldsymbol{\omega}_m = \left[\frac{\left(\delta_t\, \xi_{mt}\, \Gamma(\xi_{mt})\right)^{1/\xi_{mt}}}{\sum_{i=1}^{M}\left(\delta_i\, \xi_{mi}\, \Gamma(\xi_{mi})\right)^{1/\xi_{mi}}}\right]_{t=1}^{M} := R_{\mathrm{Diri}}(\boldsymbol{\delta}), \quad \boldsymbol{\delta} \sim \mathrm{Unif}(\boldsymbol{\delta} \mid \boldsymbol{0}, \boldsymbol{1})$
where $\boldsymbol{\xi}_m = \mathrm{SFP} \circ H([\boldsymbol{x}, \boldsymbol{z}]; \Theta_{\omega_m}^{(q)})$. According to the decomposition in Equation (9), the first term in Equation (8), i.e., $\mathcal{J}_{\mathrm{rec}}$, can be estimated by Monte Carlo sampling:
$\mathcal{J}_{\mathrm{rec}} = \mathbb{E}_{q(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z})} \log p\left(\boldsymbol{x}, \boldsymbol{z} \mid \boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M}\right) \approx \frac{1}{K} \sum_{k=1}^{K} \log p\left(\boldsymbol{x}, \boldsymbol{z} \mid R_{\mathrm{Norm}}(\boldsymbol{\epsilon}^{(k)}), \{R_{\mathrm{Diri}}(\boldsymbol{\delta}_m^{(k)})\}_{m=1}^{M}\right)$
where $\boldsymbol{\epsilon}^{(k)} \sim \mathrm{Norm}(\boldsymbol{\epsilon} \mid \boldsymbol{0}, \mathrm{diag}(\boldsymbol{1}))$ and $\boldsymbol{\delta}_m^{(k)} \sim \mathrm{Unif}(\boldsymbol{\delta} \mid \boldsymbol{0}, \boldsymbol{1})$. Intuitively, Equation (14) measures the quality of reconstructing the observations of features and labels from the variational posterior of the latent variables, and maximizing Equation (14) encourages the latent variables to better explain the observations of features and noisy label distributions. The second term in Equation (8), i.e., $\mathcal{J}_{\mathrm{pri}}$, can be transformed as follows:
$\mathcal{J}_{\mathrm{pri}} = D_{\mathrm{KL}}\left(q\left(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M} \mid \boldsymbol{x}, \boldsymbol{z}\right) \,\Big\|\, p\left(\boldsymbol{u}, \{\boldsymbol{\omega}_m\}_{m=1}^{M}\right)\right) = D_{\mathrm{KL}}\left(q(\boldsymbol{u} \mid \boldsymbol{x}) \,\|\, p(\boldsymbol{u})\right) + \sum_{m=1}^{M} D_{\mathrm{KL}}\left(q(\boldsymbol{\omega}_m \mid \boldsymbol{x}, \boldsymbol{z}) \,\|\, p(\boldsymbol{\omega}_m)\right) = D_{\mathrm{KL}}\left(\mathrm{Norm}\left(\boldsymbol{u} \mid H(\boldsymbol{x}; \Theta_{\mathrm{mean}}^{(q)}), \mathrm{SFP} \circ H(\boldsymbol{x}; \Theta_{\mathrm{std}}^{(q)})\right) \,\Big\|\, \mathrm{Norm}\left(\boldsymbol{u} \mid \boldsymbol{0}, \mathrm{diag}(\boldsymbol{1})\right)\right) + \sum_{m=1}^{M} D_{\mathrm{KL}}\left(\mathrm{Diri}\left(\boldsymbol{\omega}_m \mid \mathrm{SFP} \circ H([\boldsymbol{x}, \boldsymbol{z}]; \Theta_{\omega_m}^{(q)})\right) \,\Big\|\, \mathrm{Diri}\left(\boldsymbol{\omega}_m \mid [\alpha\,\mathbb{I}(t=m)+1]_{t=1}^{M}\right)\right).$
Intuitively, minimizing Equation (15) encourages the posteriors of the label distribution and confusion vectors to approach their priors. Computationally, Equation (15) entails the KL divergence between normal distributions as well as the KL divergence between Dirichlet distributions, both of which can be derived analytically. Therefore, by combining Equations (14) and (15), the final optimization objective can be summarized as follows:
$\operatorname{maximize} \quad \frac{1}{N}\frac{1}{K} \sum_{k=1}^{K} \sum_{n=1}^{N} \Bigg[ \log \mathrm{Norm}\left(\boldsymbol{x}^{(n)} \mid H(R_{\mathrm{Norm}}(\boldsymbol{\epsilon}^{(k)}); \Theta_{\mathrm{mean}}^{(p)}), \mathrm{diag}(\lambda^{-1}\boldsymbol{1})\right) + \log \mathrm{Diri}\left(\boldsymbol{z}^{(n)} \mid \left[\langle R_{\mathrm{Diri}}(\boldsymbol{\delta}_m^{(k)}), \exp(R_{\mathrm{Norm}}(\boldsymbol{\epsilon}^{(k)})) \rangle\right]_{m=1}^{M}\right) - D_{\mathrm{KL}}\left(\mathrm{Norm}\left(\boldsymbol{u} \mid H(\boldsymbol{x}^{(n)}; \Theta_{\mathrm{mean}}^{(q)}), \mathrm{SFP} \circ H(\boldsymbol{x}^{(n)}; \Theta_{\mathrm{std}}^{(q)})\right) \,\Big\|\, \mathrm{Norm}\left(\boldsymbol{u} \mid \boldsymbol{0}, \mathrm{diag}(\boldsymbol{1})\right)\right) - \sum_{m=1}^{M} D_{\mathrm{KL}}\left(\mathrm{Diri}\left(\boldsymbol{\omega}_m \mid \mathrm{SFP} \circ H([\boldsymbol{x}^{(n)}, \boldsymbol{z}^{(n)}]; \Theta_{\omega_m}^{(q)})\right) \,\Big\|\, \mathrm{Diri}\left(\boldsymbol{\omega}_m \mid [\alpha\,\mathbb{I}(t=m)+1]_{t=1}^{M}\right)\right) \Bigg].$
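For readers who prefer code, the following PyTorch sketch assembles the objective in Equation (16) under simplifying assumptions: a single Monte Carlo sample ($K = 1$), linear layers standing in for the MLPs $H(\cdot\,; \Theta)$, and illustrative values of $\alpha$, $\lambda$, and the dimensions. The names `enc_mean`, `enc_std`, `enc_conf`, `dec_mean`, `sample_dirichlet`, and `neg_elbo` are ours, not the authors'; the analytic KL terms are delegated to `torch.distributions.kl_divergence`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal, Dirichlet, kl_divergence

D, M, alpha, lam = 243, 6, 6.0, 0.1   # illustrative sizes and hyperparameters

# Hypothetical networks standing in for the MLPs H(.; Theta) used in the paper.
enc_mean = nn.Linear(D, M)            # H(x; Theta_mean^(q))
enc_std  = nn.Linear(D, M)            # H(x; Theta_std^(q))
enc_conf = nn.Linear(D + M, M * M)    # one output head per confusion vector omega_m
dec_mean = nn.Linear(M, D)            # H(u; Theta_mean^(p))

def sample_dirichlet(xi, delta):
    """Approximate inverse-CDF reparameterization of Equation (13):
    omega_t proportional to (delta_t * xi_t * Gamma(xi_t))^(1 / xi_t), then normalized."""
    g = (delta * xi * torch.exp(torch.lgamma(xi))) ** (1.0 / xi)
    return g / g.sum(dim=-1, keepdim=True)

def neg_elbo(x, z):
    # --- variational posteriors ---
    mean, std = enc_mean(x), F.softplus(enc_std(x))
    xi = F.softplus(enc_conf(torch.cat([x, z], dim=-1))).view(-1, M, M)  # (batch, m, t)

    # --- reparameterized samples (single Monte Carlo sample, K = 1) ---
    u = mean + std * torch.randn_like(std)                # Equation (11)
    omega = sample_dirichlet(xi, torch.rand_like(xi))     # Equation (13)

    # --- reconstruction term J_rec ---
    log_px = Normal(dec_mean(u), (1.0 / lam) ** 0.5).log_prob(x).sum(-1)
    concentration = torch.einsum('bmt,bt->bm', omega, torch.exp(u))
    log_pz = Dirichlet(concentration).log_prob(z)

    # --- prior term J_pri (analytic KL divergences) ---
    kl_u = kl_divergence(Normal(mean, std),
                         Normal(torch.zeros_like(mean), torch.ones_like(std))).sum(-1)
    prior_conc = alpha * torch.eye(M) + 1.0               # row m is [alpha * I(t=m) + 1]_t
    kl_w = kl_divergence(Dirichlet(xi), Dirichlet(prior_conc.expand_as(xi))).sum(-1)

    return -(log_px + log_pz - kl_u - kl_w).mean()

x, z = torch.randn(8, D), torch.softmax(torch.randn(8, M), dim=-1)  # dummy mini-batch
loss = neg_elbo(x, z)
loss.backward()   # gradients reach all encoder/decoder parameters
```

Minimizing `neg_elbo` over mini-batches with any stochastic gradient optimizer then corresponds to maximizing the lower bound in Equation (16).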

5. Experiments

5.1. Datasets and Evaluation Metrics

The datasets used in this paper are shown in Table 2. The SJAFFE [51] dataset comprises 213 facial images, each annotated by 60 experts who rated six basic emotions (happiness, sadness, surprise, fear, anger, and disgust) on a five-point intensity scale. The emotional intensity for each image is represented by the average score across all raters, which is subsequently normalized to construct the emotion label distribution. Similarly, the SBU-3DFE [52] dataset contains 2500 facial expression images, each evaluated by 23 experts following the same rating protocol as SJAFFE to generate the corresponding label distributions. The Yeast datasets [1] (i.e., the datasets with IDs ranging from 3 to 8) comprise empirical data gathered from ten distinct biological experiments conducted on budding yeast. Each dataset contains 2465 yeast genes, with each gene represented by a 24-dimensional feature vector. The labels within each dataset denote the discrete time points of the corresponding biological experiment, and the label distribution represents the expression levels of each yeast gene at these time points.
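To illustrate how averaged expert ratings are turned into a label distribution of this kind, the short NumPy snippet below normalizes a synthetic rating matrix (60 raters, six emotions, five-point scale); the ratings are randomly generated for illustration and are not taken from SJAFFE.

```python
import numpy as np

# Synthetic stand-in for one SJAFFE-style image: 60 raters x 6 emotions, scores on a 1-5 scale.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(60, 6))

mean_scores = ratings.mean(axis=0)                     # average intensity per emotion
label_distribution = mean_scores / mean_scores.sum()   # normalize so the degrees sum to one

print(label_distribution.round(3), label_distribution.sum())
```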
We use six common LDL evaluation metrics: Cheb (Chebyshev distance), Clark (Clark distance), Canber (Canberra metric), KL (Kullback–Leibler divergence), Cosine (cosine coefficient), and Intersec (intersection similarity).
$\mathrm{Cheb}\left(\{\boldsymbol{z}_n\}_{n=1}^{N}, \{\hat{\boldsymbol{z}}_n\}_{n=1}^{N}\right) = \frac{1}{N}\sum_{n=1}^{N} \max_{m} |z_{nm} - \hat{z}_{nm}|$
$\mathrm{Clark}\left(\{\boldsymbol{z}_n\}_{n=1}^{N}, \{\hat{\boldsymbol{z}}_n\}_{n=1}^{N}\right) = \frac{1}{N}\sum_{n=1}^{N} \sqrt{\sum_{m=1}^{M} \frac{(z_{nm} - \hat{z}_{nm})^2}{(z_{nm} + \hat{z}_{nm})^2}}$
$\mathrm{Canber}\left(\{\boldsymbol{z}_n\}_{n=1}^{N}, \{\hat{\boldsymbol{z}}_n\}_{n=1}^{N}\right) = \frac{1}{N}\sum_{n=1}^{N} \sum_{m=1}^{M} \frac{|z_{nm} - \hat{z}_{nm}|}{z_{nm} + \hat{z}_{nm}}$
$\mathrm{KL}\left(\{\boldsymbol{z}_n\}_{n=1}^{N}, \{\hat{\boldsymbol{z}}_n\}_{n=1}^{N}\right) = \frac{1}{N}\sum_{n=1}^{N} \sum_{m=1}^{M} z_{nm} \log \frac{z_{nm}}{\hat{z}_{nm}}$
$\mathrm{Cosine}\left(\{\boldsymbol{z}_n\}_{n=1}^{N}, \{\hat{\boldsymbol{z}}_n\}_{n=1}^{N}\right) = \frac{1}{N}\sum_{n=1}^{N} \frac{\langle \boldsymbol{z}_n, \hat{\boldsymbol{z}}_n \rangle}{\|\boldsymbol{z}_n\|_2 \cdot \|\hat{\boldsymbol{z}}_n\|_2}$
$\mathrm{Intersec}\left(\{\boldsymbol{z}_n\}_{n=1}^{N}, \{\hat{\boldsymbol{z}}_n\}_{n=1}^{N}\right) = \frac{1}{N}\sum_{n=1}^{N} \sum_{m=1}^{M} \min\{z_{nm}, \hat{z}_{nm}\}$
Lower values of the distance-based measures (i.e., Cheb, Clark, Canber, and KL) and higher values of the similarity-based measures (i.e., Cosine and Intersec) represent better performance; these directions are denoted by ↓ and ↑, respectively.
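The six measures can be computed directly from their definitions; a minimal NumPy sketch is given below, where `ldl_metrics` is a hypothetical helper and the small clipping constant `eps` is an implementation convenience not mentioned in the text.

```python
import numpy as np

def ldl_metrics(Z, Z_hat, eps=1e-12):
    """Six standard LDL measures, averaged over N instances.
    Z and Z_hat are (N, M) arrays of ground-truth and predicted label distributions."""
    Z, Z_hat = np.clip(Z, eps, None), np.clip(Z_hat, eps, None)
    cheb = np.abs(Z - Z_hat).max(axis=1).mean()
    clark = np.sqrt(((Z - Z_hat) ** 2 / (Z + Z_hat) ** 2).sum(axis=1)).mean()
    canber = (np.abs(Z - Z_hat) / (Z + Z_hat)).sum(axis=1).mean()
    kl = (Z * np.log(Z / Z_hat)).sum(axis=1).mean()
    cosine = ((Z * Z_hat).sum(axis=1)
              / (np.linalg.norm(Z, axis=1) * np.linalg.norm(Z_hat, axis=1))).mean()
    intersec = np.minimum(Z, Z_hat).sum(axis=1).mean()
    return dict(Cheb=cheb, Clark=clark, Canber=canber, KL=kl, Cosine=cosine, Intersec=intersec)

# Example: compare random ground-truth and predicted distributions.
rng = np.random.default_rng(0)
Z = rng.dirichlet(np.ones(6), size=100)
Z_hat = rng.dirichlet(np.ones(6), size=100)
print(ldl_metrics(Z, Z_hat))
```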

5.2. Experimental Procedure

In this section, we illustrate the methodology employed to assess the efficacy of our approach. Overall, we introduce confusion-based noise into the label distributions of a noise-free training set and then use a noise-free test set to evaluate the comparison algorithms trained on the noisy training set. Specifically, given a noise-free LDL dataset, we randomly split it into two chunks (70% for training and 30% for testing). We then obtain the noisy label distributions $\tilde{\boldsymbol{z}}$ by adding noise to the label distributions of the training instances according to Equation (18):
$\tilde{\boldsymbol{z}} = \left[\langle \boldsymbol{\pi}_m, \boldsymbol{z} \rangle\right]_{m=1}^{M}, \quad \{\boldsymbol{\pi}_m\}_{m=1}^{M} \sim \prod_{m=1}^{M} \mathrm{Diri}\left(\boldsymbol{\pi}_m \mid [\alpha^{*}\,\mathbb{I}(t=m)+1]_{t=1}^{M}\right)$
where the $\alpha^{*}$ parameter controls the extent of noiselessness of the dataset (a larger $\alpha^{*}$ concentrates each $\boldsymbol{\pi}_m$ on its own label and thus yields cleaner label distributions). We then train the LDL models on the noisy training set and evaluate their prediction performance on the noise-free test set. We repeat the above process ten times and record the mean and standard deviation of each metric.
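The noise-injection step of Equation (18) can be sketched as follows; `add_confusion_noise` is a hypothetical helper, and whether the noisy degrees are renormalized afterwards is not specified by Equation (18), so the sketch leaves them as the raw inner products.

```python
import numpy as np

def add_confusion_noise(Z, alpha_star, rng):
    """Inject confusion-based noise as in Equation (18): each noisy degree is <pi_m, z>,
    with pi_m drawn from Diri([alpha* . I(t=m) + 1]_t). Z is an (N, M) array."""
    N, M = Z.shape
    Z_noisy = np.empty_like(Z)
    for n in range(N):
        # one confusion vector per label; row m mixes the clean degrees into noisy degree m
        Pi = np.stack([rng.dirichlet(alpha_star * (np.arange(M) == m) + 1.0) for m in range(M)])
        Z_noisy[n] = Pi @ Z[n]
    return Z_noisy

rng = np.random.default_rng(0)
Z_clean = rng.dirichlet(np.ones(6), size=5)
print(add_confusion_noise(Z_clean, alpha_star=8.0, rng=rng).round(3))
```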

5.3. Comparison Algorithms

We compare our proposed algorithm with seven existing LDL algorithms: AAkNN [1], SABFGS [1], BD-LDL [53], Duo-LDL [54], LDL-LRR [18], LDL-LDM [17], and LDL-DPA [19]. The hyperparameter settings for each algorithm adhere to the recommendations provided in their respective papers. The hyperparameter $k$ in AAkNN is set to 5. The hyperparameters $\lambda_1$ and $\lambda_2$ in BD-LDL are selected from $\{10^{-3}, 10^{-2}, \ldots, 10^{3}\}$. The hyperparameters $\lambda$ and $\beta$ in LDL-LRR are selected from $\{10^{-6}, 10^{-5}, \ldots, 10^{-1}\}$ and $\{10^{-3}, 10^{-2}, \ldots, 10^{2}\}$, respectively. The hyperparameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ in LDL-LDM are selected from $\{10^{-3}, 10^{-2}, \ldots, 10^{3}\}$, while the hyperparameter $g$ in LDL-LDM is selected from $\{1, 2, \ldots, 14\}$. The hyperparameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ in LDL-DPA are selected from $\{10^{-9}, 10^{-8}, \ldots, 10^{0}\}$, $\{10^{-7}, 10^{-6}, \ldots, 10^{-2}\}$, and $\{10^{-4}, 10^{-3}, \ldots, 10^{-1}\}$, respectively. Finally, the hyperparameters $\alpha$ and $\lambda$ of our proposed algorithm are set to 6 and $10^{-1}$, respectively.

5.4. Results and Discussions

The main experimental results are presented in Table 3, Table 4, Table 5 and Table 6, where Table 3 and Table 4 display the algorithm performance under $\alpha^{*} = 4$ (heavier noise) and Table 5 and Table 6 display the algorithm performance under $\alpha^{*} = 8$ (lighter noise). Each experimental result within these tables is formalized as “mean ± std r/s”, where “mean” represents the mean value of the algorithm's performance, “std” denotes the standard deviation of the algorithm's performance, “r” denotes the rank of the corresponding algorithm among all comparison algorithms, and “s” is a marker of statistical significance under the pairwise two-tailed t-test at the 0.05 significance level: “•” denotes that the corresponding algorithm is significantly inferior to our algorithm, “∘” denotes that the corresponding algorithm is significantly superior to our algorithm, and “∗” denotes that there is no significant difference between the corresponding algorithm and our proposed algorithm. The best performance is highlighted in boldface. In these experiments, we systematically evaluate the performance of the different weakly supervised learning methods as the data quality ($\alpha^{*}$) varies. When the data are noisier ($\alpha^{*} = 4$), our method demonstrates a significant advantage on the SJAFFE and SBU-3DFE datasets; in particular, it is about 10–15% ahead of the second-best algorithm in terms of the distance metrics (Cheb, Clark, and Canber), which suggests that our method is more robust to noisy data. While the performance of all methods improves as the data quality improves to $\alpha^{*} = 8$, our method still maintains the leading position, especially in terms of the KL and Cosine metrics on the Yeast datasets, for which the standard deviation is significantly smaller than that of the other methods (e.g., the standard deviation of the KL metric for Yeast-cold is only 0.0008). These results indicate that the proposed method has stable prediction ability under different data distributions. Notably, LDL-LDM slightly outperforms our method on the Cheb metric (0.1425 vs. 0.1428) on the SBU-3DFE dataset with higher-quality data ($\alpha^{*} = 8$), which may stem from this algorithm's advantage in learning from clean labels. On the other hand, Duo-LDL and BD-LDL perform relatively poorly in all settings. This is particularly the case in noisy data environments, where their performance fluctuates drastically (e.g., the standard deviation of the Clark metric for Duo-LDL on SJAFFE at $\alpha^{*} = 4$ reaches 0.3578), which reflects the sensitivity of these methods to data quality. Overall, the experimental results validate the superiority of our method in both robustness and accuracy.

5.5. Further Analysis

In this section, we provide an empirical demonstration and discussion on the characteristics of our algorithm, which is beneficial for its practical application.

5.5.1. Hyperparameter Analysis

Here, we investigate the impact of the hyperparameters $\lambda$ and $\alpha$ on the performance of our algorithm. Specifically, we define the value range of $\lambda$ and $\alpha$ as $(\lambda, \alpha) \in \mathcal{A}$, where $\mathcal{A} = \{10^{-4}, 10^{-3}, \ldots, 10\} \times \{2, 4, 6, \ldots, 12\}$. As described in Section 5.2, we conduct the experimental procedure with $\alpha^{*} = 8$ for each pair of hyperparameter values within $\mathcal{A}$ and record the resulting performance. The experimental results are visualized in Figure 2. In the heatmap, the x-axis and y-axis represent the values of $\lambda$ and $\alpha$, respectively, while the color indicates the performance quantified by the cosine coefficient or the Kullback–Leibler divergence. The box plots show the marginal impact of the hyperparameters $\lambda$ and $\alpha$ on the algorithm's performance. It can be observed that the algorithm generally performs better when $\lambda$ is small and that the impact of $\alpha$ on performance is relatively minor. Therefore, in the actual implementation of the algorithm, $\lambda$ can typically be set to a small value.

5.5.2. Convergence

Here, we demonstrate the convergence of the inference process of our proposed algorithm. As illustrated in Figure 3, the proposed algorithm demonstrates steady convergence behavior across multiple datasets. The curves are differentiated by color for each dataset and transparency for varying α * values. The results indicate that our algorithm rapidly approaches the optimal objective value within the first 100 iterations. This initial phase exhibits a steep descent, suggesting efficient gradient updates or search direction alignment. Subsequently, the optimization stabilizes, with marginal improvements in later iterations, indicating convergence to a near-optimal solution. The consistency of this trend across most datasets highlights the robustness of our algorithm to different data distributions.

6. Conclusions

This paper identifies a critical limitation of noise modeling in existing label distribution learning methods, i.e., the oversimplified assumption that noisy label distributions can be directly modeled as a mixture of the true label distribution and random noise. To address this gap, we rigorously investigate the underlying generation mechanisms of noisy label distributions and propose the assumption that noisy label distributions primarily stem from semantic confusion among labels (i.e., inter-label semantic ambiguity). Grounded in this assumption, we develop a generative label distribution learning framework that explicitly accounts for label-wise semantic correlations and confusion patterns. To comprehensively validate our proposal, we conduct extensive experiments under varying noise levels across eight real-world benchmark datasets. The results demonstrate that our algorithm effectively models noisy label distributions originating from label semantic confusion, consistently achieving state-of-the-art performance in nearly all scenarios.

Author Contributions

Conceptualization, X.L., C.M., H.Z. and Y.L.; methodology, B.X., T.Y. and Y.L.; software, B.X. and T.Y.; validation, B.X. and T.Y.; formal analysis, B.X. and T.Y.; investigation, B.X. and T.Y.; resources, X.L., C.M. and H.Z.; data curation, B.X. and T.Y.; writing—original draft preparation, B.X. and T.Y.; writing—review and editing, X.L., C.M., H.Z. and Y.G.; visualization, B.X. and T.Y.; supervision, X.L.; project administration, X.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Project of China Southern Power Grid Company Limited, grant number 032000KC23120050/GDKJXM20231537.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from PALM lab and are available at https://palm.seu.edu.cn/xgeng/LDL/download.htm (accessed on 1 July 2025) with the permission of PALM lab.

Conflicts of Interest

Authors Xinhai Li, Chenxu Meng, and Heng Zhou were employed by the Zhongshan Power Supply Bureau, China Southern Power Grid Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare that this study received funding from the Science and Technology Project of China Southern Power Grid Company Limited. The funder was not involved in the study design, in the collection, analysis, and interpretation of data, in the writing of this article, or in the decision to submit it for publication.

References

  1. Geng, X. Label Distribution Learning. IEEE Trans. Knowl. Data Eng. 2016, 28, 1734–1748. [Google Scholar] [CrossRef]
  2. Geng, X.; Yin, C.; Zhou, Z.H. Facial Age Estimation by Learning from Label Distributions. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2401–2412. [Google Scholar] [CrossRef] [PubMed]
  3. Shen, W.; Guo, Y.; Wang, Y.; Zhao, K.; Wang, B.; Yuille, A. Deep Differentiable Random Forests for Age Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 404–419. [Google Scholar] [CrossRef]
  4. Shen, W.; Zhao, K.; Guo, Y.; Yuille, A. Label Distribution Learning Forests. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  5. Hou, P.; Geng, X.; Huo, Z.W.; Lv, J. Semi-Supervised Adaptive Label Distribution Learning for Facial Age Estimation. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 2015–2021. [Google Scholar]
  6. Gao, B.B.; Zhou, H.Y.; Wu, J.; Geng, X. Age Estimation Using Expectation of Label Distribution Learning. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 712–718. [Google Scholar]
  7. He, T.; Jin, X. Image Emotion Distribution Learning with Graph Convolutional Networks. In Proceedings of the International Conference on Multimedia Retrieval, Ottawa, ON, Canada, 10–13 June 2019; pp. 382–390. [Google Scholar]
  8. Yang, J.; Sun, M.; Sun, X. Learning Visual Sentiment Distributions via Augmented Conditional Probability Neural Network. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 224–230. [Google Scholar]
  9. Jia, X.; Zheng, X.; Li, W.; Zhang, C.; Li, Z. Facial Emotion Distribution Learning by Exploiting Low-Rank Label Correlations Locally. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9841–9850. [Google Scholar]
  10. Peng, K.C.; Chen, T.; Sadovnik, A.; Gallagher, A. A Mixed Bag of Emotions: Model, Predict, and Transfer Emotion Distributions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 860–868. [Google Scholar]
  11. Machajdik, J.; Hanbury, A. Affective Image Classification Using Features Inspired by Psychology and Art Theory. In Proceedings of the ACM International Conference on Multimedia, Florence, Italy, 25–29 October 2010; pp. 83–92. [Google Scholar]
  12. Ren, Y.; Geng, X. Sense Beauty by Label Distribution Learning. In Proceedings of the International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 2648–2654. [Google Scholar]
  13. Geng, X.; Hou, P. Pre-Release Prediction of Crowd Opinion on Movies by Label Distribution Learning. In Proceedings of the International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 3511–3517. [Google Scholar]
  14. Liu, S.; Huang, E.; Zhou, Z.; Xu, Y.; Kui, X.; Lei, T.; Meng, H. Lightweight Facial Attractiveness Prediction Using Dual Label Distribution. IEEE Trans. Cogn. Dev. Syst. 2025. early access. [Google Scholar] [CrossRef]
  15. Tang, Y.; Ni, Z.; Zhou, J.; Zhang, D.; Lu, J.; Wu, Y.; Zhou, J. Uncertainty-Aware Score Distribution Learning for Action Quality Assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9836–9845. [Google Scholar]
  16. Xing, C.; Geng, X.; Xue, H. Logistic Boosting Regression for Label Distribution Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4489–4497. [Google Scholar]
  17. Wang, J.; Geng, X. Label Distribution Learning by Exploiting Label Distribution Manifold. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 839–852. [Google Scholar] [CrossRef] [PubMed]
  18. Jia, X.; Shen, X.; Li, W.; Lu, Y.; Zhu, J. Label Distribution Learning by Maintaining Label Ranking Relation. IEEE Trans. Knowl. Data Eng. 2023, 35, 1695–1707. [Google Scholar] [CrossRef]
  19. Jia, X.; Qin, T.; Lu, Y.; Li, W. Adaptive Weighted Ranking-Oriented Label Distribution Learning. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 11302–11316. [Google Scholar] [CrossRef]
  20. Kou, Z.; Wang, J.; Jia, Y.; Liu, B.; Geng, X. Instance-Dependent Inaccurate Label Distribution Learning. IEEE Trans. Neural Netw. Learn. Syst. 2023, 36, 1425–1437. [Google Scholar] [CrossRef]
  21. Guiasu, S.; Shenitzer, A. The Principle of Maximum Entropy. Math. Intell. 1985, 7, 42–48. [Google Scholar] [CrossRef]
  22. Zhai, Y.; Dai, J. Geometric Mean Metric Learning for Label Distribution Learning. In Proceedings of the International Conference on Neural Information Processing, Sydney, NSW, Australia, 12–15 December 2019; pp. 260–272. [Google Scholar]
  23. Xu, S.; Ju, H.; Shang, L.; Pedrycz, W.; Yang, X.; Li, C. Label Distribution Learning: A Local Collaborative Mechanism. Int. J. Approx. Reason. 2020, 121, 59–84. [Google Scholar] [CrossRef]
  24. Friedman, J.; Hastie, T.; Tibshirani, R. Additive Logistic Regression: A Statistical View of Boosting. Ann. Stat. 2000, 28, 337–407. [Google Scholar] [CrossRef]
  25. Zhai, Y.; Dai, J.; Shi, H. Label Distribution Learning Based on Ensemble Neural Networks. In Proceedings of the International Conference on Neural Information Processing, Siem Reap, Cambodia, 13–16 December 2018; pp. 593–602. [Google Scholar]
  26. Chen, M.; Wang, X.; Feng, B.; Liu, W. Structured Random Forest for Label Distribution Learning. Neurocomputing 2018, 320, 171–182. [Google Scholar] [CrossRef]
  27. Jia, X.; Li, W.; Liu, J.; Zhang, Y. Label Distribution Learning by Exploiting Label Correlations. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 3310–3317. [Google Scholar]
  28. Zhao, P.; Zhou, Z.H. Label Distribution Learning by Optimal Transport. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 4506–4513. [Google Scholar]
  29. Zheng, X.; Jia, X.; Li, W. Label Distribution Learning by Exploiting Sample Correlations Locally. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 4556–4563. [Google Scholar]
  30. Ren, T.; Jia, X.; Li, W.; Zhao, S. Label Distribution Learning with Label Correlations via Low-Rank Approximation. In Proceedings of the International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 3325–3331. [Google Scholar]
  31. Ren, T.; Jia, X.; Li, W.; Chen, L.; Li, Z. Label Distribution Learning with Label-Specific Features. In Proceedings of the International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 3318–3324. [Google Scholar]
  32. Peng, C.L.; Tao, A.; Geng, X. Label Embedding Based on Multi-Scale Locality Preservation. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 2623–2629. [Google Scholar]
  33. Wang, K.; Geng, X. Binary Coding Based Label Distribution Learning. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 2783–2789. [Google Scholar]
  34. Wang, K.; Geng, X. Discrete Binary Coding Based Label Distribution Learning. In Proceedings of the International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 3733–3739. [Google Scholar]
  35. Xu, M.; Zhou, Z.H. Incomplete Label Distribution Learning. In Proceedings of the International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 3175–3181. [Google Scholar]
  36. Zeng, X.Q.; Chen, S.F.; Xiang, R.; Wu, S.X.; Wan, Z.Y. Filling Missing Values by Local Reconstruction for Incomplete Label Distribution Learning. Int. J. Wirel. Mob. Comput. 2019, 16, 314–321. [Google Scholar] [CrossRef]
  37. Xu, C.; Gu, S.; Tao, H.; Hou, C. Fragmentary Label Distribution Learning via Graph Regularized Maximum Entropy Criteria. Pattern Recognit. Lett. 2021, 145, 147–156. [Google Scholar] [CrossRef]
  38. Xu, S.; Shang, L.; Shen, F.; Yang, X.; Pedrycz, W. Incomplete Label Distribution Learning via Label Correlation Decomposition. Inf. Fusion 2025, 113, 102600. [Google Scholar] [CrossRef]
  39. Jin, Y.; Gao, R.; He, Y.; Zhu, X. GLDL: Graph Label Distribution Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 12965–12974. [Google Scholar]
  40. Lu, Y.; Li, W.; Liu, D.; Li, H.; Jia, X. Adaptive-Grained Label Distribution Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; pp. 19161–19169. [Google Scholar]
  41. He, L.; Lu, Y.; Li, W.; Jia, X. Generative Calibration of Inaccurate Annotation for Label Distribution Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 12394–12401. [Google Scholar]
  42. Xu, N.; Tao, A.; Geng, X. Label Enhancement for Label Distribution Learning. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 1632–1643. [Google Scholar]
  43. Xu, N.; Liu, Y.P.; Geng, X. Label Enhancement for Label Distribution Learning. IEEE Trans. Knowl. Data Eng. 2021, 33, 1632–1643. [Google Scholar] [CrossRef]
  44. Gao, Y.; Zhang, Y.; Geng, X. Label Enhancement for Label Distribution Learning via Prior Knowledge. In Proceedings of the International Joint Conference on Artificial Intelligence, Yokohama, Japan, 11–17 July 2020; pp. 3223–3229. [Google Scholar]
  45. Lu, Y.; Jia, X. Predicting Label Distribution from Ternary Labels. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; pp. 70431–70452. [Google Scholar]
  46. Lu, Y.; Jia, X. Predicting Label Distribution from Multi-Label Ranking. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; pp. 36931–36943. [Google Scholar]
  47. Lu, Y.; Li, W.; Li, H.; Jia, X. Predicting Label Distribution from Tie-Allowed Multi-Label Ranking. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 15364–15379. [Google Scholar] [CrossRef] [PubMed]
  48. Lu, Y.; Li, W.; Li, H.; Jia, X. Ranking-Preserved Generative Label Enhancement. Mach. Learn. 2023, 112, 4693–4721. [Google Scholar] [CrossRef]
  49. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. In Proceedings of the International Conference on Learning Representations, Scottsdale, AZ, USA, 2–4 May 2013. [Google Scholar]
  50. Joo, W.; Lee, W.; Park, S.; Moon, I.C. Dirichlet Variational Autoencoder. Pattern Recognit. 2020, 107, 107514. [Google Scholar] [CrossRef]
  51. Lyons, M.; Akamatsu, S.; Kamachi, M.; Gyoba, J. Coding Facial Expressions with Gabor Wavelets. In Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 14–16 April 1998; pp. 200–205. [Google Scholar]
  52. Yin, L.; Wei, X.; Sun, Y.; Wang, J.; Rosato, M.J. A 3D Facial Expression Database For Facial Behavior Research. In Proceedings of the International Conference on Automatic Face and Gesture Recognition, Southampton, UK, 10–12 April 2006; pp. 211–216. [Google Scholar]
  53. Liu, X.; Zhu, J.; Zheng, Q.; Li, Z.; Liu, R.; Wang, J. Bidirectional Loss Function for Label Enhancement and Distribution Learning. Knowl.-Based Syst. 2021, 213, 106690. [Google Scholar] [CrossRef]
  54. Zychowski, A.; Mandziuk, J. Duo-LDL Method for Label Distribution Learning Based on Pairwise Class Dependencies. Appl. Soft Comput. 2021, 110, 107585. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of our proposed model. The latent variables u and ω correspond to the true label distribution and label confusion vector, respectively. The observed variables x and z correspond to the feature vector and noisy label distribution provided in the dataset, respectively.
Figure 2. Performance with different values of hyperparameters on the SJAFFE dataset with α * = 8 .
Figure 3. Convergence process of our proposed algorithm. The color of the curve corresponds to the dataset, while the transparency of the curve corresponds to the value of α * .
Table 1. Commonly used notation in this paper.
Notation | Description
$\langle \boldsymbol{v}, \boldsymbol{u} \rangle$ | Inner product of vectors $\boldsymbol{v}$ and $\boldsymbol{u}$
$\mathcal{X}^D$ | $D$-dimensional feature space, i.e., $\mathbb{R}^D$
$\Delta^M$ | $M$-dimensional distribution space, i.e., $\{\boldsymbol{v} \in \mathbb{R}_{+}^{M} \mid \langle \boldsymbol{v}, \boldsymbol{1} \rangle = 1\}$
$\boldsymbol{x} \in \mathcal{X}^D$ | Random vector of feature variables
$\boldsymbol{z} \in \Delta^M$ | Label distribution
$H(\boldsymbol{v}; \Theta)$ | Multi-layer perceptron with $\boldsymbol{v}$ as input and $\Theta$ as parameters
$\|\boldsymbol{v}\|_2$ | $L_2$ norm of a vector $\boldsymbol{v}$
$f \circ h$ | Composition of functions $h$ and $f$, i.e., $f \circ h(\cdot) := f(h(\cdot))$
$\mathrm{Norm}$ | Normal distribution
$\mathrm{Diri}$ | Dirichlet distribution
Table 2. Dataset statistics.
ID | Dataset | # Instances | # Features | # Labels
1 | SJAFFE [51] | 213 | 243 | 6
2 | SBU-3DFE [52] | 2500 | 243 | 6
3 | Yeast-alpha [1] | 2465 | 24 | 18
4 | Yeast-cdc [1] | 2465 | 24 | 15
5 | Yeast-cold [1] | 2465 | 24 | 14
6 | Yeast-diau [1] | 2465 | 24 | 7
7 | Yeast-dtt [1] | 2465 | 24 | 6
8 | Yeast-elu [1] | 2465 | 24 | 6
Table 3. Prediction performance on the first four datasets with α * = 4 .
Cheb (↓)   Clark (↓)   Canber (↓)   KL (↓)   Cosine (↑)   Intersec (↑)
SJAFFE
Ours 0.1352 ± 0.0151 1 /∗  0.4605 ± 0.0462 1 /∗  0.9507 ± 0.0895 1 /∗  0.1016 ± 0.0249 1 /∗  0.9051 ± 0.0223 1 /∗  0.8302 ± 0.0200 1 /∗ 
LDL-DPA 0.1568 ± 0.0093 4 /•  0.5436 ± 0.0268 5 /•  1.1121 ± 0.0555 6 /•  0.1420 ± 0.0160 3 /•  0.8747 ± 0.0127 4 /•  0.8005 ± 0.0105 6 /• 
LDL-LDM 0.1592 ± 0.0095 6 /•  0.5544 ± 0.0287 7 /•  1.1346 ± 0.0611 7 /•  0.1480 ± 0.0175 7 /•  0.8707 ± 0.0133 6 /•  0.7965 ± 0.0111 7 /• 
LDL-LRR 0.1568 ± 0.0096 4 /•  0.5427 ± 0.0270 3 /•  1.1092 ± 0.0571 3 /•  0.1421 ± 0.0160 4 /•  0.8745 ± 0.0130 5 /•  0.8006 ± 0.0109 3 /• 
Duo-LDL 0.2003 ± 0.0432 8 /•  0.9613 ± 0.3578 8 /•  1.8097 ± 0.6623 8 /•  1.7878 ± 1.5211 8 /•  0.8046 ± 0.0675 8 /•  0.7261 ± 0.0765 8 /• 
BD-LDL 0.1539 ± 0.0096 2 /•  0.5480 ± 0.0286 6 /•  1.1101 ± 0.0611 4 /•  0.1444 ± 0.0156 5 /•  0.8753 ± 0.0121 2 /•  0.8018 ± 0.0114 2 /• 
SABFGS 0.1565 ± 0.0095 3 /•  0.5432 ± 0.0275 4 /•  1.1113 ± 0.0570 5 /•  0.1419 ± 0.0163 2 /•  0.8750 ± 0.0131 3 /•  0.8006 ± 0.0111 3 /• 
AAkNN 0.1594 ± 0.0109 7 /•  0.5384 ± 0.0303 2 /•  1.0891 ± 0.0626 2 /•  0.1470 ± 0.0180 6 /•  0.8701 ± 0.0144 7 /•  0.8006 ± 0.0126 3 /• 
SBU-3DFE
Ours 0.1550 ± 0.0083 1 /∗  0.4858 ± 0.0288 1 /∗  0.9951 ± 0.0509 1 /∗  0.1160 ± 0.0139 1 /∗  0.8902 ± 0.0122 1 /∗  0.8169 ± 0.0099 1 /∗ 
LDL-DPA 0.1668 ± 0.0018 3 /•  0.5802 ± 0.0031 3 /•  1.1726 ± 0.0063 3 /•  0.1532 ± 0.0026 3 /•  0.8647 ± 0.0021 3 /•  0.7889 ± 0.0015 4 /• 
LDL-LDM 0.1641 ± 0.0021 2 /•  0.5661 ± 0.0043 2 /•  1.1468 ± 0.0080 2 /•  0.1472 ± 0.0029 2 /•  0.8690 ± 0.0024 2 /•  0.7930 ± 0.0019 2 /• 
LDL-LRR 0.1670 ± 0.0023 4 /•  0.5805 ± 0.0033 4 /•  1.1731 ± 0.0070 4 /•  0.1535 ± 0.0028 5 /•  0.8646 ± 0.0024 4 /•  0.7889 ± 0.0019 4 /• 
Duo-LDL 0.1894 ± 0.0185 8 /•  0.7377 ± 0.2364 8 /•  1.4694 ± 0.3941 8 /•  0.7575 ± 1.2036 8 /∗  0.8269 ± 0.0315 8 /•  0.7500 ± 0.0378 8 /• 
BD-LDL 0.1723 ± 0.0020 7 /•  0.6014 ± 0.0043 7 /•  1.2334 ± 0.0083 7 /•  0.1655 ± 0.0038 7 /•  0.8562 ± 0.0024 7 /•  0.7802 ± 0.0018 7 /• 
SABFGS 0.1670 ± 0.0022 4 /•  0.5805 ± 0.0034 4 /•  1.1731 ± 0.0068 4 /•  0.1534 ± 0.0026 4 /•  0.8646 ± 0.0023 4 /•  0.7890 ± 0.0018 3 /• 
AAkNN 0.1675 ± 0.0023 6 /•  0.5819 ± 0.0038 6 /•  1.1823 ± 0.0092 6 /•  0.1560 ± 0.0034 6 /•  0.8638 ± 0.0029 6 /•  0.7878 ± 0.0020 6 /• 
Yeast-alpha
Ours 0.0171 ± 0.0006 1 /∗  0.2915 ± 0.0108 1 /∗  0.9806 ± 0.0395 1 /∗  0.0098 ± 0.0006 1 /∗  0.9904 ± 0.0007 1 /∗  0.9456 ± 0.0022 1 /∗ 
LDL-DPA 0.0230 ± 0.0000 2 /•  0.4181 ± 0.0021 4 /•  1.4358 ± 0.0084 4 /•  0.0197 ± 0.0005 2 /•  0.9810 ± 0.0000 2 /•  0.9206 ± 0.0005 3 /• 
LDL-LDM 0.0230 ± 0.0000 2 /•  0.4176 ± 0.0025 2 /•  1.4343 ± 0.0085 2 /•  0.0197 ± 0.0005 2 /•  0.9810 ± 0.0000 2 /•  0.9207 ± 0.0005 2 /• 
LDL-LRR 0.0230 ± 0.0000 2 /•  0.4179 ± 0.0024 3 /•  1.4357 ± 0.0084 3 /•  0.0197 ± 0.0005 2 /•  0.9810 ± 0.0000 2 /•  0.9206 ± 0.0005 3 /• 
Duo-LDL 0.0230 ± 0.0000 2 /•  0.4221 ± 0.0026 7 /•  1.4478 ± 0.0088 7 /•  0.0200 ± 0.0000 7 /•  0.9807 ± 0.0005 7 /•  0.9201 ± 0.0006 7 /• 
BD-LDL 0.0258 ± 0.0004 8 /•  0.4614 ± 0.0039 8 /•  1.5807 ± 0.0127 8 /•  0.0243 ± 0.0005 8 /•  0.9766 ± 0.0005 8 /•  0.9125 ± 0.0007 8 /• 
SABFGS 0.0230 ± 0.0000 2 /•  0.4181 ± 0.0021 4 /•  1.4360 ± 0.0084 5 /•  0.0197 ± 0.0005 2 /•  0.9810 ± 0.0000 2 /•  0.9206 ± 0.0005 3 /• 
AAkNN 0.0230 ± 0.0000 2 /•  0.4182 ± 0.0023 6 /•  1.4366 ± 0.0086 6 /•  0.0197 ± 0.0005 2 /•  0.9810 ± 0.0000 2 /•  0.9206 ± 0.0005 3 /• 
Yeast-cdc
Ours 0.0217 ± 0.0007 1 /∗  0.2891 ± 0.0088 1 /∗  0.8857 ± 0.0294 1 /∗  0.0115 ± 0.0007 1 /∗  0.9887 ± 0.0008 1 /∗  0.9417 ± 0.0018 1 /∗ 
LDL-DPA 0.0285 ± 0.0005 4 /•  0.4309 ± 0.0019 6 /•  1.3419 ± 0.0046 4 /•  0.0245 ± 0.0005 2 /•  0.9772 ± 0.0004 2 /•  0.9129 ± 0.0003 2 /• 
LDL-LDM 0.0284 ± 0.0005 2 /•  0.4307 ± 0.0018 2 /•  1.3408 ± 0.0043 2 /•  0.0245 ± 0.0005 2 /•  0.9772 ± 0.0004 2 /•  0.9129 ± 0.0003 2 /• 
LDL-LRR 0.0286 ± 0.0005 6 /•  0.4307 ± 0.0018 2 /•  1.3416 ± 0.0045 3 /•  0.0245 ± 0.0005 2 /•  0.9772 ± 0.0004 2 /•  0.9129 ± 0.0003 2 /• 
Duo-LDL 0.0290 ± 0.0005 7 /•  0.4330 ± 0.0090 7 /•  1.3457 ± 0.0232 7 /•  0.0248 ± 0.0008 7 /•  0.9771 ± 0.0007 7 /•  0.9125 ± 0.0014 7 /• 
BD-LDL 0.0311 ± 0.0003 8 /•  0.4681 ± 0.0035 8 /•  1.4550 ± 0.0108 8 /•  0.0290 ± 0.0005 8 /•  0.9735 ± 0.0005 8 /•  0.9054 ± 0.0008 8 /• 
SABFGS 0.0285 ± 0.0005 4 /•  0.4307 ± 0.0018 2 /•  1.3419 ± 0.0041 4 /•  0.0245 ± 0.0005 2 /•  0.9772 ± 0.0004 2 /•  0.9129 ± 0.0003 2 /• 
AAkNN 0.0284 ± 0.0005 2 /•  0.4308 ± 0.0018 5 /•  1.3420 ± 0.0051 6 /•  0.0246 ± 0.0005 6 /•  0.9772 ± 0.0004 2 /•  0.9129 ± 0.0003 2 /• 
* Best performance in each comparison group is boldfaced, with rank and statistical significance indicators shown in the top-left corner of each result.
Table 4. Prediction performance on the last four datasets with α * = 4 .
Cheb (↓)   Clark (↓)   Canber (↓)   KL (↓)   Cosine (↑)   Intersec (↑)
Yeast-cold
Ours 0.0569 ± 0.0021 1 /∗  0.1560 ± 0.0060 1 /∗  0.2701 ± 0.0117 1 /∗  0.0145 ± 0.0011 1 /∗  0.9862 ± 0.0009 1 /∗  0.9334 ± 0.0031 1 /∗ 
LDL-DPA 0.0654 ± 0.0008 4 /•  0.1829 ± 0.0021 4 /•  0.3243 ± 0.0035 5 /•  0.0195 ± 0.0005 3 /•  0.9814 ± 0.0005 4 /•  0.9198 ± 0.0009 4 /• 
LDL-LDM 0.0611 ± 0.0007 2 /•  0.1691 ± 0.0023 2 /•  0.2971 ± 0.0045 2 /•  0.0167 ± 0.0005 2 /•  0.9840 ± 0.0005 2 /•  0.9266 ± 0.0012 2 /• 
LDL-LRR 0.0654 ± 0.0008 4 /•  0.1827 ± 0.0021 3 /•  0.3242 ± 0.0038 4 /•  0.0195 ± 0.0005 3 /•  0.9815 ± 0.0005 3 /•  0.9197 ± 0.0011 5 /• 
Duo-LDL 0.0656 ± 0.0010 6 /•  0.1835 ± 0.0024 6 /•  0.3251 ± 0.0042 6 /•  0.0196 ± 0.0005 6 /•  0.9814 ± 0.0005 4 /•  0.9195 ± 0.0010 6 /• 
BD-LDL 0.0760 ± 0.0018 8 /•  0.2126 ± 0.0049 8 /•  0.3607 ± 0.0073 8 /•  0.0272 ± 0.0013 8 /•  0.9765 ± 0.0011 8 /•  0.9127 ± 0.0018 8 /• 
SABFGS 0.0653 ± 0.0007 3 /•  0.1829 ± 0.0021 4 /•  0.3240 ± 0.0032 3 /•  0.0195 ± 0.0005 3 /•  0.9814 ± 0.0005 4 /•  0.9199 ± 0.0009 3 /• 
AAkNN 0.0661 ± 0.0007 7 /•  0.1839 ± 0.0019 7 /•  0.3253 ± 0.0037 7 /•  0.0197 ± 0.0005 7 /•  0.9812 ± 0.0004 7 /•  0.9195 ± 0.0008 6 /• 
Yeast-diau
Ours 0.0450 ± 0.0017 1 /∗  0.2378 ± 0.0061 1 /∗  0.5083 ± 0.0119 1 /∗  0.0175 ± 0.0007 1 /∗  0.9836 ± 0.0008 1 /∗  0.9291 ± 0.0015 1 /∗ 
LDL-DPA 0.0623 ± 0.0005 5 /•  0.3301 ± 0.0024 5 /•  0.7049 ± 0.0051 5 /•  0.0322 ± 0.0006 3 /•  0.9715 ± 0.0005 4 /•  0.9028 ± 0.0008 3 /• 
LDL-LDM 0.0616 ± 0.0005 2 /•  0.3242 ± 0.0025 2 /•  0.6936 ± 0.0051 2 /•  0.0311 ± 0.0006 2 /•  0.9723 ± 0.0005 2 /•  0.9042 ± 0.0009 2 /• 
LDL-LRR 0.0624 ± 0.0005 6 /•  0.3301 ± 0.0024 5 /•  0.7049 ± 0.0054 5 /•  0.0322 ± 0.0006 3 /•  0.9715 ± 0.0005 4 /•  0.9028 ± 0.0008 3 /• 
Duo-LDL 0.0622 ± 0.0010 4 /•  0.3299 ± 0.0055 3 /•  0.7039 ± 0.0164 3 /•  0.0323 ± 0.0009 7 /•  0.9716 ± 0.0011 3 /•  0.9028 ± 0.0025 3 /• 
BD-LDL 0.0684 ± 0.0013 8 /•  0.3903 ± 0.0083 8 /•  0.8333 ± 0.0158 8 /•  0.0447 ± 0.0022 8 /•  0.9641 ± 0.0014 8 /•  0.8886 ± 0.0020 8 /• 
SABFGS 0.0624 ± 0.0005 6 /•  0.3299 ± 0.0024 3 /•  0.7048 ± 0.0054 4 /•  0.0322 ± 0.0006 3 /•  0.9715 ± 0.0005 4 /•  0.9028 ± 0.0008 3 /• 
AAkNN 0.0620 ± 0.0007 3 /•  0.3303 ± 0.0023 7 /•  0.7067 ± 0.0050 7 /•  0.0322 ± 0.0006 3 /•  0.9715 ± 0.0005 4 /•  0.9025 ± 0.0008 7 /• 
Yeast-dtt
Ours 0.0464 ± 0.0025 1 /∗  0.1274 ± 0.0069 1 /∗  0.2213 ± 0.0113 1 /∗  0.0099 ± 0.0009 1 /∗  0.9903 ± 0.0011 1 /∗  0.9453 ± 0.0027 1 /∗ 
LDL-DPA 0.0571 ± 0.0006 3 /•  0.1612 ± 0.0016 4 /•  0.2892 ± 0.0029 3 /•  0.0151 ± 0.0003 3 /•  0.9851 ± 0.0003 5 /•  0.9286 ± 0.0008 3 /• 
LDL-LDM 0.0523 ± 0.0007 2 /•  0.1466 ± 0.0018 2 /•  0.2614 ± 0.0037 2 /•  0.0129 ± 0.0003 2 /•  0.9876 ± 0.0005 2 /•  0.9352 ± 0.0008 2 /• 
LDL-LRR 0.0571 ± 0.0006 3 /•  0.1613 ± 0.0016 5 /•  0.2894 ± 0.0030 5 /•  0.0151 ± 0.0003 3 /•  0.9852 ± 0.0004 4 /•  0.9284 ± 0.0008 5 /• 
Duo-LDL 0.0575 ± 0.0013 6 /•  0.1627 ± 0.0031 7 /•  0.2918 ± 0.0055 7 /•  0.0154 ± 0.0007 6 /•  0.9851 ± 0.0006 5 /•  0.9278 ± 0.0012 7 /• 
BD-LDL 0.0786 ± 0.0023 8 /•  0.2331 ± 0.0098 8 /•  0.3852 ± 0.0146 8 /•  0.0599 ± 0.0336 8 /•  0.9738 ± 0.0019 8 /•  0.9096 ± 0.0028 8 /• 
SABFGS 0.0571 ± 0.0006 3 /•  0.1611 ± 0.0017 3 /•  0.2893 ± 0.0030 4 /•  0.0151 ± 0.0003 3 /•  0.9853 ± 0.0005 3 /•  0.9286 ± 0.0008 3 /• 
AAkNN 0.0575 ± 0.0008 6 /•  0.1622 ± 0.0018 6 /•  0.2909 ± 0.0034 6 /•  0.0155 ± 0.0005 7 /•  0.9850 ± 0.0005 7 /•  0.9281 ± 0.0011 6 /• 
Yeast-elu
Ours 0.0239 ± 0.0013 1 /∗  0.2909 ± 0.0054 1 /∗  0.8788 ± 0.0148 1 /∗  0.0129 ± 0.0006 1 /∗  0.9872 ± 0.0006 1 /∗  0.9370 ± 0.0009 1 /∗ 
LDL-DPA 0.0376 ± 0.0005 3 /•  0.4347 ± 0.0017 4 /•  1.3238 ± 0.0074 4 /•  0.0281 ± 0.0003 2 /•  0.9718 ± 0.0004 5 /•  0.9041 ± 0.0006 2 /• 
LDL-LDM 0.0374 ± 0.0005 2 /•  0.4341 ± 0.0019 2 /•  1.3230 ± 0.0075 2 /•  0.0281 ± 0.0003 2 /•  0.9719 ± 0.0003 2 /•  0.9041 ± 0.0006 2 /• 
LDL-LRR 0.0376 ± 0.0005 3 /•  0.4347 ± 0.0018 4 /•  1.3238 ± 0.0075 4 /•  0.0281 ± 0.0003 2 /•  0.9719 ± 0.0003 2 /•  0.9041 ± 0.0006 2 /• 
Duo-LDL 0.0376 ± 0.0005 3 /•  0.4379 ± 0.0037 7 /•  1.3333 ± 0.0128 7 /•  0.0286 ± 0.0005 7 /•  0.9714 ± 0.0005 7 /•  0.9035 ± 0.0010 7 /• 
BD-LDL 0.0407 ± 0.0005 8 /•  0.4924 ± 0.0043 8 /•  1.5042 ± 0.0140 8 /•  0.0361 ± 0.0006 8 /•  0.9653 ± 0.0005 8 /•  0.8925 ± 0.0008 8 /• 
SABFGS 0.0376 ± 0.0005 3 /•  0.4345 ± 0.0018 3 /•  1.3236 ± 0.0076 3 /•  0.0281 ± 0.0003 2 /•  0.9719 ± 0.0003 2 /•  0.9041 ± 0.0006 2 /• 
AAkNN 0.0376 ± 0.0005 3 /•  0.4351 ± 0.0021 6 /•  1.3259 ± 0.0078 6 /•  0.0281 ± 0.0003 2 /•  0.9717 ± 0.0005 6 /•  0.9041 ± 0.0006 2 /• 
* Best performance in each comparison group is boldfaced, with rank and statistical significance indicators shown in the top-left corner of each result.
Table 5. Prediction performance on the first four datasets with α * = 8 .
Cheb (↓)  Clark (↓)  Canber (↓)  KL (↓)  Cosine (↑)  Intersec (↑)
SJAFFE
Ours 0.1287 ± 0.0105 1 /∗  0.4447 ± 0.0346 1 /∗  0.9253 ± 0.0740 1 /∗  0.0900 ± 0.0172 1 /∗  0.9142 ± 0.0154 1 /∗  0.8357 ± 0.0144 1 /∗ 
LDL-DPA 0.1331 ± 0.0092 4 /∗  0.4789 ± 0.0275 6 /•  0.9885 ± 0.0609 6 /•  0.1059 ± 0.0148 4 /•  0.9046 ± 0.0118 4 /•  0.8263 ± 0.0113 5 /• 
LDL-LDM 0.1356 ± 0.0091 7 /•  0.4932 ± 0.0296 7 /•  1.0201 ± 0.0634 7 /•  0.1121 ± 0.0153 7 /•  0.9002 ± 0.0121 7 /•  0.8213 ± 0.0117 7 /• 
LDL-LRR 0.1328 ± 0.0089 3 /∗  0.4756 ± 0.0286 5 /•  0.9811 ± 0.0626 5 /•  0.1044 ± 0.0152 3 /•  0.9054 ± 0.0122 3 /•  0.8272 ± 0.0115 4 /• 
Duo-LDL 0.1705 ± 0.0300 8 /•  0.7557 ± 0.3338 8 /•  1.4350 ± 0.5434 8 /•  1.0511 ± 1.2603 8 /•  0.8534 ± 0.0427 8 /•  0.7748 ± 0.0535 8 /• 
BD-LDL 0.1352 ± 0.0093 5 /∗  0.4730 ± 0.0244 3 /•  0.9778 ± 0.0541 3 /∗  0.1077 ± 0.0126 6 /•  0.9012 ± 0.0106 6 /•  0.8253 ± 0.0103 6 /∗ 
SABFGS 0.1323 ± 0.0086 2 /∗  0.4755 ± 0.0225 4 /•  0.9803 ± 0.0497 4 /•  0.1042 ± 0.0130 2 /•  0.9056 ± 0.0107 2 /∗  0.8273 ± 0.0098 3 /∗ 
AAkNN 0.1361 ± 0.0100 6 /∗  0.4588 ± 0.0275 2 /∗  0.9384 ± 0.0589 2 /∗  0.1063 ± 0.0146 5 /•  0.9014 ± 0.0125 5 /∗  0.8289 ± 0.0117 2 /∗ 
SBU-3DFE
Ours 0.1428 ± 0.0037 2 /∗  0.4466 ± 0.0182 1 /∗  0.9295 ± 0.0316 1 /∗  0.0962 ± 0.0072 1 /∗  0.9078 ± 0.0061 1 /∗  0.8308 ± 0.0062 1 /∗ 
LDL-DPA 0.1451 ± 0.0021 5 /∗  0.4914 ± 0.0037 4 /•  0.9952 ± 0.0063 5 /•  0.1089 ± 0.0020 5 /•  0.8983 ± 0.0019 5 /•  0.8206 ± 0.0015 4 /• 
LDL-LDM 0.1425 ± 0.0023 1 /∗  0.4658 ± 0.0044 2 /•  0.9508 ± 0.0094 2 /∗  0.1013 ± 0.0024 2 /•  0.9045 ± 0.0020 2 /∗  0.8274 ± 0.0020 2 /∗ 
LDL-LRR 0.1449 ± 0.0020 4 /•  0.4916 ± 0.0035 6 /•  0.9950 ± 0.0067 4 /•  0.1088 ± 0.0020 4 /•  0.8985 ± 0.0019 3 /•  0.8207 ± 0.0016 3 /• 
Duo-LDL 0.1724 ± 0.0192 8 /•  0.7074 ± 0.2986 8 /•  1.3653 ± 0.4799 8 /•  0.7589 ± 1.1922 8 /∗  0.8566 ± 0.0354 8 /•  0.7764 ± 0.0451 8 /• 
BD-LDL 0.1561 ± 0.0021 7 /•  0.5078 ± 0.0044 7 /•  1.0467 ± 0.0096 7 /•  0.1239 ± 0.0027 7 /•  0.8850 ± 0.0023 7 /•  0.8095 ± 0.0020 7 /• 
SABFGS 0.1448 ± 0.0022 3 /∗  0.4914 ± 0.0041 4 /•  0.9948 ± 0.0079 3 /•  0.1087 ± 0.0020 3 /•  0.8985 ± 0.0019 3 /•  0.8206 ± 0.0014 4 /• 
AAkNN 0.1461 ± 0.0024 6 /•  0.4906 ± 0.0044 3 /•  0.9987 ± 0.0097 6 /•  0.1124 ± 0.0028 6 /•  0.8964 ± 0.0025 6 /•  0.8201 ± 0.0021 6 /• 
Yeast-alpha
Ours 0.0166 ± 0.0005 1 /∗  0.2737 ± 0.0074 1 /∗  0.9127 ± 0.0288 1 /∗  0.0088 ± 0.0006 1 /∗  0.9914 ± 0.0005 1 /∗  0.9496 ± 0.0016 1 /∗ 
LDL-DPA 0.0220 ± 0.0000 2 /•  0.3873 ± 0.0024 4 /•  1.3713 ± 0.0083 3 /•  0.0170 ± 0.0000 2 /•  0.9836 ± 0.0005 2 /•  0.9271 ± 0.0006 4 /• 
LDL-LDM 0.0220 ± 0.0000 2 /•  0.3864 ± 0.0025 2 /•  1.3145 ± 0.0085 2 /•  0.0170 ± 0.0000 2 /•  0.9836 ± 0.0005 2 /•  0.9275 ± 0.0007 2 /• 
LDL-LRR 0.0220 ± 0.0000 2 /•  0.3872 ± 0.0026 2 /•  1.3176 ± 0.0082 5 /•  0.0170 ± 0.0000 2 /•  0.9836 ± 0.0005 2 /•  0.9271 ± 0.0006 4 /• 
Duo-LDL 0.0220 ± 0.0000 2 /•  0.3910 ± 0.0032 7 /•  1.3304 ± 0.0112 7 /•  0.0173 ± 0.0005 7 /•  0.9832 ± 0.0004 7 /•  0.9266 ± 0.0007 7 /• 
BD-LDL 0.0252 ± 0.0006 8 /•  0.4375 ± 0.0047 8 /•  1.4867 ± 0.0155 8 /•  0.0217 ± 0.0005 8 /•  0.9789 ± 0.0006 8 /•  0.9179 ± 0.0010 8 /• 
SABFGS 0.0220 ± 0.0000 2 /•  0.3873 ± 0.0023 4 /•  1.3175 ± 0.0078 4 /•  0.0170 ± 0.0000 2 /•  0.9836 ± 0.0005 2 /•  0.9272 ± 0.0006 3 /• 
AAkNN 0.0220 ± 0.0000 2 /•  0.3873 ± 0.0025 4 /•  1.3186 ± 0.0089 6 /•  0.0170 ± 0.0000 2 /•  0.9836 ± 0.0005 2 /•  0.9271 ± 0.0006 4 /• 
Yeast-cdc
Ours 0.0207 ± 0.0007 1 /∗  0.2821 ± 0.0093 1 /∗  0.8627 ± 0.0312 1 /∗  0.0110 ± 0.0008 1 /∗  0.9894 ± 0.0005 1 /∗  0.9432 ± 0.0019 1 /∗ 
LDL-DPA 0.0269 ± 0.0003 2 /•  0.3979 ± 0.0021 5 /•  1.2301 ± 0.0051 4 /•  0.0210 ± 0.0000 2 /•  0.9806 ± 0.0005 3 /•  0.9201 ± 0.0003 2 /• 
LDL-LDM 0.0269 ± 0.0003 2 /•  0.3972 ± 0.0022 2 /•  1.2285 ± 0.0048 2 /•  0.0210 ± 0.0000 2 /•  0.9808 ± 0.0004 2 /•  0.9201 ± 0.0003 2 /• 
LDL-LRR 0.0269 ± 0.0003 2 /•  0.3977 ± 0.0022 4 /•  1.2300 ± 0.0052 3 /•  0.0210 ± 0.0000 2 /•  0.9805 ± 0.0005 5 /•  0.9201 ± 0.0003 2 /• 
Duo-LDL 0.0271 ± 0.0007 7 /•  0.4053 ± 0.0133 7 /•  1.2512 ± 0.0507 7 /•  0.0217 ± 0.0016 7 /•  0.9799 ± 0.0014 7 /•  0.9184 ± 0.0038 7 /• 
BD-LDL 0.0303 ± 0.0005 8 /•  0.4416 ± 0.0048 8 /•  1.3629 ± 0.0131 8 /•  0.0261 ± 0.0007 8 /•  0.9761 ± 0.0006 8 /•  0.9113 ± 0.0008 8 /• 
SABFGS 0.0269 ± 0.0003 2 /•  0.3976 ± 0.0021 3 /•  1.2303 ± 0.0051 5 /•  0.0210 ± 0.0000 2 /•  0.9806 ± 0.0005 3 /•  0.9201 ± 0.0003 2 /• 
AAkNN 0.0269 ± 0.0003 2 /•  0.3982 ± 0.0021 6 /•  1.2315 ± 0.0057 6 /•  0.0210 ± 0.0000 2 /•  0.9805 ± 0.0005 5 /•  0.9200 ± 0.0005 6 /• 
* The best performance in each comparison group is boldfaced; each result is followed by its rank among the compared algorithms and a statistical-significance indicator (∗/•).
Table 6. Prediction performance on the last four datasets with α* = 8.
Cheb (↓)  Clark (↓)  Canber (↓)  KL (↓)  Cosine (↑)  Intersec (↑)
Yeast-cold
Ours 0.0568 ± 0.0019 2 /∗  0.1542 ± 0.0048 1 /∗  0.2660 ± 0.0082 1 /∗  0.0145 ± 0.0008 1 /∗  0.9864 ± 0.0007 1 /∗  0.9344 ± 0.0020 1 /∗ 
LDL-DPA 0.0632 ± 0.0006 3 /•  0.1746 ± 0.0016 5 /•  0.3030 ± 0.0031 5 /•  0.0177 ± 0.0005 3 /•  0.9832 ± 0.0004 3 /•  0.9256 ± 0.0007 3 /• 
LDL-LDM 0.0567 ± 0.0008 1 /∗  0.1552 ± 0.0023 2 /∗  0.2684 ± 0.0043 2 /∗  0.0145 ± 0.0005 1 /∗  0.9864 ± 0.0005 1 /∗  0.9337 ± 0.0009 2 /∗ 
LDL-LRR 0.0632 ± 0.0006 3 /•  0.1744 ± 0.0020 3 /•  0.3029 ± 0.0034 3 /•  0.0177 ± 0.0005 3 /•  0.9832 ± 0.0004 3 /•  0.9256 ± 0.0007 3 /• 
Duo-LDL 0.0643 ± 0.0038 7 /•  0.1780 ± 0.0103 7 /•  0.3086 ± 0.0138 7 /•  0.0185 ± 0.0020 7 /•  0.9826 ± 0.0013 7 /•  0.9245 ± 0.0028 7 /• 
BD-LDL 0.0673 ± 0.0018 8 /•  0.1859 ± 0.0042 8 /•  0.3176 ± 0.0064 8 /•  0.0212 ± 0.0010 8 /•  0.9809 ± 0.0009 8 /•  0.9225 ± 0.0016 8 /• 
SABFGS 0.0632 ± 0.0006 3 /•  0.1744 ± 0.0020 3 /•  0.3029 ± 0.0033 3 /•  0.0177 ± 0.0005 3 /•  0.9832 ± 0.0004 3 /•  0.9256 ± 0.0007 3 /• 
AAkNN 0.0639 ± 0.0007 6 /•  0.1760 ± 0.0021 6 /•  0.3062 ± 0.0037 6 /•  0.0183 ± 0.0005 6 /•  0.9829 ± 0.0003 6 /•  0.9248 ± 0.0009 6 /• 
Yeast-diau
Ours 0.0443 ± 0.0014 1 /∗  0.2326 ± 0.0057 1 /∗  0.4979 ± 0.0122 1 /∗  0.0169 ± 0.0009 1 /∗  0.9844 ± 0.0008 1 /∗  0.9306 ± 0.0017 1 /∗ 
LDL-DPA 0.0584 ± 0.0005 5 /•  0.3011 ± 0.0025 5 /•  0.6395 ± 0.0053 5 /•  0.0272 ± 0.0004 3 /•  0.9755 ± 0.0005 3 /•  0.9112 ± 0.0009 4 /• 
LDL-LDM 0.0567 ± 0.0005 2 /•  0.2902 ± 0.0023 2 /•  0.6169 ± 0.0045 2 /•  0.0253 ± 0.0005 2 /•  0.9767 ± 0.0005 2 /•  0.9141 ± 0.0009 2 /• 
LDL-LRR 0.0584 ± 0.0005 5 /•  0.3008 ± 0.0027 3 /•  0.6394 ± 0.0053 4 /•  0.0272 ± 0.0004 3 /•  0.9755 ± 0.0005 3 /•  0.9112 ± 0.0009 4 /• 
Duo-LDL 0.0584 ± 0.0005 5 /•  0.3014 ± 0.0023 7 /•  0.6398 ± 0.0064 6 /•  0.0273 ± 0.0005 6 /•  0.9754 ± 0.0005 6 /•  0.9113 ± 0.0011 3 /• 
BD-LDL 0.0671 ± 0.0018 8 /•  0.4045 ± 0.0113 8 /•  0.7997 ± 0.0162 8 /•  0.0980 ± 0.0258 8 /•  0.9676 ± 0.0011 8 /•  0.8974 ± 0.0017 8 /• 
SABFGS 0.0583 ± 0.0005 4 /•  0.3009 ± 0.0026 4 /•  0.6393 ± 0.0051 3 /•  0.0272 ± 0.0004 3 /•  0.9755 ± 0.0005 3 /•  0.9112 ± 0.0009 4 /• 
AAkNN 0.0581 ± 0.0006 3 /•  0.3013 ± 0.0025 6 /•  0.6414 ± 0.0051 7 /•  0.0273 ± 0.0005 6 /•  0.9754 ± 0.0005 6 /•  0.9108 ± 0.0008 7 /• 
Yeast-dtt
Ours 0.0446 ± 0.0017 1 /∗  0.1224 ± 0.0061 1 /∗  0.2116 ± 0.0120 1 /∗  0.0093 ± 0.0009 1 /∗  0.9911 ± 0.0007 1 /∗  0.9479 ± 0.0029 1 /∗ 
LDL-DPA 0.0526 ± 0.0007 3 /•  0.1485 ± 0.0013 4 /•  0.2575 ± 0.0028 4 /•  0.0130 ± 0.0000 3 /•  0.9879 ± 0.0003 3 /•  0.9371 ± 0.0007 3 /• 
LDL-LDM 0.0458 ± 0.0004 2 /∗  0.1271 ± 0.0006 2 /•  0.2215 ± 0.0020 2 /•  0.0100 ± 0.0000 2 /•  0.9907 ± 0.0005 2 /∗  0.9455 ± 0.0005 2 /• 
LDL-LRR 0.0527 ± 0.0005 5 /•  0.1485 ± 0.0013 4 /•  0.2576 ± 0.0026 5 /•  0.0130 ± 0.0000 3 /•  0.9879 ± 0.0003 3 /•  0.9371 ± 0.0007 3 /• 
Duo-LDL 0.0546 ± 0.0051 7 /•  0.1540 ± 0.0143 7 /•  0.2668 ± 0.0211 7 /•  0.0139 ± 0.0022 7 /•  0.9872 ± 0.0016 7 /•  0.9350 ± 0.0044 7 /• 
BD-LDL 0.0675 ± 0.0022 8 /•  0.1996 ± 0.0080 8 /•  0.3353 ± 0.0143 8 /•  0.0278 ± 0.0033 8 /•  0.9791 ± 0.0017 8 /•  0.9204 ± 0.0031 8 /• 
SABFGS 0.0526 ± 0.0007 3 /•  0.1484 ± 0.0013 3 /•  0.2574 ± 0.0028 3 /•  0.0130 ± 0.0000 3 /•  0.9879 ± 0.0003 3 /•  0.9371 ± 0.0007 3 /• 
AAkNN 0.0534 ± 0.0008 6 /•  0.1505 ± 0.0018 6 /•  0.2613 ± 0.0033 6 /•  0.0133 ± 0.0005 6 /•  0.9875 ± 0.0005 6 /•  0.9364 ± 0.0008 6 /• 
Yeast-elu
Ours 0.0214 ± 0.0007 1 /∗  0.2700 ± 0.0050 1 /∗  0.8153 ± 0.0138 1 /∗  0.0110 ± 0.0005 1 /∗  0.9893 ± 0.0007 1 /∗  0.9420 ± 0.0011 1 /∗ 
LDL-DPA 0.0320 ± 0.0000 2 /•  0.3824 ± 0.0022 5 /•  1.1562 ± 0.0078 5 /•  0.0218 ± 0.0004 2 /•  0.9780 ± 0.0000 3 /•  0.9164 ± 0.0007 4 /• 
LDL-LDM 0.0320 ± 0.0000 2 /•  0.3817 ± 0.0021 2 /•  1.1545 ± 0.0079 2 /•  0.0218 ± 0.0004 2 /•  0.9782 ± 0.0004 2 /•  0.9168 ± 0.0008 2 /• 
LDL-LRR 0.0320 ± 0.0000 2 /•  0.3823 ± 0.0022 3 /•  1.1560 ± 0.0077 3 /•  0.0218 ± 0.0004 2 /•  0.9780 ± 0.0000 3 /•  0.9164 ± 0.0007 4 /• 
Duo-LDL 0.0322 ± 0.0004 7 /•  0.3845 ± 0.0023 7 /•  1.1621 ± 0.0091 7 /•  0.0220 ± 0.0000 6 /•  0.9780 ± 0.0000 3 /•  0.9162 ± 0.0008 6 /• 
BD-LDL 0.0361 ± 0.0007 8 /•  0.4474 ± 0.0052 8 /•  1.3617 ± 0.0166 8 /•  0.0298 ± 0.0009 8 /•  0.9718 ± 0.0006 8 /•  0.9032 ± 0.0010 8 /• 
SABFGS 0.0320 ± 0.0000 2 /•  0.3823 ± 0.0022 3 /•  1.1561 ± 0.0080 4 /•  0.0218 ± 0.0004 2 /•  0.9780 ± 0.0000 3 /•  0.9166 ± 0.0007 3 /• 
AAkNN 0.0321 ± 0.0003 6 /•  0.3835 ± 0.0021 6 /•  1.1603 ± 0.0083 6 /•  0.0220 ± 0.0000 6 /•  0.9780 ± 0.0000 3 /•  0.9161 ± 0.0006 7 /• 
* The best performance in each comparison group is boldfaced; each result is followed by its rank among the compared algorithms and a statistical-significance indicator (∗/•).
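The rank and significance annotations in these tables can, in principle, be reproduced from per-fold metric values. The sketch below is an assumption-laden illustration rather than the authors' procedure: it ranks algorithms by their mean score and compares each competitor against the proposed method with a two-sided paired t-test at the 0.05 level, a common choice in LDL studies that the tables themselves do not confirm; the reading of '∗'/'•' in the docstring is likewise an assumption based on the pattern of the entries.

```python
# Hedged sketch (assumptions, not the authors' procedure) of producing the
# per-result rank and ∗/• annotations from per-fold scores.
import numpy as np
from scipy import stats

def annotate(scores, lower_is_better=True, alpha=0.05):
    """scores: dict mapping algorithm name -> 1-D array of per-fold metric values.

    Returns a dict mapping each name to (rank, marker), where '•' marks a
    statistically significant difference from 'Ours' and '∗' the absence of
    one (an assumed reading of the markers).
    """
    names = list(scores)
    means = np.array([np.mean(scores[n]) for n in names])
    # Rank by mean score; ties share the smallest rank, as in the tables.
    keys = means if lower_is_better else -means
    ranks = dict(zip(names, stats.rankdata(keys, method="min").astype(int)))

    out = {}
    for name in names:
        if name == "Ours":
            out[name] = (ranks[name], "∗")
            continue
        _, p_value = stats.ttest_rel(scores["Ours"], scores[name])
        out[name] = (ranks[name], "•" if p_value < alpha else "∗")
    return out
```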