Article

Distributed Semi-Supervised Multi-Dimensional Uncertain Data Classification over Networks

by Zhen Xu 1,*,† and Sicong Chen 2,†
1 College of Computer Science and Artificial Intelligence Engineering, Wenzhou University, Wenzhou 325006, China
2 Kasco Signal Co., Ltd., Shanghai 200072, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2025, 14(23), 4634; https://doi.org/10.3390/electronics14234634
Submission received: 31 October 2025 / Revised: 22 November 2025 / Accepted: 24 November 2025 / Published: 25 November 2025

Abstract

Distributed multi-dimensional classification, in which multiple nodes over a network induce a multi-dimensional classifier from their own local data together with a small amount of information exchanged with neighbors, has recently received extensive attention in the academic community. Nevertheless, the classical distributed multi-dimensional classification formulation requires all training data to have definite feature attributes and complete labels. In real-world scenarios, however, measurement errors in distributed networks cause the collected data samples to carry attribute uncertainty, and a substantial proportion of multi-dimensional data is difficult to label. The key to achieving satisfactory performance in such a case is therefore an effective method that models the input uncertainty and exploits the weakly supervised information in the training data. Motivated by this, we design a novel misclassification loss function that extracts effective information from uncertain data by treating the loss as the integral of the misclassification loss over the potential data distribution. We also propose a new explicit feature mapping for constructing a nonlinear discriminant function. Based on this, we further put forward a novel manifold regularization term to recover multi-dimensional labels, and we simplify the original objective function so that it can be optimized. By leveraging the gradient descent method, we optimize the simplified decentralized cost function and obtain the global optimal solution. We evaluate the proposed distributed semi-supervised multi-dimensional uncertain data classification algorithm, namely the dSMUDC algorithm, on several real datasets. The experimental results indicate that our algorithm significantly outperforms existing approaches in terms of all metrics.

1. Introduction

Currently, distributed learning (DL) stands as a widely used learning framework. In this framework, a specific global task can be executed across a group of individual nodes, with only a small amount of key information shared among neighbors [1,2,3,4,5,6,7,8,9]. Owing to its outstanding learning performance and the robustness of distributed networks to node/link failures, numerous DL approaches have been devised and widely applied across diverse fields in recent years. These applications span critical domains like intelligent computation [3,6], the Internet of Things (IoT) [7,8], and 6G-enabled intelligent transportation systems [5,9]. Recent studies have explored the performance enhancement of distributed computing in real-world scenarios. For instance, in [6], parallel computing is leveraged to improve the computational efficiency of convolutional neural networks. In [5], device-side data processing and resource allocation are optimized via DL frameworks to boost IoT system performance. Additional research has focused on DL frameworks tailored for 6G intelligent transportation systems [9], as well as the validation of real-world implementations in 6G IoT scenarios [7]. Collectively, these works lay a solid foundation for the practical deployment of DL across diverse critical domains.
Nevertheless, conventional DL algorithms typically target binary or multi-class classification tasks and make the assumption that data labels are of a single dimension [1,5]. In practical application scenarios, however, a substantial volume of training data can often be categorized across multiple dimensions [10,11,12,13,14,15]. An illustrative example is the census, where survey agencies may classify individuals according to multiple dimensions, including gender, age, occupation, education level, etc.
To handle such data, a series of multi-dimensional classification methods have been proposed [10,11,12,13,14,15,16,17]. Generally speaking, the majority of existing algorithms focus on exploiting the correlations among multi-dimensional labels in an explicit manner [10,11,12,13,14,15,16,17]. Currently, typical methods for modeling label correlations include merging multi-dimensional labels into new labels [10], exploiting an explicit label chain order to explore heterogeneous class spaces [13], and learning label-pair dependencies in decomposed multi-dimensional spaces [14]. Recently, a few distributed multi-dimensional classification (MDC) algorithms have been successively proposed [16,17]. For instance, in [16], the authors proposed to decompose multi-dimensional heterogeneous spaces into multiple homogeneous class spaces via one-vs.-one decomposition and employ various techniques to learn the high-order label correlations among the decomposed labels. In [17], a distributed MDC algorithm is proposed that maps the multi-dimensional label space into several low-dimensional homogeneous subspaces based on subspace learning and learns the second-order correlations among the embedded labels. Existing centralized/distributed MDC methods have made considerable progress, but they still face certain potential limitations.
First, as multi-dimensional data, training data is characterized by labels across multiple distinct dimensions. However, in many practical applications, due to the difficulty and high cost of label acquisition, a large volume of unlabeled data is easily accessible, while accurately annotating such data remains challenging. In most cases, we can only obtain a small amount of labeled data alongside a large amount of unlabeled data. In such scenarios, most existing algorithms belong to supervised learning algorithms [10,11,12,13,14,15], making it difficult for them to achieve satisfactory performance. Semi-supervised learning algorithms, which are capable of handling both labeled and unlabeled data simultaneously, may be a more suitable choice.
Second, these conventional distributed MDC algorithms, along with the majority of distributed classification methods, often necessitate that the attribute values of each training data instance be definitive. However, in a wide range of real application scenarios, due to the measurement errors of the equipment, the attributes of the collected data exhibit a certain level of uncertainty [18,19,20,21]. Traditional MDC/DL approaches overlook the uncertainty information present in training data, potentially resulting in inadequate classification results.
In the past two decades, some studies have considered the problem of data uncertainty caused by observation inaccuracies [22,23,24,25,26]; these usually adjust the weights of data samples with high uncertainty so as to make the induced model more effective. In recent years, a small number of studies have also modeled uncertain data by characterizing data distributions. Nevertheless, these methods are constrained by requirements on the data that prevent them from achieving their intended effectiveness. For example, the method proposed in [27] requires a tuple-level model to characterize the data distribution, which is unavailable in many real-world scenarios. In addition, in [28], the authors proposed to utilize a multi-dimensional Gaussian distribution to model the data uncertainty. However, this method can only be used for linearly separable binary classification problems, which limits its scope of application.
Therefore, semi-supervised classification of multi-dimensional uncertain data constitutes a significant research challenge that needs investigation. This study presents a distributed semi-supervised classification algorithm for multi-dimensional uncertain data (dSMUDC) to address the challenges posed by absent labels and data uncertainty. The main contributions of this work are listed as follows:
1. To fully utilize the uncertainty information of uncertain data, we characterize uncertain data instances using a multi-dimensional Gaussian distribution. We then design the loss function for training data instances as the integral of misclassification losses over the Gaussian distribution.
2. To achieve nonlinear classification while simplifying integral computation, we reconstruct the probability density function of uncertain data using a new explicit feature mapping.
3. A one-vs.-one decomposition strategy is employed to transform the multi-dimensional heterogeneous class space into multiple homogeneous multi-label class spaces. Subsequently, the class labels of individual training data points are recovered by leveraging the manifold regularization term. Finally, the classifier is trained using the recovered class labels.
4. We conduct a theoretical analysis of the proposed algorithm.
The remainder of this work is organized as follows. In Section 2, some preliminaries are briefly reviewed. In Section 3, the dSMUDC algorithm is developed and its performance is analyzed. In Section 4, we present numerical simulations on a series of datasets, and in Section 5, we draw conclusions based on those simulation results.

2. Preliminaries

To ensure this paper is self-contained, some fundamental preliminaries should be briefly introduced.

2.1. Distributed Vector Quantization

Vector quantization (VQ) is a method to achieve data compression while retaining as much relevant information as possible [29,30]. Its main idea is to partition the input data and generate a collection of reproduction vectors to represent the original data in each partition. VQ has several advantages, including a high compression rate, simple decoding, and good detail preservation, which makes it suitable for various applications.
The technical details of the distributed VQ (dVQ) method are presented as follows [29,30]: Given the training data set $\{x_{j,n}\}$ at each node $j$, we denote the reproduction vectors and the associated partitions by $\{v_{j,l}\}_{l=1}^{V}$ and $\{\mathcal{M}_{j,l}\}_{l=1}^{V}$, where $V$ represents the number of reproduction vectors.
Initialization: We set a small loop indexed by $\tau$ and initialize the $V$ reproduction vectors $\{v_{j,l,0}\}_{l=1}^{V}$ with random values when $\tau = 0$.
Data Partition: At the following iterations $\tau > 0$, we calculate the Euclidean distance between the reproduction vectors $\{v_{j,l,\tau}\}_{l=1}^{V}$ and the dispersedly stored training data as $\mathrm{dist}_{j,n,l} = \|x_{j,n} - v_{j,l,\tau}\|_2$. According to these distances, we assign each training sample to the nearest partition $\mathcal{M}_{j,l,\tau}$, i.e., $x_{j,n} \in \mathcal{M}_{j,l,\tau}$ if $\mathrm{dist}_{j,n,l} < \mathrm{dist}_{j,n,r}$, $\forall r \neq l$.
Local Update of Reproduction Vector: We update the reproduction vectors based on the locally partitioned results
$$v_{j,l,\tau} = \frac{\sum_{n \in \mathcal{M}_{j,l,\tau}} x_{j,n}}{N_{j,l,\tau}},$$
where the count of partition $\mathcal{M}_{j,l,\tau}$ is denoted by $N_{j,l,\tau}$.
Information Fusion of Local Estimation: We can obtain the global-like estimation of the reproduction vector by exchanging and fusing the immediate estimates among neighboring nodes, which is given by
$$v_{j,l,\tau} = \frac{v_{j,l,\tau} N_{j,l,\tau} + \exp(-\tau/J)\sum_{i \in \mathcal{B}_j} v_{i,l,\tau} N_{i,l,\tau}}{N_{j,l,\tau} + \exp(-\tau/J)\sum_{i \in \mathcal{B}_j} N_{i,l,\tau}},$$
where $\exp(-\tau/J)$ is a time-varying coefficient that progressively reduces the impact of the estimates received from one-hop neighbors as the partitioned results are regularly updated until convergence.
Loop Termination Criterion: As long as the difference between the updated reproduction vectors at two successive iterations is smaller than the threshold ε , the small loop indexed by τ terminates. Otherwise, the small loop continues, i.e., τ = τ + 1 .
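To make these steps concrete, the following minimal sketch (Python/NumPy; the function and variable names are ours, not from the paper) illustrates one dVQ iteration at a single node: local partitioning and the update of (1), followed by the neighbor fusion of (2) with a decaying coefficient.

```python
import numpy as np

def dvq_step(local_data, centers, neighbor_stats, tau, J):
    """One dVQ iteration at one node: partition local data, update the
    reproduction vectors via (1), then fuse with neighbor statistics via (2).

    local_data     : (N, D) array of this node's training samples
    centers        : (V, D) array of current reproduction vectors
    neighbor_stats : list of (centers_i, counts_i) received from one-hop neighbors
    tau, J         : iteration index and number of nodes (decaying weight)
    """
    V = centers.shape[0]
    # assign each sample to its nearest reproduction vector
    dists = np.linalg.norm(local_data[:, None, :] - centers[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)

    # local update (1): partition means and counts
    counts = np.array([np.sum(assign == l) for l in range(V)])
    local_centers = np.array([
        local_data[assign == l].mean(axis=0) if counts[l] > 0 else centers[l]
        for l in range(V)
    ])

    # fusion (2): weighted combination with neighbor estimates,
    # using the decaying coefficient exp(-tau / J)
    coeff = np.exp(-tau / J)
    num = local_centers * counts[:, None]
    den = counts.astype(float)
    for c_i, n_i in neighbor_stats:
        num += coeff * c_i * n_i[:, None]
        den += coeff * n_i
    fused = num / np.maximum(den, 1e-12)[:, None]
    return fused, counts
```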

2.2. Explicit Feature Map

To achieve non-linear classification, a non-linear discriminant function should be employed for classifier induction.
As we all know, the learning performance of kernel-based classifiers is largely dependent on the settings of the kernel function. However, in many real-world situations, it is difficult to select optimal parameters due to a lack of prior knowledge. In some circumstances with complicated data distributions, the densities of distinct areas might differ greatly, making it impossible for a discriminant function that employs a single global kernel to develop an efficient decision boundary across all regions. To solve these problems, referring to [29], an explicit feature mapping method has been proposed based on VQ to replace the original kernel map. By using this explicit mapping method, the complex kernel parameter selection can be avoided, and the ability to characterize the boundary of complex data distribution can be improved.
The process of constructing an explicit feature mapping function based on VQ can be summarized as follows:
Data partition: Given the training data set at each node $j$, we employ the dVQ algorithm to obtain the global consensus reproduction vectors $\{v_{j,l}\}_{l=1}^{V}$ and the partitions $\{\mathcal{M}_{j,l}\}_{l=1}^{V}$, where the reproduction vector $v_{j,l}$ is the mean vector of the data instances in partition $\mathcal{M}_{j,l}$ and lies at the center of the partition.
Data Distribution Characterization: We characterize the data distribution in a small region using the Gaussian distribution. Supposing that the training data is regularly distributed in a limited region, the probability density function (pdf) of the data distribution can be articulated as
$$p(x \mid v_{j,l}, \Pi_{j,l}) = \frac{\exp\big(-\frac{1}{2}(x - v_{j,l})^{T}\Pi_{j,l}^{-1}(x - v_{j,l})\big)}{(2\pi)^{\frac{D}{2}}\,(\det(\Pi_{j,l}))^{\frac{1}{2}}},$$
where the covariance matrix $\Pi_{j,l} \in \mathbb{R}^{D\times D}$ represents the degree of data dispersion in the partition $\mathcal{M}_{j,l}$ and can be calculated by
$$\Pi_{j,l} = \frac{\sum_{n \in \mathcal{M}_{j,l}} (x_{j,n} - v_{j,l})(x_{j,n} - v_{j,l})^{T}}{N_{j,l}}.$$
Explicit Feature Map using Gaussian pdf: Utilizing the Gaussian pdfs, we may obtain the explicit feature map by consolidating the pdfs of all regions into a vector, represented as
$$\psi_j(x) = \big(p(x \mid v_{j,1}, \Pi_{j,1}), \ldots, p(x \mid v_{j,V}, \Pi_{j,V})\big)^{T}.$$
We use the modified mapping function with a multi-scale variant to boost its ability to cover data regions, making it applicable for regions with sparse training data instances, as referenced in [29]. To be specific, some positive coefficients scale the covariance matrix of the Gaussian model, so that the explicit feature map can be reformulated as
$$\psi_j(x) = \big(\chi_j(x \mid v_{j,1}, \Pi_{j,1}), \ldots, \chi_j(x \mid v_{j,V}, \Pi_{j,V})\big)^{T},$$
where $\chi_j(x \mid v_{j,l}, \Pi_{j,l})$ collects the Gaussian pdfs with the same mean vector and different covariance matrices,
$$\chi_j(x \mid v_{j,l}, \Pi_{j,l}) = \big(p(x \mid v_{j,l}, s_1\Pi_{j,l}), \ldots, p(x \mid v_{j,l}, s_q\Pi_{j,l})\big)^{T},$$
with $s_1, \ldots, s_q$ being the scaling coefficients.
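As a concrete illustration, the following sketch (Python/SciPy; the function and argument names are ours) builds the multi-scale explicit feature map of (6) and (7) from a given set of reproduction vectors and partition covariances. The resulting feature vector has dimension $Vq$.

```python
import numpy as np
from scipy.stats import multivariate_normal

def explicit_feature_map(x, centers, covariances, scales):
    """Multi-scale explicit feature map of Section 2.2: stack the Gaussian pdf
    values p(x | v_l, s_r * Pi_l) for every partition l and scaling coefficient s_r.

    x           : (D,) input vector
    centers     : (V, D) reproduction vectors
    covariances : (V, D, D) per-partition covariance matrices
    scales      : (q,) positive scaling coefficients s_1, ..., s_q
    """
    feats = []
    for v_l, Pi_l in zip(centers, covariances):
        for s in scales:
            feats.append(multivariate_normal.pdf(x, mean=v_l, cov=s * Pi_l))
    return np.asarray(feats)          # dimension V * q
```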

3. dSMUDC Algorithm

To achieve dSMUDC, we first formulate the problem of distributed semi-supervised classification of multi-dimensional uncertain data in Section 3.1. Then, we present the procedure of one-vs.-one decomposition encoding in Section 3.2 and design the global objective function in Section 3.3. This is followed by the decentralized implementation in Section 3.4 and the optimization in Section 3.5. Section 3.6 outlines the main steps of the decomposition decoding process. Finally, in Section 3.7, the performance of the proposed dSMUDC algorithm is analyzed.

3.1. Problem Formulation

In this work, a distributed network consisting of $J$ nodes is considered. A total of $L + U$ data samples are distributed across these $J$ nodes. At each node $j$, there are $L_j$ labeled uncertain data $\{x_{j,n}, \Theta_{j,n}, y_{j,n}\}_{n=1}^{L_j}$ and $U_j$ unlabeled uncertain data $\{x_{j,n}, \Theta_{j,n}\}_{n=L_j+1}^{L_j+U_j}$, where $\{x_{j,n}, \Theta_{j,n}\}$ denotes the input features of the uncertain data and $y_{j,n} \in \{+1, 0\}^{\sum_{m=1}^{M} K_m \times 1}$ denotes the corresponding labels.
The input features are represented by a multivariate Gaussian distribution with mean vector $x_{j,n} \in \mathbb{R}^{D\times 1}$ and covariance matrix $\Theta_{j,n} \in \mathbb{R}^{D\times D}$, as shown in Figure 1. More specifically, the Gaussian uncertainty pdf of the $n$-th data sample may be defined as [28]
$$f_{x_{j,n}}(x) = \frac{\exp\big(-\frac{1}{2}(x - x_{j,n})^{T}\Theta_{j,n}^{-1}(x - x_{j,n})\big)}{(2\pi)^{\frac{D}{2}}\,|\Theta_{j,n}|^{\frac{1}{2}}}.$$
Besides, the available class label vector of labeled data is composed of a total of $M$ heterogeneous class spaces $\mathcal{Y} = \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_M$, i.e., $y_{j,n} = [y_{j,n,1}, \ldots, y_{j,n,M}]^{T}$. In each $m$-th dimensional class space, the corresponding label vector $y_{j,n,m} \in \{+1, 0\}^{K_m \times 1}$ is a $K_m$-dimensional binary vector, where the $k$-th element $y_{j,n,m,k} = 1$ if the $k$-th label is valid, and $y_{j,n,m,k} = 0$ otherwise.
In order to execute distributed information fusion, each node j aims to identify the global optimal classifier utilizing its local data and the information shared with its neighboring nodes B j , which includes all one-hop neighbors of node j as well as node j itself.

3.2. One-vs.-One Decomposition Encoding

The heterogeneity of multidimensional spaces is a substantial barrier in multidimensional classification challenges, as the output values of various spaces are incomparable. To address this issue, we use a one-vs.-one decomposition strategy to divide the heterogeneous multidimensional space into numerous homogeneous class spaces [16].
To be specific, taking the $m$-th dimensional class vector as an example, if the $n$-th training data sample is labeled, we utilize the one-vs.-one decomposition strategy to translate the $K_m$-dimensional vector $y_{j,n,m}$ in the original class space into a $K'_m = \binom{K_m}{2}$-dimensional ternary label vector $y'_{j,n,m} \in \{-1, 0, +1\}^{K'_m \times 1}$.
Likewise, if the $n$-th training data sample is unlabeled, we obtain a $K'_m$-dimensional transformed label vector with all elements equal to zero.
To explain the process of one-vs.-one decomposition, we present the corresponding transformation formula.
For the $m$-th dimensional class space of the original label vector $y_{j,n}$, if the element in the $k$-th dimension denotes the actual label, then the value of the $l$-th dimension in the newly converted vector $y'_{j,n}$ is [16]
$$y'_{j,n,m,l} = \begin{cases} +1, & \text{if } l \in Q_{j,n,m}, \\ -1, & \text{if } l \in R_{j,n,m}, \\ 0, & \text{otherwise}. \end{cases}$$
Here, when $y_{j,n,m,k} = 1$ with $1 \le k \le K_m$, the set $Q_{j,n,m} = \big\{\big[(k-1)K_m - \sum_{i=1}^{k}(i-1) + 1\big], \big[(k-1)K_m - \sum_{i=1}^{k}(i-1) + 2\big], \ldots, \big[kK_m - \sum_{i=1}^{k}i\big]\big\}$ denotes the collection of sequence indices $l$ of all elements for which the transformed $y'_{j,n,m,l} = 1$. Correspondingly, the set $R_{j,n,m} = \big\{k-1, \big[k-1 + (K_m-2)\big], \big[k-1 + \sum_{i=0}^{1}(K_m-2-i)\big], \ldots, \big[k-1 + \sum_{i=0}^{k-3}(K_m-2-i)\big]\big\}$ denotes the collection of sequence indices $l$ of all elements for which the transformed $y'_{j,n,m,l} = -1$.
For the sake of clarity, the process of label encoding is illustrated in Figure 2.
Example 1.
Given four data points, with their original labels in the $m$-th dimension being $y_{1,m} = [1,0,0,0]^{T}$, $y_{2,m} = [0,1,0,0]^{T}$, $y_{3,m} = [0,0,1,0]^{T}$, $y_{4,m} = [0,0,0,1]^{T}$, the transformed labels and the corresponding positive/negative index sets obtained via (9) are, in sequence,
$$\begin{aligned} y'_{1,m} &= [+1,+1,+1,0,0,0]^{T}, & Q_{1,m} &= \{1,2,3\}, & R_{1,m} &= \emptyset, \\ y'_{2,m} &= [-1,0,0,+1,+1,0]^{T}, & Q_{2,m} &= \{4,5\}, & R_{2,m} &= \{1\}, \\ y'_{3,m} &= [0,-1,0,-1,0,+1]^{T}, & Q_{3,m} &= \{6\}, & R_{3,m} &= \{2,4\}, \\ y'_{4,m} &= [0,0,-1,0,-1,-1]^{T}, & Q_{4,m} &= \emptyset, & R_{4,m} &= \{3,5,6\}. \end{aligned}$$
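The encoding of (9) can equivalently be implemented by enumerating class pairs, which may be easier to follow than the index-set formulas. The small sketch below (illustrative names, zero-based class indices) reproduces Example 1 under that pair ordering.

```python
from itertools import combinations
import numpy as np

def ovo_encode(one_hot, K):
    """One-vs.-one encoding of a K-class one-hot label into a C(K,2)-dimensional
    ternary vector, with pairs ordered (1,2), (1,3), ..., (1,K), (2,3), ..., (K-1,K).
    Unlabeled samples (all-zero input) map to an all-zero output."""
    pairs = list(combinations(range(K), 2))       # C(K,2) class pairs
    out = np.zeros(len(pairs), dtype=int)
    if not one_hot.any():                         # unlabeled sample
        return out
    k = int(np.argmax(one_hot))                   # index of the active class
    for l, (a, b) in enumerate(pairs):
        if a == k:
            out[l] = +1                           # k is the first class of the pair
        elif b == k:
            out[l] = -1                           # k is the second class of the pair
    return out

# reproduces Example 1: ovo_encode(np.array([0, 1, 0, 0]), 4) -> [-1, 0, 0, +1, +1, 0]
```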
Based on the decomposition results above, given that the discriminant function is non-linear, the output of the discriminant function $u_{j,n,m,k}$ corresponding to the $k$-th reconstructed class of the $m$-th class space can be expressed as
$$u_{j,n,m,k} = w_{j,m,k}^{T}\,\psi(x_{j,n}) + b_{j,m,k}, \quad k = 1, \ldots, K'_m,$$
where $w_{j,m,k}$ and $b_{j,m,k}$ denote the weight variable and the bias variable, respectively. Besides, $\psi(x_{j,n})$ denotes the non-linear map.

3.3. Design of Global Objective Function

This paper focuses on Gaussian uncertain data classification. For such data samples, not only their mean but also their Gaussian distribution area should be considered valid information. Therefore, this information should be taken into account for model induction.
To make full use of the abundant information offered by these Gaussian pdfs, we propose to formulate the loss function of the $n$-th uncertain data sample as the expected value of the misclassification loss over the Gaussian pdf $f_{x_{j,n}}(x)$ [28]. Guided by this idea, we develop the global objective function as follows:
$$\begin{aligned} F_x = \sum_{m=1}^{M}\sum_{k=1}^{K'_m}\Bigg[ &\frac{\lambda_A}{2L}\sum_{n=1}^{L}\big(y'_{n,m,k} - z_{n,m,k}\big)^2 + \frac{\lambda_B}{2(L+U)}\sum_{n=1}^{L+U}\Big(z_{n,m,k} - \sum_{l=1}^{L+U}\omega_{n,l}\, z_{l,m,k}\Big)^2 \\ &+ \frac{\lambda_C}{L+U}\sum_{n=1}^{L+U}\int_{\mathbb{R}} h\big(z_{n,m,k} - u_{n,m,k}\big)\, f_{x_n}(x)\, dx + \frac{\lambda_D}{2}\|w_{m,k}\|_2^2 + \frac{\lambda_E}{2} b_{m,k}^2 \Bigg]. \end{aligned}$$
The objective function comprises five separate terms.
The first term serves to obtain valid supervisory information from labeled data with parameter λ A , providing support for the estimation of label confidence.
The second term captures the estimated error in labeling confidence over the complete training dataset, controlled by the parameter $\lambda_B$. Because the correct labels contain no noise, it is natural to deduce that samples that are close in the feature space have comparable labels. By employing this concept, we may assess the labeling confidence of the candidate label by minimizing the estimated error. The similarity measure between the $n$-th and $l$-th data samples is given by $\omega_{n,l}$, and its value satisfies $\sum_{l}\omega_{n,l} = 1$.
The third term is the loss function for the $n$-th uncertain data sample, associated with a weight parameter denoted $\lambda_C$. In this paper, we implement an exponential hinge loss function $h(\cdot)$ to expedite convergence, which quantifies the misclassification loss of the sample and is represented as
$$h\big(z_{n,m,k} - u_{n,m,k}\big) = \exp\big(\beta \max(0,\, z_{n,m,k} - u_{n,m,k})\big),$$
where $\beta$ denotes a positive coefficient.
The last two terms are regularization components on the model parameters $w_{m,k}$ and $b_{m,k}$ and are employed to control the model's complexity. Here, $\lambda_D$ and $\lambda_E$ represent the respective weight parameters.

3.4. Reformulation and Decentralization of Global Objective Function

In contrast to the conventional classification framework, the objective function outlined above incorporates all uncertainty distributions through integration, which helps guide the classification model. Despite the simplicity of the global objective function's expression, two challenges arise during the decentralized optimization process. First, conventional classification approaches realize nonlinear mapping implicitly through kernel methods. Since the explicit form of the nonlinear feature mapping $\psi(\cdot)$ is unknown, deriving a closed-form solution for the integral calculation becomes quite challenging. Second, an observation of the global objective function reveals that minimizing the second term requires the aggregation of all training data samples at a single fusion center, which is impractical in a distributed network.
Addressing these two challenges involves three steps: developing an explicit feature map, simplifying the integral computation, and redefining the estimated error function for label confidence.

3.4.1. Construction of Explicit Feature Map

To tackle the first problem, this study designs an explicit mapping function grounded in dVQ. Through the use of this explicit mapping function to introduce reasonable approximations to the objective function, we are able to derive a closed-form solution for the integral calculation.
With reference to [29,30], we intend to use the dVQ method to obtain a series of reproduction vectors that characterize the global data distribution and employ the explicit mapping function to construct the discriminant function. To be specific, under the initial setting where each node $j$ has gathered $L_j + U_j$ local training data samples $\{x_{j,n}, \Theta_{j,n}\}_{n=1}^{L_j+U_j}$ in total, we first implement the dVQ method to derive $V$ quantization partitions $\{\mathcal{M}_{j,l}\}_{l=1}^{V}$, where $\mathcal{M}_{j,l} = \{x_{j,n}, \Theta_{j,n}\}_{n=1}^{N_{j,l}}$. It should be noted that the uncertainty of the training data must be included in the development of the explicit mapping function. The mean vectors and covariance matrices inside $\mathcal{M}_{j,l}$ may be calculated as follows:
$$\tilde{v}_{j,l} = \frac{\sum_{n \in \mathcal{M}_{j,l}} x_{j,n}}{N_{j,l}},$$
$$\tilde{\Pi}_{j,l} = \frac{1}{N_{j,l}}\sum_{n \in \mathcal{M}_{j,l}}\Big(\Theta_{j,n} + (x_{j,n} - \tilde{v}_{j,l})(x_{j,n} - \tilde{v}_{j,l})^{T}\Big).$$
The newly formulated covariance matrix $\tilde{\Pi}_{j,l}$ effectively captures the degree of dispersion of the training data within partition $\mathcal{M}_{j,l}$, thereby facilitating a more accurate decision boundary, particularly in scenarios where data distributions differ across network regions. Referring to the method in Section 2.2, we derive the new explicit feature map as follows [29]:
$$\psi(x_{j,n}) = \big[\, p(x_{j,n} \mid \tilde{v}_{j,1}, s_1\tilde{\Pi}_{j,1}),\, p(x_{j,n} \mid \tilde{v}_{j,1}, s_2\tilde{\Pi}_{j,1}), \ldots, p(x_{j,n} \mid \tilde{v}_{j,1}, s_q\tilde{\Pi}_{j,1}),\; p(x_{j,n} \mid \tilde{v}_{j,2}, s_1\tilde{\Pi}_{j,2}), \ldots, p(x_{j,n} \mid \tilde{v}_{j,2}, s_q\tilde{\Pi}_{j,2}),\; \ldots,\; p(x_{j,n} \mid \tilde{v}_{j,V}, s_1\tilde{\Pi}_{j,V}), \ldots, p(x_{j,n} \mid \tilde{v}_{j,V}, s_q\tilde{\Pi}_{j,V}) \,\big]^{T},$$
where the pdf $p(x_{j,n} \mid \tilde{v}_{j,l}, s_k\tilde{\Pi}_{j,l})$ is given by
$$p(x_{j,n} \mid \tilde{v}_{j,l}, s_k\tilde{\Pi}_{j,l}) = \frac{\exp\big(-\frac{1}{2}(x_{j,n} - \tilde{v}_{j,l})^{T}(s_k\tilde{\Pi}_{j,l})^{-1}(x_{j,n} - \tilde{v}_{j,l})\big)}{(2\pi)^{\frac{D}{2}}\,|s_k\tilde{\Pi}_{j,l}|^{\frac{1}{2}}}.$$
Note that for the sake of convenience, we use the notation ψ j , n to represent ψ ( x j , n ) .
Algorithm 1 delineates the essential steps involved in the construction of an explicit non-linear feature map for clarity.
Algorithm 1 Explicit Non-linear Feature Map Construction
Require: Input vectors $\{x_{j,n}, \Theta_{j,n}\}_{n=1}^{L_j+U_j}$; initial values of the cluster centers $\{v_{j,l,0}\}_{l=1}^{V}$.
1: while $\|v_{j,l,\tau} - v_{j,l,\tau-1}\|_2 > \varepsilon$ for $l = 1, \ldots, V$ do
2:   for $j \in \mathcal{J}$ do
3:     Group the training data into the closest partition.
4:     Calculate the partition centers $v_{j,l,\tau}$ and the corresponding counts $N_{j,l,\tau}$ via (1).
5:     Exchange $\{v_{j,l,\tau}, N_{j,l,\tau}\}_{l=1}^{V}$ with the neighboring nodes.
6:   end for
7:   for $j \in \mathcal{J}$ do
8:     Update the reproduction vectors $\{v_{j,l,\tau}\}_{l=1}^{V}$ via (2).
9:   end for
10:  $\tau = \tau + 1$
end while
11: Calculate the mean $\tilde{v}_{j,l}$ and covariance matrix $\tilde{\Pi}_{j,l}$ via (13) and (14).
12: Construct the non-linear explicit feature map via (15).

3.4.2. Simplification of Integration Calculation

While a finite-dimensional explicit feature map can be obtained, classical optimization methods cannot be applied to minimize the objective function if a closed-form solution for the integral cannot be derived directly. Considering this, we propose to simplify the integral computation by converting the integral problem over the Gaussian uncertainty distribution of $x_{j,n}$ into an integral problem with respect to the distribution of $\psi_{j,n}$:
$$\begin{aligned} F_\psi = \sum_{m=1}^{M}\sum_{k=1}^{K'_m}\Bigg[ &\frac{\lambda_A}{2L}\sum_{n=1}^{L}\big(y'_{n,m,k} - z_{n,m,k}\big)^2 + \frac{\lambda_B}{2(L+U)}\sum_{n=1}^{L+U}\Big(z_{n,m,k} - \sum_{l=1}^{L+U}\omega_{n,l}\, z_{l,m,k}\Big)^2 \\ &+ \frac{\lambda_C}{L+U}\sum_{n=1}^{L+U}\int_{\mathbb{R}} h\big(z_{n,m,k} - u_{n,m,k}\big)\, f_{\psi_n}(\psi)\, d\psi + \frac{\lambda_D}{2}\|w_{m,k}\|_2^2 + \frac{\lambda_E}{2} b_{m,k}^2 \Bigg]. \end{aligned}$$
However, the nonlinear relationship between $\psi_{j,n}$ and $x_{j,n}$ makes directly transforming the integral problem challenging. To overcome this issue, we suggest using simple distributions to approximate the uncertainty distribution of the explicit feature map, $f_{\psi_{j,n}}(\psi)$.
Specifically, we first determine the entries of the first-order moment of $f_{\psi_{j,n}}(\psi)$, which acts as the mean vector, using the formula
$$\begin{aligned} \delta_{j,n,lr} &= \int_{\mathbb{R}} p(x \mid \tilde{v}_{j,l}, s_r\tilde{\Pi}_{j,l})\, f_{x_{j,n}}(x)\, dx \\ &= \exp\Big[ -\tfrac{1}{2}\big( 2D\log 2\pi - \log|\Xi_1| - \log|\Xi_2| + \eta_1^{T}\Xi_1^{-1}\eta_1 + \eta_2^{T}\Xi_2^{-1}\eta_2 \big) + \tfrac{1}{2}\big( D\log 2\pi - \log|\Xi_1 + \Xi_2| + (\eta_1+\eta_2)^{T}(\Xi_1+\Xi_2)^{-1}(\eta_1+\eta_2) \big) \Big]. \end{aligned}$$
Additionally, we use the formula below to calculate the diagonal entries of the second-order moment of $f_{\psi_{j,n}}(\psi)$, which act as an approximated covariance matrix:
$$\begin{aligned} \Lambda_{j,n,lr} &= \int_{\mathbb{R}} \big(p(x \mid \tilde{v}_{j,l}, s_r\tilde{\Pi}_{j,l}) - \delta_{j,n,lr}\big)^2\, f_{x_{j,n}}(x)\, dx \\ &= \exp\Big[ -\tfrac{1}{2}\big( 3D\log 2\pi - 2\log|\Xi_1| - \log|\Xi_2| + 2\eta_1^{T}\Xi_1^{-1}\eta_1 + \eta_2^{T}\Xi_2^{-1}\eta_2 \big) + \tfrac{1}{2}\big( D\log 2\pi - \log|2\Xi_1 + \Xi_2| + (2\eta_1+\eta_2)^{T}(2\Xi_1+\Xi_2)^{-1}(2\eta_1+\eta_2) \big) \Big] - (\delta_{j,n,lr})^2, \end{aligned}$$
where the variables are
$$\Xi_1 = (s_r\tilde{\Pi}_{j,l})^{-1}, \quad \Xi_2 = \Theta_{j,n}^{-1}, \quad \eta_1 = (s_r\tilde{\Pi}_{j,l})^{-1}\tilde{v}_{j,l}, \quad \eta_2 = \Theta_{j,n}^{-1} x_{j,n}.$$
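Both moments in (18) and (19) are integrals of products of Gaussian densities and can be evaluated in canonical (information) form. The sketch below (Python/NumPy; the helper names are ours) computes one entry $\delta_{j,n,lr}$ and the corresponding variance entry $\Lambda_{j,n,lr}$ under these formulas.

```python
import numpy as np

def _canon(Sigma, mu):
    """Canonical (information-form) parameters and log-normaliser of N(mu, Sigma)."""
    Xi = np.linalg.inv(Sigma)                   # precision matrix
    eta = Xi @ mu                               # shift vector
    D = mu.shape[0]
    xi = -0.5 * (D * np.log(2 * np.pi) - np.linalg.slogdet(Xi)[1]
                 + eta @ np.linalg.solve(Xi, eta))
    return Xi, eta, xi

def _log_gauss_integral(A, a):
    """log of the integral of exp(a^T x - 0.5 x^T A x) over x, A positive definite."""
    D = a.shape[0]
    return 0.5 * (D * np.log(2 * np.pi) - np.linalg.slogdet(A)[1]
                  + a @ np.linalg.solve(A, a))

def feature_entry_moments(x, Theta, v_tilde, Pi_tilde, s_r):
    """Mean (18) and variance (19) of one feature-map entry
    p(. | v_tilde, s_r * Pi_tilde) evaluated at an uncertain input x ~ N(x, Theta)."""
    Xi1, eta1, xi1 = _canon(s_r * Pi_tilde, v_tilde)   # feature-map Gaussian
    Xi2, eta2, xi2 = _canon(Theta, x)                  # input uncertainty pdf
    # first moment: integral of the product of the two Gaussian pdfs
    delta = np.exp(xi1 + xi2 + _log_gauss_integral(Xi1 + Xi2, eta1 + eta2))
    # second moment: the feature-map pdf is squared before integrating
    second = np.exp(2 * xi1 + xi2 + _log_gauss_integral(2 * Xi1 + Xi2, 2 * eta1 + eta2))
    return delta, second - delta ** 2                  # (delta_{j,n,lr}, Lambda_{j,n,lr})
```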
Using the approximated mean vector and covariance matrix, we can employ the following Gaussian probability density function to approximate the uncertainty distribution of data samples in the feature space
$$\tilde{f}_{\psi_{j,n}}(\psi) = \frac{\exp\big(-\frac{1}{2}(\psi - \delta_{j,n})^{T}\Lambda_{j,n}^{-1}(\psi - \delta_{j,n})\big)}{(2\pi)^{\frac{Vq}{2}}\,|\Lambda_{j,n}|^{\frac{1}{2}}},$$
where the mean vector $\delta_{j,n} = [\delta_{j,n,11}, \delta_{j,n,12}, \ldots, \delta_{j,n,lr}, \ldots, \delta_{j,n,Vq}]^{T}$ and the covariance matrix $\Lambda_{j,n} = \operatorname{diag}([\Lambda_{j,n,11}, \Lambda_{j,n,12}, \ldots, \Lambda_{j,n,lr}, \ldots, \Lambda_{j,n,Vq}])$.
Since $x$ and $\psi$ are approximately linearly correlated within a localized region, this new Gaussian pdf is unlikely to result in significant information loss. Utilizing (20), we reformulate the integral problem with respect to the Gaussian uncertainty distribution of $x$ as
$$\begin{aligned} \tilde{F}_\psi = \sum_{m=1}^{M}\sum_{k=1}^{K'_m}\Bigg[ &\frac{\lambda_A}{2L}\sum_{n=1}^{L}\big(y'_{n,m,k} - z_{n,m,k}\big)^2 + \frac{\lambda_B}{2(L+U)}\sum_{n=1}^{L+U}\Big(z_{n,m,k} - \sum_{l=1}^{L+U}\omega_{n,l}\, z_{l,m,k}\Big)^2 \\ &+ \frac{\lambda_C}{L+U}\sum_{n=1}^{L+U}\int_{\mathbb{R}} h\big(z_{n,m,k} - u_{n,m,k}\big)\, \tilde{f}_{\psi_n}(\psi)\, d\psi + \frac{\lambda_D}{2}\|w_{m,k}\|_2^2 + \frac{\lambda_E}{2} b_{m,k}^2 \Bigg]. \end{aligned}$$

3.4.3. Reconstruction of Estimated Error Function on Labeling Confidence

To develop a decentralized objective function without aggregating all training data into a fusion center, we employ reproduction vectors to represent the complete dataset and reconstruct labeling confidence. This is justified as follows: Because the reproduction vectors are created using dVQ, their distribution is similar to the global data distribution. Thus, it can be concluded that the labeling confidence generated from these vectors is close to that acquired from the total dataset. Based on this, we redesign the objective function as
$$\begin{aligned} \tilde{\tilde{F}}_\psi = \sum_{m=1}^{M}\sum_{k=1}^{K'_m}\Bigg[ &\frac{\lambda_A}{2L}\sum_{n=1}^{L}\big(y'_{n,m,k} - z_{n,m,k}\big)^2 + \frac{\lambda_B}{2(L+U)}\sum_{n=1}^{L+U}\Big(z_{n,m,k} - \sum_{l=1}^{V}\omega^{v}_{n,l}\, z^{v}_{l,m,k}\Big)^2 \\ &+ \frac{\lambda_C}{L+U}\sum_{n=1}^{L+U}\int_{\mathbb{R}} h\big(z_{n,m,k} - u_{n,m,k}\big)\, \tilde{f}_{\psi_n}(\psi)\, d\psi + \frac{\lambda_D}{2}\|w_{m,k}\|_2^2 + \frac{\lambda_E}{2} b_{m,k}^2 \Bigg], \end{aligned}$$
where $\omega^{v}_{n,l}$ measures the degree of similarity between $x_n$ and $\tilde{v}_l$, calculated by $p(x_n \mid \tilde{v}_l, \tilde{\Pi}_l)$. Furthermore, $z^{v}_{l,m,k}$ stands for the labeling confidence of the $k$-th transformed class of the $m$-th class space for the $l$-th reproduction vector.

3.4.4. Decentralization Implementation

Based on (22), we can decentralize the global objective function by substituting the global variables with local ones and enforcing consensus equality constraints [17], namely,
$$\begin{aligned} \tilde{\tilde{F}}_\psi = \sum_{j=1}^{J}\sum_{m=1}^{M}\sum_{k=1}^{K'_m}\Bigg[ &\frac{\lambda_A}{2L}\sum_{n=1}^{L_j}\big(y'_{j,n,m,k} - z_{j,n,m,k}\big)^2 + \frac{\lambda_B}{2(L+U)}\sum_{n=1}^{L_j+U_j}\Big(z_{j,n,m,k} - \sum_{l=1}^{V}\omega^{v}_{n,l}\, z^{v}_{l,m,k}\Big)^2 \\ &+ \frac{\lambda_C}{L+U}\sum_{n=1}^{L_j+U_j}\int_{\mathbb{R}} h\big(z_{j,n,m,k} - u_{j,n,m,k}\big)\, \tilde{f}_{\psi_{j,n}}(\psi)\, d\psi + \frac{\lambda_D}{2}\|w_{j,m,k}\|_2^2 + \frac{\lambda_E}{2} b_{j,m,k}^2 \Bigg] = \sum_{j=1}^{J}\tilde{\tilde{F}}_{\psi_j} \\ &\text{s.t. } w_{j,m,k} = w_{i,m,k},\; b_{j,m,k} = b_{i,m,k}, \quad \forall j \in \mathcal{J},\; i \in \mathcal{B}_j. \end{aligned}$$

3.5. Optimization

In this subsection, we use an alternating minimization strategy to simultaneously optimize the variables $\{w_{j,m,k}, b_{j,m,k}\}$ and $\{z_{j,n,m,k}, z^{v}_{j,l,m,k}\}$. To obtain the update equations, referring to Lemma 1 [28], we simplify the decentralized objective function into
$$\begin{aligned} \tilde{\tilde{F}}_\psi = \sum_{j=1}^{J}\sum_{m=1}^{M}\sum_{k=1}^{K'_m}\Bigg[ &\frac{\lambda_A}{2L}\sum_{n=1}^{L_j}\big(y'_{j,n,m,k} - z_{j,n,m,k}\big)^2 + \frac{\lambda_B}{2(L+U)}\sum_{n=1}^{L_j+U_j}\Big(z_{j,n,m,k} - \sum_{l=1}^{V}\omega^{v}_{n,l}\, z^{v}_{l,m,k}\Big)^2 \\ &+ \frac{\lambda_C}{2(L+U)}\sum_{n=1}^{L_j+U_j}\exp\Big(\frac{\beta^2 d_{\Lambda,j,n,m,k}^{2}}{4} + \beta\, d_{\delta,j,n,m,k}\Big)\Big(1 + \operatorname{erf}\Big(\frac{d_{\delta,j,n,m,k}}{d_{\Lambda,j,n,m,k}} + \frac{\beta\, d_{\Lambda,j,n,m,k}}{2}\Big)\Big) \\ &+ \frac{\lambda_D}{2J}\|w_{j,m,k}\|_2^2 + \frac{\lambda_E}{2J} b_{j,m,k}^2 \Bigg], \\ &\text{s.t. } w_{j,m,k} = w_{i,m,k},\; b_{j,m,k} = b_{i,m,k}, \quad \forall j \in \mathcal{J},\; i \in \mathcal{B}_j, \end{aligned}$$
where the variables $d_{\delta,j,n,m,k} = z_{j,n,m,k} - w_{j,m,k}^{T}\delta_{j,n} - b_{j,m,k}$ and $d_{\Lambda,j,n,m,k} = \sqrt{2\, w_{j,m,k}^{T}\Lambda_{j,n} w_{j,m,k}}$, and the error function $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{0}^{x}\exp(-t^2)\, dt$.
The detailed derivation process is discussed in Lemma 1.
Lemma 1.
Assume that $\psi \in \mathbb{R}^{D}$ is a random variable following a multivariate Gaussian distribution $\mathcal{N}(\delta, \Lambda)$, that is,
$$p(\psi) = \frac{\exp\big(-\frac{1}{2}(\psi - \delta)^{T}\Lambda^{-1}(\psi - \delta)\big)}{(2\pi)^{\frac{D}{2}}\,|\Lambda|^{\frac{1}{2}}}.$$
Given a hyperplane $z - w^{T}\psi - b = 0$, the expectation of the exponential hinge loss $h(z - w^{T}\psi - b)$ with respect to $\psi$ can be calculated as follows:
$$E_{p(\psi)}(w,b) \triangleq \int_{\mathbb{R}} \exp\big(\beta\max(0,\, z - w^{T}\psi - b)\big)\, p(\psi)\, d\psi = \frac{1}{2}\exp\Big(\frac{\beta^2 d_\Lambda^{2}}{4} + \beta d_\delta\Big)\Big(1 + \operatorname{erf}\Big(\frac{d_\delta}{d_\Lambda} + \frac{\beta d_\Lambda}{2}\Big)\Big),$$
where the variables $d_\delta = z - w^{T}\delta - b$ and $d_\Lambda = \sqrt{2\, w^{T}\Lambda w}$.
Proof. 
Referring to [28], we first perform an eigenvalue decomposition of the matrix $\Lambda$ and obtain $\Lambda = U^{T} D U$. Then, we define the new integration variable $\psi' = D^{-\frac{1}{2}} U (\psi - \delta)$. By letting $w' = D^{\frac{1}{2}} U w$, we have
$$E_{p(\psi)}(w,b) = \int_{\mathbb{R}} \exp\big(\beta\max(0,\, z - w^{T}\delta - b - (w')^{T}\psi')\big)\,\frac{\exp\big(-\frac{1}{2}(\psi')^{T}\psi'\big)}{(2\pi)^{\frac{D}{2}}}\, d\psi' = \int_{\mathcal{R}_1} \exp\big(\beta\big(z - w^{T}\delta - b - (w')^{T}\psi'\big)\big)\,\frac{\exp\big(-\frac{1}{2}(\psi')^{T}\psi'\big)}{(2\pi)^{\frac{D}{2}}}\, d\psi',$$
where $\mathcal{R}_1 = \{\psi' \in \mathbb{R}^{D} : z - w^{T}\delta - b - (w')^{T}\psi' \geq 0\}$.
Then, we introduce a unit orthogonal matrix $B$, which satisfies $B w' = \|w'\|_2\, e_k$. Here, the $k$-th element of $e_k$ is 1 and the rest are 0. We define another new integration variable $\psi'' = B\psi'$ and obtain
$$E_{p(\psi)}(w,b) = \int_{\mathcal{R}_2} \exp\big(\beta\big(z - w^{T}\delta - b - \|w'\|_2\, e_k^{T}\psi''\big)\big)\,\frac{\exp\big(-\frac{1}{2}(\psi'')^{T}\psi''\big)}{(2\pi)^{\frac{D}{2}}}\, d\psi'',$$
where $\mathcal{R}_2 = \{\psi'' \in \mathbb{R}^{D} : z - w^{T}\delta - b - \|w'\|_2\, e_k^{T}\psi'' \geq 0\}$. In $\mathcal{R}_2$, only the integration domain of the $k$-th element of $\psi''$ is restricted.
Therefore, denoting the $k$-th element of $\psi''$ by $r_k$, we obtain
$$\begin{aligned} E_{p(\psi)}(w,b) &= \int_{-\infty}^{\frac{z - w^{T}\delta - b}{\|w'\|_2}} \exp\Big(\beta\big(z - w^{T}\delta - b - \|w'\|_2\, r_k\big)\Big)\,\frac{\exp(-\frac{1}{2}r_k^{2})}{(2\pi)^{\frac{1}{2}}}\, dr_k \times \prod_{j\neq k}\int_{-\infty}^{+\infty}\frac{\exp(-\frac{1}{2}r_j^{2})}{(2\pi)^{\frac{1}{2}}}\, dr_j \\ &= \frac{1}{2}\exp\Big(\frac{\beta^2\|w'\|_2^{2}}{2} + \beta\big(z - w^{T}\delta - b\big)\Big)\Big(1 + \operatorname{erf}\Big(\frac{z - w^{T}\delta - b}{\sqrt{2}\,\|w'\|_2} + \frac{\beta\|w'\|_2}{\sqrt{2}}\Big)\Big). \end{aligned}$$
For simplicity, we let $d_\delta \triangleq z - w^{T}\delta - b$ and $d_\Lambda \triangleq \sqrt{2\, w^{T}\Lambda w}$. Since $\|w'\|_2 = \sqrt{w^{T}\Lambda w}$, we obtain
$$E_{p(\psi)}(w,b) = \frac{1}{2}\exp\Big(\frac{\beta^2 d_\Lambda^{2}}{4} + \beta d_\delta\Big)\Big(1 + \operatorname{erf}\Big(\frac{d_\delta}{d_\Lambda} + \frac{\beta d_\Lambda}{2}\Big)\Big).$$
Lemma 1 is proven.    □
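For reference, the closed form (30) can be evaluated directly; the snippet below (Python/SciPy; the function name is ours) computes the expected exponential hinge loss of Lemma 1 for a single sample.

```python
import numpy as np
from scipy.special import erf

def expected_exp_hinge(w, b, z, delta, Lam, beta):
    """Closed-form expectation of exp(beta * max(0, z - w^T psi - b)) under
    psi ~ N(delta, Lam), as given by the right-hand side of Lemma 1.

    w, delta : (D,) vectors;  Lam : (D, D) covariance;  z, b, beta : scalars
    """
    d_delta = z - w @ delta - b                 # d_delta of Lemma 1
    d_lam = np.sqrt(2.0 * w @ Lam @ w)          # d_Lambda = sqrt(2 w^T Lam w)
    return 0.5 * np.exp(beta ** 2 * d_lam ** 2 / 4.0 + beta * d_delta) \
               * (1.0 + erf(d_delta / d_lam + beta * d_lam / 2.0))
```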

3.5.1. Update of Labeling Confidence of Reproduction Data

Before the large loop indexed by t, we set another loop indexed by ϵ for the update of the labeling confidence of reproduction data. At the initial step ϵ = 0 , we initialize the labeling confidence z j , l , m , k v at each node j
$$z^{v}_{j,l,m,k}(0) = \frac{1}{N_{j,l}}\sum_{n \in \mathcal{M}_{j,l}} y'_{j,n,m,k}.$$
At iteration ϵ > 0 , we can update the labeling confidence of the reproduction vector by fusing the intermediate estimates among one-hop neighbors, i.e.,
$$z^{v}_{j,l,m,k}(\epsilon+1) = \sum_{i \in \mathcal{B}_j} c_{ji}\, z^{v}_{i,l,m,k}(\epsilon),$$
where c j i denotes the cooperative coefficient, which satisfies the Metropolis rule [17].
After a total of $T$ iterations, the global consensus estimate is obtained, that is, $z^{v}_{l,m,k} = z^{v}_{1,l,m,k}(T) = \cdots = z^{v}_{J,l,m,k}(T)$.
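A minimal sketch of this consensus step is given below (Python/NumPy; names are ours). It builds one common form of the Metropolis combination coefficients assumed in the paper and iterates the averaging update (32) for T rounds.

```python
import numpy as np

def metropolis_weights(adjacency):
    """Metropolis combination coefficients c_ji for an undirected network
    (one common form of the Metropolis rule; yields a doubly stochastic matrix)."""
    J = adjacency.shape[0]
    deg = adjacency.sum(axis=1)
    C = np.zeros((J, J))
    for j in range(J):
        for i in range(J):
            if i != j and adjacency[j, i]:
                C[j, i] = 1.0 / (1.0 + max(deg[j], deg[i]))
        C[j, j] = 1.0 - C[j].sum()
    return C

def consensus_labeling_confidence(z_init, C, T):
    """Iterate (32): z_j(eps+1) = sum_i c_ji * z_i(eps) for T rounds.
    z_init has shape (J, ...) with the node index first."""
    z = z_init.copy()
    for _ in range(T):
        z = np.tensordot(C, z, axes=([1], [0]))
    return z
```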

3.5.2. Update of Labeling Confidence of Training Data Samples

Since the above expression is complicated, we employ the gradient descent method to optimize the objective function.
Using the gradient descent method, we have
$$z_{j,n,m,k}(t+1) = z_{j,n,m,k}(t) - \zeta_1(t+1)\,\nabla_{z_{j,n,m,k}}\tilde{\tilde{F}}_\psi\big|_{z_{j,n,m,k}(t)},$$
where $\zeta_1(t+1)$ denotes a time-varying step size, and $\nabla_{z_{j,n,m,k}}\tilde{\tilde{F}}_\psi\big|_{z_{j,n,m,k}(t)}$ denotes the partial derivative of $\tilde{\tilde{F}}_\psi$ with respect to $z_{j,n,m,k}$.

3.5.3. Update of Model Parameter

Similarly, we can utilize the gradient descent method and diffusion cooperative strategy [2] to seek the global optimal solution of objective function.
We can obtain the optimal values of w j , m , k and b j , m , k by executing the following update equations until convergence:
$$\breve{w}_{j,m,k}(t+1) = w_{j,m,k}(t) - \zeta_2(t+1)\,\nabla_{w_{j,m,k}}\tilde{\tilde{F}}_\psi\big|_{w_{j,m,k}(t)},$$
$$w_{j,m,k}(t+1) = \sum_{i \in \mathcal{B}_j} c_{ji}\,\breve{w}_{i,m,k}(t+1),$$
and
$$\breve{b}_{j,m,k}(t+1) = b_{j,m,k}(t) - \zeta_2(t+1)\,\nabla_{b_{j,m,k}}\tilde{\tilde{F}}_\psi\big|_{b_{j,m,k}(t)},$$
$$b_{j,m,k}(t+1) = \sum_{i \in \mathcal{B}_j} c_{ji}\,\breve{b}_{i,m,k}(t+1),$$
where $\zeta_2(t+1)$ denotes the time-varying step size. Additionally, $\nabla_{w_{j,m,k}}\tilde{\tilde{F}}_\psi\big|_{w_{j,m,k}(t)}$ and $\nabla_{b_{j,m,k}}\tilde{\tilde{F}}_\psi\big|_{b_{j,m,k}(t)}$ denote the corresponding gradients.
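The update pair (34)-(35) (and similarly (36)-(37)) follows the standard adapt-then-combine diffusion pattern. A compact sketch is shown below (Python/NumPy; names are ours), with the local gradients assumed to be supplied by the caller.

```python
import numpy as np

def diffusion_update(w, b, grads_w, grads_b, C, step):
    """One adapt-then-combine diffusion round for (34)-(37).

    w : (J, P) local weight vectors, b : (J,) local biases
    grads_w, grads_b : local gradients of the decentralized objective
    C : (J, J) Metropolis combination matrix, step : step size zeta_2(t+1)
    """
    # adaptation step (34), (36): local gradient descent
    w_half = w - step * grads_w
    b_half = b - step * grads_b
    # combination step (35), (37): fuse intermediate estimates from neighbors
    return C @ w_half, C @ b_half
```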

3.6. One-vs.-One Decomposition Decoding

Utilizing the induced MDC classifier, for an unseen data point $x^{*}$, we can derive its predicted labels in the transformed class space, $y^{*\prime} \in \{+1, -1\}^{\sum_{m=1}^{M} K'_m}$. Subsequently, we employ a one-vs.-one decoding technique to recover the labels in the original class space. For each class $k$ of the $m$-th class space, we count the elements of $y^{*\prime}$ equal to $+1$ whose sequence indices lie in the set $Q$ of class $k$ and the elements equal to $-1$ whose sequence indices lie in the set $R$ of class $k$ (cf. (9)); these counts are denoted by $q_{m,k}$ and $r_{m,k}$, respectively. The $k$-th predicted label of the unseen data $x^{*}$ in the $m$-th original class space is then derived by the majority voting approach:
$$y^{*}_{m,k} = \begin{cases} +1, & \text{if } k = \arg\max_{1 \le l \le K_m}\, (q_{m,l} + r_{m,l}), \\ 0, & \text{otherwise}, \end{cases} \quad m = 1, \ldots, M.$$
To enhance clarity, the label decoding process is depicted in Figure 3.
Example 2.
Given two unseen data points, with their predicted labels in the $m$-th dimension being $y^{*\prime}_{1,m} = [+1,+1,+1,-1,-1,-1]^{T}$ and $y^{*\prime}_{2,m} = [+1,+1,-1,+1,-1,-1]^{T}$, the vote counts w.r.t. the four possible classes are, in sequence,
$$\begin{aligned} q_{1,m,1} + r_{1,m,1} &= 3, & q_{1,m,2} + r_{1,m,2} &= 0, & q_{1,m,3} + r_{1,m,3} &= 1, & q_{1,m,4} + r_{1,m,4} &= 2, \\ q_{2,m,1} + r_{2,m,1} &= 2, & q_{2,m,2} + r_{2,m,2} &= 1, & q_{2,m,3} + r_{2,m,3} &= 0, & q_{2,m,4} + r_{2,m,4} &= 3. \end{aligned}$$
Based on the voting results, the final decoded label can be obtained in accordance with (38) as follows:
$$y^{*}_{1,m} = [+1, 0, 0, 0]^{T}, \quad y^{*}_{2,m} = [0, 0, 0, +1]^{T}.$$
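The decoding rule (38) can again be implemented by enumerating class pairs; the sketch below (illustrative names, zero-based indices) reproduces the votes of Example 2.

```python
from itertools import combinations
import numpy as np

def ovo_decode(y_pred, K):
    """Majority-vote decoding (38): for every original class k, count the +1
    predictions on pairs where k is the first class and the -1 predictions on
    pairs where k is the second class, then pick the class with the most votes."""
    pairs = list(combinations(range(K), 2))
    votes = np.zeros(K, dtype=int)
    for l, (a, b) in enumerate(pairs):
        if y_pred[l] == +1:
            votes[a] += 1
        elif y_pred[l] == -1:
            votes[b] += 1
    out = np.zeros(K, dtype=int)
    out[np.argmax(votes)] = 1
    return out

# reproduces Example 2: ovo_decode(np.array([+1, +1, +1, -1, -1, -1]), 4) -> [1, 0, 0, 0]
```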
For clarity, the flowchart of the proposed algorithm is presented in Figure 4, and the main steps of the dSMUDC algorithm are summarized in Algorithm 2.
Algorithm 2 dSMUDC algorithm
Require: Collect the uncertain dataset $\{x_{j,n}, \Theta_{j,n}\}_{n=1}^{L_j+U_j}$, and initialize $w_{j,m,k}(0) = \mathbf{0}_{Vq}$ and $b_{j,m,k}(0) = 0$ for each node $j$.
1: Transform the original class labels via (9).
2: Construct the explicit mapping function $\{\psi_{j,n}\}$ via Algorithm 1.
3: Calculate the mean vector $\delta_{j,n}$ and the covariance matrix $\Lambda_{j,n}$ via (18) and (19).
4: for $\epsilon = 0, \ldots, T-1$ do
5:   for $j \in \mathcal{J}$ do
6:     Exchange $\{z^{v}_{j,l,m,k}(\epsilon)\}_{l=1}^{V}$ with the neighboring nodes.
7:   end for
8:   for $j \in \mathcal{J}$ do
9:     Update the labeling confidences $\{z^{v}_{j,l,m,k}(\epsilon+1)\}_{l=1}^{V}$ via (32).
10:  end for
11: end for
12: for $t = 0, 1, \ldots$ do
13:   for $j \in \mathcal{J}$ do
14:     Update the intermediate labeling confidences $\{z_{j,n,m,k}(t+1)\}_{n=1}^{L_j+U_j}$ via (33).
15:     Compute $\breve{w}_{j,m,k}(t+1)$ and $\breve{b}_{j,m,k}(t+1)$ via (34) and (36).
16:     Exchange $\breve{w}_{j,m,k}(t+1)$ and $\breve{b}_{j,m,k}(t+1)$ with the neighboring nodes.
17:   end for
18:   for $j \in \mathcal{J}$ do
19:     Compute $w_{j,m,k}(t+1)$ and $b_{j,m,k}(t+1)$ via (35) and (37).
20:   end for
21: end for
22: Obtain the predicted labels $y^{*\prime}$ in the transformed class space.
23: Return $y^{*}$ via the decoding procedure (38).

3.7. Performance Analysis

In this subsection, we evaluate the convergence and complexity of the proposed algorithm.
To conduct the subsequent convergence investigation, several common assumptions regarding DL methods are first introduced.
Assumption 1.
Considering a connected network $\mathcal{G}$, the cooperative coefficient matrix $C$, whose elements $C_{ji} = c_{ji}$ are determined by the Metropolis rule [31], satisfies the following two conditions:
(1) $C\mathbf{1}_J = \mathbf{1}_J$ and $\mathbf{1}_J^{T} C = \mathbf{1}_J^{T}$; (2) the spectral norm satisfies $\rho\big(C - \tfrac{1}{J}\mathbf{1}_J\mathbf{1}_J^{T}\big) < 1$.
Theorem 1.
If the above assumption holds, then $\lim_{t\to\infty}|z_{j,n,m,k}(t) - z^{*}_{j,n,m,k}| = 0$, $\lim_{t\to\infty}\|w_{j,m,k}(t) - w^{*}_{j,m,k}\| = 0$, and $\lim_{t\to\infty}|b_{j,m,k}(t) - b^{*}_{j,m,k}| = 0$, where $z^{*}_{j,n,m,k}$, $w^{*}_{j,m,k}$, and $b^{*}_{j,m,k}$ denote the corresponding optimal values.
Theorem 1 can be proven following the analysis in [17]; hence, we do not repeat the detailed proof here. Theorem 1 shows that, over a connected network, all nodes can obtain the global optimal classifier after sufficient iterations, demonstrating the theoretical efficiency of DL.
Additionally, to evaluate the complexity of the dSMUDC method, we calculated two key metrics: the volume of computations required at each node in each iteration, and the number of variables that need to be exchanged between each node and its neighboring nodes. Table 1 summarizes the number of addition operations (AO) and multiplication operations (MO) required for the proposed method, with detailed breakdowns of these two computational metrics per iteration and per node across four key steps of the algorithm: the derivation of reproduction vectors, the construction of explicit feature maps, the simplification of integral calculations, and the induction of the classifier.
From Table 1, we can observe that the computational complexity of the proposed algorithm depends on the number of reproduction vectors $V$, the number of scaling coefficients $q$, and the number of one-hop neighbors $|\mathcal{B}_j|$, in addition to the characteristics of the dataset. Given that the number of one-hop neighbors $|\mathcal{B}_j|$ is moderate in practical networks, and provided that the values of $V$ and $q$ are acceptable, the computational complexity of the proposed dSMUDC algorithm can be kept within an appropriate range.
Besides, during the process of obtaining the reproduction vectors, a total of $VD$ scalars need to be exchanged among neighboring nodes at each iteration $\tau$. In the classifier induction process, at each iteration $t$, each node $j$ must communicate $\sum_{m=1}^{M} K'_m (Vq + 1)$ scalars to its $|\mathcal{B}_j|$ neighboring nodes. In general, provided that the values of $V$ and $q$ are acceptable, the communication cost of the proposed dSMUDC algorithm is moderate.

4. Experiments

In this section, we conduct multi-faceted validation of the performance of the proposed algorithm based on several existing multi-dimensional datasets. Detailed information about the datasets is provided in Table 2. Note that in Table 2, the notation ♯ denotes the set cardinality. Besides, in the column of the number of class labels for each dimension, if the number of class labels is the same across all dimensions, only one number is retained. Otherwise, the number of class labels for each dimension is listed in sequence.
To facilitate the reproduction of the following experiments, several necessary explanations are provided. The experiments involve a randomly generated distributed network with 10 nodes and 23 links, and the training data samples are randomly assigned to the 10 nodes. In the subsequent trials, we perform 50 separate Monte Carlo cross-validation simulations. In each Monte Carlo simulation, the dataset is randomly divided into 10 folds, with 9 folds used for training and the remaining fold used for testing.
To simulate this kind of data uncertainty, this work adds zero-mean Gaussian white noise with a standard deviation of $0.25\,e\,A_d$ to each attribute, which serves as the uncertainty distribution for that attribute. Here, $A_d$ denotes the value range of feature $x_d$ across the entire dataset.
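A possible way to generate such uncertain inputs is sketched below (Python/NumPy; the function name and the exposed scale parameter are ours — under the setup above, the paper's standard deviation $0.25\,e\,A_d$ would correspond to passing scale = 0.25·e).

```python
import numpy as np

def add_attribute_uncertainty(X, scale, rng=None):
    """Simulate measurement uncertainty: add zero-mean Gaussian noise to every
    attribute, with a standard deviation proportional to that attribute's
    value range A_d over the whole dataset."""
    rng = np.random.default_rng() if rng is None else rng
    A = X.max(axis=0) - X.min(axis=0)         # per-attribute value range A_d
    sigma = scale * A                         # noise standard deviation per attribute
    X_noisy = X + rng.normal(size=X.shape) * sigma
    Theta = np.diag(sigma ** 2)               # diagonal covariance used as Theta_{j,n}
    return X_noisy, Theta
```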
Besides, to test the performance variation of the proposed algorithm under different proportions of labeled uncertain data, we define a measurement metric called the Labeled uncertain Data to total Ratio (LADR), which describes the proportion of labeled uncertain data in the whole dataset.
To evaluate the algorithm’s performance from multiple perspectives, we utilize commonly used performance metrics in multi-dimensional classification algorithms, including Hamming loss, exact match, and semi-exact match. Due to page limitations, please refer to the literature [11,13,14,17] for the specific definitions of the aforementioned metrics.
Firstly, to examine the sensitivity of the proposed algorithm, we investigate the influence of the parameters $\lambda_A$, $\lambda_B$, $\lambda_C$, $\lambda_D$, and $\lambda_E$ on the Hamming loss of the dSMUDC algorithm on the "Flare" dataset [32] in Figure 5. In these simulation experiments, we adjusted the value of one parameter while keeping all other parameters unchanged. This design ensures that any observed change in Hamming loss can be attributed solely to the variation of the target parameter, eliminating potential interference from cross-parameter interactions. The simulation results in Figure 5 reveal a consistent trend across all five parameters. As the value of a parameter increases from a low initial level, the Hamming loss first decreases gradually. Once the parameter enters a suitable range, the Hamming loss stabilizes and remains essentially unchanged as the parameter varies slightly within this interval, indicating that the classification accuracy is not sensitive to minor adjustments of these parameters when they are properly configured. However, when the parameter value continues to rise beyond this suitable range, the Hamming loss begins to increase noticeably. Therefore, we set the weighted parameters within the ranges $\lambda_A \in [0.5, 5]$, $\lambda_B \in [0.5, 5]$, $\lambda_C \in [0.5, 5]$, $\lambda_D \in [5\times 10^{-3}, 0.05]$, and $\lambda_E \in [5\times 10^{-3}, 0.05]$.
In addition, we also simulate the influence of the dimension of the explicit feature map (i.e., the product of the number of reproduction vectors $V$ and the number of scaling coefficients $q$) on the classification accuracy of the proposed algorithm in Figure 6. The simulation results indicate that, provided the values of $V$ and $q$ reach 30 and 4, respectively, good learning performance can be obtained. Considering that further increasing the dimension adds to the burden on the computational system, $V$ and $q$ are therefore set to no greater than 30 and 4, respectively.
Furthermore, to evaluate the proposed algorithm’s performance under different degrees of data uncertainty, we tested the variation of Hamming loss with different e values on the Flare, Cal500, Jura, and Music datasets [11,32] in Figure 7. The experimental results show that the Hamming loss of our algorithm increases slightly as e rises. Specifically, when e increases from 0.03 to 0.15, the Hamming loss rises by less than 0.01. This indicates that increased data uncertainty has a certain negative impact on the algorithm’s performance, but this impact is limited. When e exceeds 0.15, the Hamming loss exhibits a more noticeable increase compared to the 0.03–0.15 range, yet the magnitude of this growth remains controllable and does not lead to sharp performance degradation. Collectively, these results confirm that as long as data uncertainty is constrained within a reasonable range, the performance damage caused by uncertainty is limited, and the algorithm maintains stable and reliable prediction capabilities, verifying its robustness against moderate to moderately high levels of data uncertainty.
Additionally, to further highlight the advantage of our proposed algorithm, we evaluate its classification accuracy on 7 datasets, namely the Flare, Cal500, Jura, Music, Song, WQ, and Belae datasets [11,12,32,33,34]. Besides, three evaluation metrics (Hamming loss, exact match, and semi-exact match) of the other comparison algorithms, including dS2PMDL [17], dS2MLL [2], PLEM [15], DLEM [14], and KARM [11], are also investigated. All the simulation results are presented in Table 3, Table 4 and Table 5. Furthermore, we also tested the performance of the centralized version of the proposed algorithm (called cSMUDC), with its results serving as a baseline reference.
The experimental results in Table 3 illustrate the Hamming loss of various algorithms under an LADR value of 0.5 across seven MDC datasets. Among all compared algorithms, cSMUDC achieves the lowest Hamming loss on most datasets, including Cal500, Music, Song, WQ, and Belae. Besides, the dSMUDC algorithm performs competitively, ranking second on most datasets and achieving the minimum Hamming loss on the Flare dataset. Furthermore, the dS2MLL and KARM algorithms consistently exhibit higher Hamming loss values, particularly on the Jura and Song datasets.
The experimental results in Table 4 present the exact match scores of various algorithms under an LADR value of 0.5 across seven MDC datasets. By observing Table 4, we can notice that the cSMUDC and dSMUDC algorithms demonstrate superior performance on most datasets. The cSMUDC algorithm achieves the highest exact match scores on Flare, Cal500, Jura, Song, WQ, and Belae. Besides, dSMUDC exhibits comparable performance, securing the second-highest exact match scores across the majority of datasets. Algorithms including dS2MLL and KARM consistently show the lowest exact match scores. This is particularly evident on Cal500, WQ, and Belae.
Table 5 presents the semi-exact match performance of various algorithms for MDC across seven datasets under a LADR value of 0.5. From the simulation results, we can find that cSMUDC outperforms others on five datasets: Flare, Cal500, Jura, Music, and Belae. Besides, dSMUDC remains highly competitive, achieving the optimal semi-exact match score on the Song dataset and ranking second across most other datasets. In contrast, dS2MLL and KARM consistently demonstrate the lowest semi-exact match values. This is most pronounced on WQ, Belae, and Flare.
In summary, the experimental results presented in Table 3, Table 4 and Table 5 are similar, leading to the following key conclusions:
(1) The dS2MLL algorithm and KARM algorithm perform relatively poorly among all methods. A possible reason is that the dS2MLL algorithm, as a distributed multi-label learning algorithm, cannot effectively handle label correlations in heterogeneous multi-dimensional class spaces. The KARM algorithm, on the other hand, uses a classifier trained via the KNN method to perform initial data classification and incorporates the classification results into feature vectors as augmented information. Although the KARM algorithm can address the training issue of multi-dimensional classifiers to a certain extent, its performance heavily relies on the selection of the number of neighboring data, resulting in weak generalizability.
(2) The DLEM and PLEM algorithms outperform the aforementioned KARM and dS2MLL algorithms since they effectively explore label correlations in heterogeneous multi-dimensional class spaces. Despite this advantage, both the DLEM and PLEM algorithms are constrained by their inability to effectively handle unlabeled data and uncertain data. Consequently, their performance is inferior to that of our proposed method.
(3) The dS2PMDL algorithm performs well because it can simultaneously leverage information from both labeled and unlabeled data and exploit the label correlations of multi-dimensional classes by learning subspace. However, in this experiment, the training data has a certain level of uncertainty, and the dS2PMDL algorithm lacks the ability to characterize this uncertainty. Therefore, its performance is weaker than that of our proposed algorithm.
(4) Our proposed dSMUDC outperforms all comparative algorithms, signifying its efficacy in label recovery, data uncertainty exploitation, and classifier induction.
To further examine the performance differences among these comparison algorithms, we use the Friedman test [35]. At a significance level of 0.05, the critical value is 2.36. We then calculate the Friedman statistic for each of the three metrics; see Table 6. The Friedman test statistics are significantly greater than the critical value. Therefore, we reject the null hypothesis that there is no significant performance difference among these comparison algorithms.
Furthermore, we use the Bonferroni-Dunn test [35] to check whether there is a significant performance difference between each pair of comparison algorithms. Here, cSMUDC is taken as the control algorithm. For the Bonferroni-Dunn test at a significance level of 0.05, a schematic diagram is depicted in Figure 8. In this figure, every algorithm whose average ranking difference with the control algorithm is less than one critical distance is judged to have no significant performance difference from the control algorithm and is therefore connected to it with a red line. By observing Figure 8, we can see that the dSMUDC algorithm achieves significant performance improvements in average ranking compared with the dS2MLL and KARM comparison algorithms.
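For completeness, the sketch below shows how such a comparison can be run with SciPy; the score matrix here is a hypothetical placeholder (not the values reported in Tables 3-5), and the exact Friedman statistic variant and critical values used in the paper may differ.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# rows: datasets, columns: algorithms -- hypothetical Hamming-loss values
scores = np.array([
    [0.072, 0.075, 0.078, 0.080, 0.083, 0.085, 0.090],
    [0.131, 0.133, 0.137, 0.139, 0.142, 0.146, 0.151],
    [0.095, 0.097, 0.101, 0.104, 0.108, 0.112, 0.118],
    [0.110, 0.112, 0.116, 0.118, 0.121, 0.125, 0.130],
])

stat, p_value = friedmanchisquare(*scores.T)       # one array per algorithm
print(f"Friedman statistic = {stat:.3f}, p-value = {p_value:.4f}")

# average ranks per algorithm (rank 1 = lowest Hamming loss), as used for the
# Bonferroni-Dunn critical-distance comparison in Figure 8
ranks = np.vstack([rankdata(row) for row in scores])
print("average ranks:", ranks.mean(axis=0))
```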
Finally, we test the performance of each comparative algorithm under different proportions of labeled uncertain data, and the experimental results are presented in Figure 9, Figure 10 and Figure 11. Figure 9 compares the Hamming loss of multiple algorithms across four datasets (Flare, Cal500, Jura and Music) as LADR varies, revealing clear performance gaps between algorithms. Across all four datasets, dSMUDC and cSMUDC achieve good learning performance with consistently lower Hamming loss than the other methods. On the Jura dataset, the dSMUDC/cSMUDC algorithms have 0.01–0.03 lower Hamming loss than the dS2PMDL/PLEM/DLEM algorithms, and 0.02–0.05 lower than the dS2MLL/KARM algorithms. On the Flare, Cal500 and Music datasets, they lead the dS2PMDL/PLEM/DLEM algorithms by 0.005–0.01 and the dS2MLL/KARM algorithms by 0.01–0.02.
Figure 10 evaluates the exact match performance of the various algorithms (including dSMUDC, cSMUDC, dS2PMDL, etc.) across the four datasets as LADR ranges from 0.1 to 0.9. It can be seen that dSMUDC and cSMUDC perform the best across all scenarios, and their exact match scores are consistently the highest, regardless of dataset or LADR value. Specifically, on Flare and Jura, they outperform the dS2PMDL/PLEM/DLEM methods by roughly 0.01–0.03, and dS2MLL/KARM by 0.02–0.04. The same pattern holds on Cal500 and Music, where the dSMUDC/cSMUDC algorithms maintain a 0.005–0.01 lead over the dS2PMDL/PLEM/DLEM algorithms and a 0.01–0.03 advantage over dS2MLL/KARM.
Figure 11 presents the semi-exact match performance of algorithms across four datasets as LADR varies (0.1–0.9), revealing distinct performance differences. We observe that cSMUDC and dSMUDC are the top performers overall: their scores remain consistently highest in most cases. On Jura and Music, they outperform mid-tier methods (e.g., PLEM, DLEM) by 0.01–0.02, and dS2MLL/KARM by 0.02–0.05. This pattern persists on the Flare and Cal500 datasets as well.
Across Figure 9, Figure 10 and Figure 11 (covering Hamming loss, exact match, and semi-exact match metrics), the following key conclusions emerge. The performance of all algorithms gradually improves as the LADR value increases. This indicates that the incorporation of more supervised information is beneficial to the training of classification models. Furthermore, we can also observe that the performance of our proposed algorithm is close to that of its corresponding centralized counterpart in most scenarios, and both are significantly superior to the other comparative algorithms. This superiority is particularly prominent when the amount of labeled data is relatively small.

5. Conclusions

In this paper, we have addressed the problem of distributed classification for partially labeled uncertain data and developed the dSMUDC algorithm. In the proposed algorithm, we have employed the integral of the hinge loss of a sample over its uncertainty distribution as the misclassification loss of that training sample, which enables effective utilization of the uncertainty in the data distribution. Additionally, we have designed a mechanism for estimating labeling confidence to mitigate the negative impact of incomplete label information on classification performance. Experimental results on multiple datasets confirm that the proposed algorithm effectively tackles the challenge of classifying uncertain data in distributed scenarios.
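As an illustration of this loss design, the following sketch gives a Monte Carlo approximation of the expected hinge loss of a single uncertain sample under a Gaussian uncertainty model with a linear score w·x + b. The classifier parameters are placeholders, and the snippet only conveys the idea behind the loss rather than the closed-form integral and explicit feature mapping used in the dSMUDC algorithm.

```python
import numpy as np

def expected_hinge_loss(mean, cov, y, w, b, n_draws=10000, seed=0):
    """Monte Carlo estimate of E_{x ~ N(mean, cov)}[ max(0, 1 - y * (w·x + b)) ]."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mean, cov, size=n_draws)   # draws from the sample's uncertainty model
    margins = y * (x @ w + b)
    return float(np.maximum(0.0, 1.0 - margins).mean())

# One uncertain sample: mean feature vector plus a Gaussian measurement-noise covariance.
mean = np.array([1.0, -0.5])
cov = 0.1 * np.eye(2)
w, b, y = np.array([0.8, -0.6]), -0.2, +1                  # illustrative linear score and binary label
print(expected_hinge_loss(mean, cov, y, w, b))
```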
The potential limitations of our algorithm mainly lie in two aspects. First, it assumes that the data across all nodes follow the same distribution, which may not hold in real-world scenarios where data heterogeneity is common. Second, it presupposes that the uncertainty in the data originates from noise that follows a Gaussian distribution. In practical applications, however, uncertainty can stem from diverse sources (e.g., measurement errors with non-Gaussian characteristics or incomplete data sampling) that do not follow a Gaussian distribution. When facing such non-Gaussian uncertainty, the performance of the algorithm may be compromised.
Therefore, we aim to address these two issues in future research. On the one hand, we intend to design a distributed personalized uncertain data classification algorithm to handle heterogeneously distributed uncertain data over a network, thereby relaxing the identical-data-distribution assumption. On the other hand, we aim to develop non-parametric methods that avoid strict distributional assumptions, which will enhance the algorithm's adaptability to complex real-world uncertainty scenarios.

Author Contributions

Conceptualization, Z.X.; Methodology, Z.X.; Software, S.C.; Validation, S.C.; Formal analysis, Z.X.; Investigation, S.C.; Resources, S.C.; Data curation, S.C.; Writing—original draft, S.C.; Writing—review & editing, S.C.; Visualization, S.C.; Supervision, Z.X.; Project administration, S.C.; Funding acquisition, Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by the National Natural Science Foundation of China (Grant No. 62201398).

Data Availability Statement

All data generated or analyzed during this study are included in this published article.

Conflicts of Interest

Author Sicong Chen was employed by the company Kasco Signal Co., Ltd. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Shen, X.; Liu, Y. Privacy-preserving distributed estimation over multitask networks. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 1953–1965. [Google Scholar] [CrossRef]
  2. Xu, Z.; Liu, Y.; Li, C. Distributed information theoretic semi-supervised learning for multi-label classification. IEEE Trans. Cybern. 2022, 52, 821–835. [Google Scholar] [CrossRef] [PubMed]
  3. Lao, X.; Du, W.; Li, C. Distributed estimation with adaptive combiner. IEEE Trans. Signal Inf. Process. Netw. 2022, 8, 187–200. [Google Scholar]
  4. Verbraeken, J.; Wolting, M.; Katzy, J.; Kloppenburg, J.; Verbelen, T.; Rellermeyer, J.S. A survey on distributed machine learning. ACM Comput. Surv. 2020, 53, 1–33. [Google Scholar] [CrossRef]
  5. Liu, M.; Yang, K.; Zhao, N.; Chen, Y.; Song, H.; Gong, F. Intelligent signal classification in industrial distributed wireless sensor networks based industrial internet of things. IEEE Trans. Ind. Inf. 2020, 17, 4946–4956. [Google Scholar] [CrossRef]
  6. Aach, M.; Inanc, E.; Sarma, R.; Riedel, M.; Lintermann, A. Large scale performance analysis of distributed deep learning frameworks for convolutional neural networks. J. Big Data 2023, 10, 96. [Google Scholar] [CrossRef]
  7. Naseh, D.; Abdollahpour, M.; Tarchi, D. Real-World Implementation and Performance Analysis of Distributed Learning Frameworks for 6G IoT Applications. Information 2024, 15, 190. [Google Scholar] [CrossRef]
  8. Le, M.; Huynh-The, T.; Do-Duy, T.; Vu, H.; Hwang, W.-J.; Pham, Q.-V. Applications of Distributed Machine Learning for the Internet-of-Things: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2025, 27, 1053–1100. [Google Scholar] [CrossRef]
  9. Naseh, D.; Bozorgchenani, A.; Shinde, S.S.; Tarchi, D. Unified Distributed Machine Learning for 6G Intelligent Transportation Systems: A Hierarchical Approach for Terrestrial and Non-Terrestrial Networks. Network 2025, 5, 41. [Google Scholar] [CrossRef]
  10. Read, J.; Bielza, C.; Larranaga, P. Multi-dimensional classification with super-classes. IEEE Trans. Knowl. Data Eng. 2014, 26, 1720–1733. [Google Scholar] [CrossRef]
  11. Jia, B.-B.; Zhang, M.-L. Multi-dimensional classification via kNN feature augmentation. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 3975–3982. [Google Scholar]
  12. Jia, B.-B.; Zhang, M.-L. Multi-dimensional classification via stacked dependency exploitation. Sci. China Inf. Sci. 2020, 63, 222102. [Google Scholar] [CrossRef]
  13. Jia, B.-B.; Zhang, M.-L. Decomposition-based classifier chains for multi-dimensional classification. IEEE Trans. Artif. Intell. 2022, 3, 176–191. [Google Scholar] [CrossRef]
  14. Jia, B.-B.; Zhang, M.-L. Multi-dimensional classification via decomposed label encoding. IEEE Trans. Knowl. Data Eng. 2023, 35, 1844–1856. [Google Scholar] [CrossRef]
  15. Tang, J.; Chen, W.; Wang, K.; Zhang, Y.; Liang, D. Probability-based label enhancement for multi-dimensional classification. Inf. Sci. 2024, 653, 119790. [Google Scholar] [CrossRef]
  16. Xu, Z.; Chen, S. Distributed partial label multi-dimensional classification via label space decomposition. Electronics 2025, 14, 2623. [Google Scholar] [CrossRef]
  17. Xu, Z.; Chen, W. Distributed semi-supervised partial multi-dimensional learning via subspace learning. Complex Intell. Syst. 2025, 11, 318. [Google Scholar] [CrossRef]
  18. Tsang, S.; Kao, B.; Yip, K.Y.; Ho, W.S.; Lee, S.D. Decision trees for uncertain data. IEEE Trans. Knowl. Data Eng. 2011, 23, 64–78. [Google Scholar] [CrossRef]
  19. Ozdemir, S.; Xiao, Y. Secure data aggregation in wireless sensor networks: A comprehensive overview. Comput. Netw. 2009, 53, 2022–2037. [Google Scholar] [CrossRef]
  20. Srisooksai, T.; Keamarungsi, K.; Lamsrichan, P.; Araki, K. Practical data compression in wireless sensor networks: A survey. J. Netw. Comput. Appl. 2012, 35, 37–59. [Google Scholar] [CrossRef]
  21. Aleroud, A.; Yang, F.; Pallaprolu, S.C.; Chen, Z.; Karabatis, G. Anonymization of network traces data through condensation based differential privacy. Digit. Threat. Res. Pract. 2021, 2, 1–23. [Google Scholar] [CrossRef]
  22. Zhang, W.; Yu, S.X.; Teng, S.H. Power SVM: Generalization with exemplar classification uncertainty. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2144–2151. [Google Scholar]
  23. Liu, B.; Xiao, Y.; Cao, L.; Deng, F. Svdd-based outlier detection on uncertain data. Knowl. Inf. Syst. 2013, 34, 597–618. [Google Scholar] [CrossRef]
  24. Dredze, M.; Crammer, K.; Pereira, F. Confidence-weighted linear classification. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 June 2008; pp. 264–271. [Google Scholar]
  25. Liu, B.; Xiao, Y.; Philip, S.Y.; Cao, L.; Zhang, Y.; Hao, Z. Uncertain one-class learning and concept summarization learning on uncertain data streams. IEEE Trans. Knowl. Data Eng. 2012, 26, 468–484. [Google Scholar] [CrossRef]
  26. Yan, X.; Luo, Q.; Sun, J.; Luo, Z.; Chen, Y. Online dynamic working-state recognition through uncertain data classification. Inf. Sci. 2023, 555, 1–16. [Google Scholar] [CrossRef]
  27. Jiang, B.; Pei, J. Outlier detection on uncertain data: Objects, instances, and inferences. In Proceedings of the 27th IEEE International Conference on Data Engineering, Hannover, Germany, 11–16 April 2011; pp. 422–433. [Google Scholar]
  28. Tzelepis, C.; Mezaris, V.; Patras, I. Linear maximum margin classifier for learning from uncertain data. IEEE Trans. Patt. Anal. Mach. Intell. 2018, 40, 2948–2962. [Google Scholar] [CrossRef] [PubMed]
  29. Pang, J.; Pu, X.; Li, C. A hybrid algorithm incorporating vector quantization and one-class support vector machine for industrial anomaly detection. IEEE Trans. Ind. Inf. 2022, 18, 8786–8796. [Google Scholar] [CrossRef]
  30. Li, C.; Luo, Y. Distributed vector quantization over sensor network. Int. J. Dist. Sensor Netw. 2014, 10, 189619. [Google Scholar] [CrossRef]
  31. Wang, S.; Li, C. Distributed stochastic algorithm for global optimization in networked system. J. Optim. Theory Appl. 2018, 179, 1001–1007. [Google Scholar] [CrossRef]
  32. Jia, B.-B.; Zhang, M.-L. Maximum margin multi-dimensional classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 7185–7198. [Google Scholar] [CrossRef]
  33. Aggarwal, C.C.; Yu, P.S. A condensation approach to privacy preserving data mining. In Proceedings of the 9th International Conference on Extending Database Technology, Heraklion, Crete, Greece, 14–18 March 2004; pp. 183–199. [Google Scholar]
  34. Bielza, C.; Li, G.; Larranaga, P. Multi-dimensional classification with Bayesian networks. Int. J. Approx. Reason. 2011, 52, 705–727. [Google Scholar] [CrossRef]
  35. Demsar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
Figure 1. This diagram illustrates the uncertain data. Here, each data sample follows a Gaussian distribution. The dot represents the mean of the Gaussian distribution, while the translucent area surrounding the dot denotes the main data region covered by this Gaussian distribution.
Figure 2. Schematic diagram of label encoding.
Figure 3. Schematic diagram of label decoding.
Figure 4. The flowchart of the dSMUDC algorithm.
Figure 5. Hamming loss of dSMUDC algorithm under varying values of parameters λ_A, λ_B, λ_C, λ_D, and λ_E on "Flare" dataset.
Figure 6. Hamming loss of dSMUDC algorithm under varying numbers of reproduction vectors V and scaling coefficients q on "Flare" dataset.
Figure 7. Hamming loss of dSMUDC algorithm under varying parameter e on "Flare", "Cal500", "Jura" and "Music" datasets.
Figure 8. Comparison of cSMUDC (the control algorithm) against the other comparison algorithms using the Bonferroni-Dunn test.
Figure 9. Hamming loss of different comparison algorithms versus the LADR on "Flare", "Cal500", "Jura" and "Music" datasets.
Figure 10. Exact match of different comparison algorithms versus the LADR on "Flare", "Cal500", "Jura" and "Music" datasets.
Figure 11. Semi-exact match of different comparison algorithms versus the LADR on "Flare", "Cal500", "Jura" and "Music" datasets.
Table 1. The MOs and AOs of the dSMUDC algorithm per iteration per node j.
{ṽ_{j,l}}_l:  MO = V D (|B_j| + 3);  AO = V D (|M_{j,l}| + |B_j|)
{ψ_{j,n}}_n:  MO = (L_j + U_j) V (D + 2D^2 + q(D^4 + 2D^3 + 2D^2));  AO = (L_j + U_j) V (D^3 + |M_{j,l}| D^2 + |M_{j,l}| D + q(D^4 + 2D^3 + 2D^2))
{f_{ψ_{j,n}}(ψ)}:  MO = (L_j + U_j) V q (4D^4 + 10D^3 + 8D^2);  AO = (L_j + U_j) V q (4D^4 + 10D^3 + 8D^2)
{z^v_{j,l,m,k}}:  MO = Σ_{m=1}^{M} K_m T V (1 + |B_j|);  AO = Σ_{m=1}^{M} K_m T V (|M_{j,l}| + |B_j|)
{z_{j,n,m,k}}:  MO = Σ_{m=1}^{M} K_m (L_j + U_j)(2V^2 q^2 + 2Vq + V + 8) + Σ_{m=1}^{M} K_m L_j;  AO = Σ_{m=1}^{M} K_m (L_j + U_j)(2V^2 q^2 + 2Vq + V + 5) + Σ_{m=1}^{M} K_m L_j
{w_{j,m,k}}:  MO = Σ_{m=1}^{M} K_m V q (11(L_j + U_j) + 2Vq + |B_j| + 3);  AO = Σ_{m=1}^{M} K_m V q (7(L_j + U_j) + 2Vq + |B_j| + 3)
{b_{j,m,k}}:  MO = Σ_{m=1}^{M} K_m (V^2 q^2 + 2Vq + 7(L_j + U_j) + |B_j| + 2);  AO = Σ_{m=1}^{M} K_m (V^2 q^2 + 2Vq + 4(L_j + U_j) + |B_j| + 1)
Table 2. Detailed profiles of used datasets.

| Dataset | ♯ Training Exam. | ♯ Testing Exam. | ♯ Feature | ♯ Lab./Dim. |
| --- | --- | --- | --- | --- |
| Flare | 2580 | 650 | 3 | 3, 4, 2 |
| Cal500 | 450 | 52 | 10 | 2 |
| Jura | 2870 | 720 | 2 | 4, 5 |
| Music | 530 | 61 | 6 | 2 |
| Song | 3140 | 785 | 3 | 3 |
| WQ | 950 | 110 | 14 | 4 |
| Belae | 1740 | 190 | 5 | 5 |
Table 3. Hamming loss of different algorithms versus LADR on MDC datasets (bold entries indicate the lowest Hamming loss in each case).

| Hamming Loss | Flare | Cal500 | Jura | Music | Song | WQ | Belae |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LADR | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| dSMUDC | **0.0505** | 0.3685 | 0.0564 | 0.1915 | 0.1265 | 0.3512 | 0.5761 |
| cSMUDC | 0.0507 | **0.3656** | **0.0551** | **0.1907** | **0.1236** | **0.3503** | **0.5742** |
| dS2PMDL | 0.0517 | 0.3715 | 0.0627 | 0.1932 | 0.1287 | 0.3535 | 0.5811 |
| dS2MLL | 0.0564 | 0.3807 | 0.0735 | 0.1973 | 0.1437 | 0.3668 | 0.5792 |
| PLEM | 0.0526 | 0.3747 | 0.0703 | 0.1942 | 0.1340 | 0.3557 | 0.5770 |
| DLEM | 0.0513 | 0.3705 | 0.0727 | 0.1948 | 0.1308 | 0.3573 | 0.5754 |
| KARM | 0.0557 | 0.3819 | 0.0792 | 0.1978 | 0.1447 | 0.3685 | 0.5805 |
Table 4. Exact match of different algorithms versus LADR on MDC datasets (bold entries indicate the highest exact match in each case).

| Exact Match | Flare | Cal500 | Jura | Music | Song | WQ | Belae |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LADR | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| dSMUDC | 0.8248 | 0.0180 | 0.7726 | **0.3326** | 0.5380 | 0.0096 | 0.0287 |
| cSMUDC | **0.8267** | **0.0202** | **0.7761** | 0.3318 | **0.5412** | **0.0098** | **0.0298** |
| dS2PMDL | 0.8218 | 0.0120 | 0.7552 | 0.3286 | 0.5302 | 0.0076 | 0.0265 |
| dS2MLL | 0.8070 | 0.0105 | 0.7378 | 0.3184 | 0.5124 | 0.0072 | 0.0211 |
| PLEM | 0.8172 | 0.0135 | 0.7450 | 0.3224 | 0.5280 | 0.0092 | 0.0247 |
| DLEM | 0.8203 | 0.0146 | 0.7423 | 0.3240 | 0.5271 | 0.0085 | 0.0252 |
| KARM | 0.8184 | 0.0122 | 0.7148 | 0.3180 | 0.5144 | 0.0070 | 0.0202 |
Table 5. Semi-exact match of different algorithms versus LADR on MDC datasets (bold entries indicate the highest semi-exact match in each case).

| Semi-Exact Match | Flare | Cal500 | Jura | Music | Song | WQ | Belae |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LADR | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| dSMUDC | 0.9564 | 0.0727 | 0.9703 | 0.6760 | 0.9257 | **0.0468** | 0.1454 |
| cSMUDC | **0.9578** | **0.0748** | **0.9722** | **0.6796** | **0.9275** | 0.0466 | **0.1470** |
| dS2PMDL | 0.9532 | 0.0716 | 0.9610 | 0.6693 | 0.9216 | 0.0446 | 0.1416 |
| dS2MLL | 0.9412 | 0.0631 | 0.9506 | 0.6532 | 0.9130 | 0.0389 | 0.1265 |
| PLEM | 0.9501 | 0.0688 | 0.9549 | 0.6606 | 0.9178 | 0.0426 | 0.1355 |
| DLEM | 0.9516 | 0.0712 | 0.9627 | 0.6602 | 0.9163 | 0.0432 | 0.1381 |
| KARM | 0.9282 | 0.0653 | 0.9423 | 0.6505 | 0.9115 | 0.0392 | 0.1272 |
Table 6. Summary of the Friedman statistics F_F and the critical value in terms of Hamming loss, exact match and semi-exact match.

| Metric | F_F | Critical Value (α = 0.05) |
| --- | --- | --- |
| Hamming loss | 36.50 | 2.36 |
| Exact match | 21.92 | 2.36 |
| Semi-exact match | 46.87 | 2.36 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
