1. Introduction
Traditional communication systems are primarily designed for reliable transmission of symbol (bit) streams, irrespective of the specific task to be performed at the receiver. However, as communication technologies approach the Shannon capacity limit, traditional communication methods struggle to meet the growing demands for higher data volumes in 5G/6G networks and AI-driven applications [1]. Unlike traditional communication, semantic communication focuses on extracting and transmitting meaningful, task-critical features of the source information [2]. By preserving task-critical content while reducing redundancy, semantic communication achieves higher efficiency and better task performance at the receiver, making it a promising paradigm for next-generation intelligent communications [3]. Recent studies have demonstrated the effectiveness of semantic communication across various data modalities, including images [4,5,6,7], texts [8,9,10,11], videos [12,13], and question-answering (QA) tasks [14,15].
For image-based semantic communication, several works leverage joint source-channel coding (JSCC) to enhance transmission efficiency while preserving task-critical information. An attention-based DeepJSCC model is introduced in [4], which dynamically adapts to varying signal-to-noise ratio (SNR) levels during image transmission. Furthermore, the authors in [5] extend this approach to a multi-task semantic communication framework, enabling image recovery and classification simultaneously.
For text-based semantic communication, deep learning-based methods have been widely adopted to enhance transmission reliability while reducing bandwidth usage. The authors in [8] introduce DeepSC, a semantic communication system designed to maximize semantic accuracy in text transmission. To further optimize transmission efficiency, a lightweight distributed semantic communication model is proposed in [9], which enables IoT devices to transmit textual information at the semantic level.
For video-based semantic communication, the authors in [12] present an adaptive DeepJSCC framework tailored for wireless video delivery, ensuring high-quality transmission under varying network conditions. To reduce bandwidth requirements while maintaining perceptual quality, a novel video transmission framework, VISTA, is proposed in [13] to transmit the most relevant video semantics.
In QA tasks, semantic communication systems are evolving toward human-like communication through contextual understanding. In [14], DeepSC-VQA is introduced as a transformer-enabled multi-user semantic communication model for visual QA, which can effectively integrate text and image modalities at the transmitter and receiver. Similarly, the authors in [15] propose a memory-aided semantic communication system that utilizes a memory queue to enhance context-awareness and semantic extraction at the receiver.
These studies highlight the diverse applications of semantic communication across different data types, each involving distinct tasks at the receiver. To effectively support these tasks, semantic coding plays a crucial role in extracting task-critical information while reducing redundancy. To this end, several recent studies have focused on semantic coding methods tailored to various application scenarios [16,17].
In [18], the authors propose a deep learning-based joint transmission-recognition scheme for IoT devices to optimize task-specific recognition. This approach significantly enhances classification performance, particularly in low-SNR conditions. Reference [19] develops an adaptable, deep learning-based semantic compression approach that compresses semantics according to their importance for the image classification task. In [20], deep learning-based feature extraction with JSCC is leveraged to achieve efficient image retrieval in edge computing environments. A federated semantic learning (FedSem) framework is introduced in [21], which collaboratively trains the semantic-channel encoders of multiple devices to enable semantic knowledge graph construction with privacy preservation.
These studies on semantic coding leverage deep learning methods to extract task-critical features. However, most approaches operate as black boxes, lacking explainability of the coding scheme [22,23]. To address this limitation, recent research has explored explainable semantic communication. For example, [24] models control signals as semantic information and proposes a real-time control-oriented semantic communication framework that enables interpretable data transmission; however, this approach is not well suited to transmitting image data. In [25], a triplet-based representation of text semantics is introduced, incorporating syntactic dependency analysis to enhance interpretability in semantic extraction; however, this method is specifically tailored to text semantics and is not applicable to image-related tasks. The authors in [26] present two information-theoretic metrics to characterize semantic information compression and select only task-critical features. However, these explainable semantic coding methods primarily focus on semantic extraction while overlooking effective data compression, which is crucial for ensuring reliable information transmission in communication systems.
In this paper, we propose a theoretically grounded and explainable deep semantic coding framework in a binary-classification-oriented communication (BCOC) scenario. Based on two fundamental information-theoretic principles, namely rate-distortion theory and large-deviation theory, we formulate an optimization problem to effectively balance data compression efficiency and classification accuracy. Furthermore, we leverage deep learning techniques and variational methods to transform the theoretical optimization problem into a deep variational semantic coding (DVSC) model and develop a novel DVSC-Opt algorithm to efficiently obtain the deep semantic coding design. The contributions of this work are summarized as follows.
- We formulate an optimization problem in the BCOC scenario from an information-theoretic perspective, addressing both source compression and task-critical information preservation under predefined distortion constraints. By explicitly optimizing a task-discriminative information measure, our framework effectively preserves the information that is critical to the binary classification task in an explainable manner.
- We develop the DVSC model with respect to the formulated theoretical optimization problem. Without knowledge of probability distributions, we introduce discriminator networks to facilitate the estimation of information-theoretic terms. Additionally, we propose an efficient training algorithm for deep semantic coding design.
- We conduct experimental validation on the CelebA dataset and the CIFAR-10 dataset, demonstrating that the proposed coding scheme effectively balances compression rate and classification accuracy while ensuring acceptable distortions.
Notably, the DVSC model combines information-theoretic principles with deep learning techniques. Although hybrid frameworks such as InfoGAN [27] and the deep information bottleneck (Deep IB) [28] have been widely adopted in representation learning and generative modeling, their problem formulations differ fundamentally from ours. InfoGAN is designed to learn disentangled representations by maximizing the mutual information between latent codes and generated samples in a generative adversarial framework, without explicitly addressing compression efficiency or task relevance. Deep IB, on the other hand, seeks to extract task-relevant features by optimizing an information bottleneck objective, but it is primarily used in supervised learning contexts and does not consider distortion constraints. In contrast, the DVSC model is explicitly designed for the BCOC scenario, jointly optimizing data compression and task performance subject to predefined distortion constraints, leveraging a theoretically principled and practically trainable architecture.
The rest of this paper is organized as follows: Section 2 presents the semantic coding model of BCOC and formulates the theoretical optimization problem; Section 3 develops the DVSC model and the DVSC-Opt algorithm; Section 4 implements the proposed training algorithm on the CelebA dataset and the CIFAR-10 dataset, providing experimental results and performance analysis; Section 5 concludes the paper.
2. System Model and Problem Formulation
In this section, we first introduce the semantic coding model in BCOC. Then we present the related information theories and key concepts. Finally, an optimization problem is formulated to trade off data compression efficiency, binary-classification task performance, and source reconstruction quality.
2.1. System Model
The system model consists of three components: the mixed source, the semantic coding module, and the binary classification module, as depicted in Figure 1. The mixed source X consists of two subsources, each of which corresponds to a distinct class attribute and is defined on the same support set. In each time block, an independent and identically distributed (i.i.d.) source sequence of length m is generated from one of the two subsources according to its marginal distribution, with the corresponding prior probability. The source sequence is first processed by the semantic coding module. By assuming a powerful channel coding scheme and error-free channel transmission, after encoding and decoding (hereinafter referred to as coding for brevity), the reconstructed sequence is subsequently fed into the classification module. The objective of the binary classification task is to determine from which subsource each source sequence originates.
In the semantic coding module, we analyze the rate-distortion trade-off of a mixed source based on information theory and information spectrum methods. In the binary classification module, large deviation theory is utilized to evaluate the classification performance. Consequently, the overall performance of the BCOC is jointly determined by the source compression rate, reconstruction distortion, and classification accuracy. The next subsection provides an overview of the information-theoretical foundations.
2.2. Theoretical Foundations
2.2.1. Rate-Distortion Function of Mixed Source
Given the mixed source in Figure 1, the probability of generating a source sequence is a prior-weighted mixture of the probabilities under the two subsource distributions, where the two prior probabilities sum to one. Even though each subsource emits an i.i.d. sequence, the mixed source sequence is not an i.i.d. sequence. For mixed source coding, a rate-distortion pair is said to be achievable if there exists a sequence of source codes operating at that rate whose expected distortion asymptotically does not exceed the distortion level, where mean squared error (MSE) is adopted as the distortion measure in this paper.
Based on information spectrum methods [29], given a distortion constraint D, the infimum achievable rate of the mixed source coding is characterized by the rate-distortion functions [30] of the individual i.i.d. subsources. As a result, two separate sets of encoders and decoders are employed to compress the subsources respectively, as illustrated in Figure 1. Furthermore, each reconstructed sequence is an i.i.d. sequence generated with respect to the corresponding reconstruction marginal distribution. It is then straightforward to show that the rate-distortion function of the mixed source admits an alternative expression in terms of the mutual information between each subsource and its reconstruction.
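For reference, recall the classical single-source rate-distortion function [30]; with per-letter MSE distortion $d(x,\hat{x}) = (x-\hat{x})^2$ and a generic i.i.d. subsource $X$ with reconstruction $\hat{X}$ (the notation here is generic and used only for illustration), it reads
$$
R(D) \;=\; \min_{P_{\hat{X}\mid X}\,:\;\mathbb{E}\left[d(X,\hat{X})\right]\,\le\, D} \; I\big(X;\hat{X}\big).
$$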
2.2.2. Large Deviation Theory of Binary Hypothesis Testing
In the BCOC system model, the classification task at the receiver can be formulated as a binary hypothesis testing problem. Specifically, one hypothesis states that the source sequence originates from the first subsource, i.e., the input of the classifier is the corresponding i.i.d. sequence; the other hypothesis states that the source sequence originates from the second subsource. Under this setting, classification errors can be categorized into two types.
Since the source sequence is randomly generated from one of the two subsources according to the prior probabilities, the overall classification error probability is the prior-weighted combination of the two types of error. According to large deviation theory, this error probability follows an exponential decay law when the sequence length m is sufficiently large. More specifically, the optimal error exponent is characterized by the Chernoff information [30], where the Chernoff information between two probability distributions $P_1$ and $P_2$ is defined as
$$
C(P_1, P_2) \;=\; -\min_{0 \le \lambda \le 1} \log\Big(\sum_{x} P_1^{\lambda}(x)\, P_2^{1-\lambda}(x)\Big).
$$
Chernoff information quantifies the statistical distinguishability between two probability distributions. A larger Chernoff information value indicates greater separability between the distributions, leading to a lower classification error probability in the asymptotic regime, as indicated by (7). Consequently, enhancing classification accuracy can be formulated as improving the Chernoff information in (8).
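As a concrete illustration (the Gaussian example and notation here are chosen for exposition and are not part of the formal development), suppose the two reconstruction distributions are Gaussian with means $\mu_1$, $\mu_2$ and a common variance $\sigma^2$. The optimal coefficient is then $\lambda^{\star} = 1/2$ by symmetry, and the Chernoff information evaluates to
$$
C\big(\mathcal{N}(\mu_1,\sigma^2),\,\mathcal{N}(\mu_2,\sigma^2)\big) \;=\; \frac{(\mu_1-\mu_2)^2}{8\sigma^2},
$$
so the classification error probability decays roughly as $\exp\!\big(-m\,(\mu_1-\mu_2)^2/(8\sigma^2)\big)$: better-separated reconstructions yield an exponentially smaller error probability.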
2.3. Problem Formulation
Different from traditional source coding, semantic coding needs to balance data compression efficiency, source reconstruction quality, and binary-classification task performance. To this end, based on the rate-distortion function of the mixed source and the large-deviation analysis of binary hypothesis testing, we formulate an optimization problem, stated in (9), in which a predetermined weight coefficient balances mixed-source compression efficiency and binary-classification performance. Intuitively, problem (9) can be seen as a weighted sum of the rate-distortion function (5) and the Chernoff information (8) subject to a given distortion constraint; a larger weight coefficient prioritizes classification accuracy at the expense of compression efficiency, and vice versa.
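To make the structure of (9) concrete, one illustrative way to write such a weighted objective (the notation here, including the subsource priors $p_1, p_2$, encoders represented by conditional distributions $P_{\hat{X}_i\mid X_i}$, reconstruction marginals $P_{\hat{X}_1}, P_{\hat{X}_2}$, and weight $\beta$, is chosen for exposition rather than reproduced from (9)) is
$$
\min_{P_{\hat{X}_1\mid X_1},\,P_{\hat{X}_2\mid X_2}} \;\; \sum_{i=1}^{2} p_i\, I\big(X_i;\hat{X}_i\big) \;-\; \beta\, C\big(P_{\hat{X}_1},\,P_{\hat{X}_2}\big)
\quad \text{s.t.}\quad \mathbb{E}\big[d(X_i,\hat{X}_i)\big] \le D,\;\; i=1,2,
$$
i.e., the mixed-source rate term is traded off against the class separability of the two reconstructions under per-subsource distortion constraints.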
Based on convex optimization principles, we introduce two Lagrange multipliers to incorporate the distortion constraints into the objective function and obtain the dual problem (10). Since the dual problem involves four optimization variables, we employ an iterative optimization process to reduce the complexity and further decompose it into the two sub-problems given in (11).
Although the system model considers semantic coding of source sequences and a binary-classification task performed on the reconstructed sequences, it is worth noting that the assumption of mixed i.i.d. sources leads to single-letter expressions for the rate-distortion function and the Chernoff information. Thus, all subsequent formulations and analysis are expressed in terms of single-letter variables instead of sequences.
It is worth emphasizing that in this work, the notion of explainability stems from the principled objective-level design of the BCOC model. By explicitly incorporating Chernoff information, which quantifies class separability, into the optimization objective, the model structurally preserves features that are critical for binary decision making. Instead of relying on attention mechanisms or saliency maps commonly used in deep neural networks, this formulation offers a task-oriented and theoretically grounded approach to preserving task-critical semantics throughout the coding process.
3. Deep Variational Semantic Coding
Directly solving the dual problem (10) to obtain the optimal semantic coding, represented by the conditional distributions of the reconstructions given the subsources, is generally difficult or even intractable. A popular recent approach is to parameterize the coding scheme using neural networks [31]. However, estimating mutual information and Chernoff information from data samples without knowledge of the underlying probability distributions remains highly challenging. To address this, we employ variational methods and design discriminator networks for efficient estimation of these information-theoretic terms in the proposed DVSC model. In this section, we first construct the DVSC model and then propose a learning algorithm for efficient training and optimization.
3.1. Parameterized Mutual Information
Mutual information can be expressed as a Kullback-Leibler (KL) divergence. To evaluate mutual information without knowledge of the probability distributions, we can estimate the corresponding KL divergence through variational methods, as presented in the following.
3.1.1. Estimation of KL Divergence
KL divergence measures the difference between two probability distributions. Let $P$ and $Q$ be two discrete probability distributions defined on the same finite or countable space. Assume that $P$ is absolutely continuous with respect to $Q$, i.e., for every event $x$, $P(x) = 0$ whenever $Q(x) = 0$. The KL divergence between $P$ and $Q$ is defined as
$$
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x} P(x)\, \log \frac{P(x)}{Q(x)}.
$$
Evaluating the KL divergence from its definition requires knowledge of both probability distributions $P$ and $Q$, which is usually not available in learning scenarios. A widely used variational method instead leverages deep learning to optimize a discriminator network whose training objective approximates the KL divergence; once the discriminator network is optimized, the value of this objective serves as an estimate of the KL divergence.
This approach provides an efficient estimation of KL divergence and has been widely adopted in deep learning and information theory research [32,33].
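As a minimal sketch of this discriminator-based estimation (using the Donsker-Varadhan bound as one common choice of variational objective; it is not necessarily the exact objective adopted in this paper), the KL divergence between two sampled distributions can be estimated as follows:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Small MLP discriminator T(x) used for variational divergence estimation."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def dv_kl_bound(disc, x_p, x_q):
    """Donsker-Varadhan lower bound on D_KL(P || Q): E_P[T] - log E_Q[exp(T)]."""
    t_p = disc(x_p).squeeze(-1)
    t_q = disc(x_q).squeeze(-1)
    log_mean_exp_q = torch.logsumexp(t_q, dim=0) - torch.log(
        torch.tensor(float(t_q.shape[0])))
    return t_p.mean() - log_mean_exp_q

def estimate_kl(samples_p, samples_q, steps=500, lr=1e-3):
    """Train the discriminator to tighten the bound, then report its value."""
    disc = Discriminator(samples_p.shape[1])
    opt = torch.optim.Adam(disc.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -dv_kl_bound(disc, samples_p, samples_q)  # maximize the bound
        loss.backward()
        opt.step()
    with torch.no_grad():
        return dv_kl_bound(disc, samples_p, samples_q).item()
```

In practice, the expectation terms are computed over mini-batches drawn from the two distributions, matching the Adam-based updates used elsewhere in this paper.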
3.1.2. Estimation of Mutual Information
Given two random variables X and Y, mutual information quantifies the statistical dependency between them, defined as
$$
I(X;Y) \;=\; \sum_{x,y} P_{XY}(x,y)\, \log \frac{P_{XY}(x,y)}{P_X(x)\,P_Y(y)}.
$$
Alternatively, mutual information can be expressed in terms of KL divergence as
$$
I(X;Y) \;=\; D_{\mathrm{KL}}\big(P_{XY} \,\|\, P_X P_Y\big).
$$
This formulation allows for a similar deep variational approach to mutual information estimation. Specifically, we introduce a discriminator network that is trained with respect to a corresponding variational objective; once this network is trained, the mutual information can be estimated by evaluating the objective.
We note that the mutual information term in Equation (19) is estimated via a variational upper bound, which provides a tractable surrogate for optimization. While this introduces a relaxation compared to the exact mutual information, our experiments indicate that the training process converges stably, and the resulting approximation does not negatively affect the optimization of the overall objective in Equation (11). This suggests that the bound is sufficiently tight for the purposes of our model.
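Building on the same machinery, the mutual information I(X;Y) can be estimated from paired samples by contrasting the paired mini-batch (samples of the joint distribution) against a within-batch shuffle of Y (approximate samples of the product of marginals). The sketch below reuses the Discriminator and dv_kl_bound defined above and, again, illustrates the general recipe rather than the exact estimator used in our model.

```python
import torch

def estimate_mi(x, y, steps=500, lr=1e-3):
    """Estimate I(X;Y) = D_KL(P_XY || P_X P_Y) from paired samples (x_i, y_i)."""
    disc = Discriminator(x.shape[1] + y.shape[1])   # defined in the previous sketch
    opt = torch.optim.Adam(disc.parameters(), lr=lr)
    for _ in range(steps):
        joint = torch.cat([x, y], dim=1)                                  # samples of P_XY
        shuffled = torch.cat([x, y[torch.randperm(y.shape[0])]], dim=1)   # approx. P_X P_Y
        opt.zero_grad()
        loss = -dv_kl_bound(disc, joint, shuffled)                        # maximize the DV bound
        loss.backward()
        opt.step()
    with torch.no_grad():
        joint = torch.cat([x, y], dim=1)
        shuffled = torch.cat([x, y[torch.randperm(y.shape[0])]], dim=1)
        return dv_kl_bound(disc, joint, shuffled).item()
```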
3.2. Parameterized Chernoff Information
To achieve a deep variational approximation of Chernoff information, we leverage its mathematical relation with Rényi divergence, and utilize the latter as an intermediate quantity for estimation. Therefore, we begin by introducing the estimation of Rényi divergence.
3.2.1. Estimation of Rényi Divergence
Let $P$ and $Q$ be two discrete probability distributions defined on the same finite or countable space, and assume that $P$ is absolutely continuous with respect to $Q$, i.e., for every event $x$, $P(x) = 0$ whenever $Q(x) = 0$. For an order parameter $\alpha \in (0,1)$, the Rényi divergence of order $\alpha$ is defined as
$$
D_{\alpha}(P \,\|\, Q) \;=\; \frac{1}{\alpha - 1} \log \sum_{x} P^{\alpha}(x)\, Q^{1-\alpha}(x).
$$
According to the Donsker-Varadhan-type Rényi variational formula [34], the Rényi divergence can be reformulated as the variational expression in (21), in which a discriminator function parameterized by a neural network is trained with respect to the corresponding objective. Once the optimal discriminator network is obtained, the Rényi divergence can be approximated by evaluating (21) at the trained discriminator.
3.2.2. Estimation of Chernoff Information
By comparing Equations (8) and (20), we notice that for $\lambda \in (0,1)$ the Chernoff information can be expressed in terms of Rényi divergence as
$$
C(P_1, P_2) \;=\; \max_{\lambda \in (0,1)} \, (1-\lambda)\, D_{\lambda}(P_1 \,\|\, P_2).
$$
Based on the variational formulation of Rényi divergence in Equation (21), the optimization of Chernoff information involves jointly training the discriminator network and the Chernoff coefficient. Specifically, after the discriminator has been optimized, the Chernoff coefficient is updated with respect to a corresponding objective. With both the discriminator and the coefficient optimized, the Chernoff information can be estimated by evaluating the resulting variational expression.
In practice, the optimal coefficient in the Chernoff information rarely attains an extreme value at 0 or 1. Therefore, this numerical approximation over the open interval (0, 1) remains effective in most cases.
Overall, the estimation of Chernoff information in our framework involves two stages: a variational approximation of Rényi divergence and a numerical minimization over the Chernoff coefficient. The first stage is theoretically supported by the Donsker-Varadhan variational formulation [34] using discriminator networks. The second stage is implemented via gradient-based optimization to update the coefficient. Our experimental results demonstrate stable convergence and effective estimation of Chernoff information throughout training.
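For intuition, the Chernoff information between two sampled distributions can also be approximated by a simpler density-ratio route: first train a binary discriminator with a logistic loss to separate the two sample sets (for balanced classes its logit approximates the log-density ratio), and then minimize the resulting log-moment-generating function over the coefficient numerically. The sketch below follows this route; it is an illustrative alternative, not the Rényi-variational estimator described above.

```python
import torch
import torch.nn as nn

def estimate_chernoff(x1, x2, steps=500, lr=1e-3):
    """Two-stage Chernoff information estimate from equally sized sample sets of P1 and P2."""
    ratio_net = nn.Sequential(                      # after training, logit(x) ~ log P1(x)/P2(x)
        nn.Linear(x1.shape[1], 128), nn.ReLU(),
        nn.Linear(128, 1),
    )
    opt = torch.optim.Adam(ratio_net.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    labels = torch.cat([torch.ones(x1.shape[0]), torch.zeros(x2.shape[0])])
    for _ in range(steps):                          # stage 1: logistic density-ratio fit
        opt.zero_grad()
        logits = torch.cat([ratio_net(x1), ratio_net(x2)], dim=0).squeeze(-1)
        bce(logits, labels).backward()
        opt.step()

    with torch.no_grad():
        log_ratio = ratio_net(x2).squeeze(-1)       # log-ratio evaluated on P2 samples

    # Stage 2: C(P1,P2) = -min_{lam in [0,1]} log E_{P2}[exp(lam * log_ratio)].
    lam_raw = torch.zeros(1, requires_grad=True)    # lam = sigmoid(lam_raw) stays in (0,1)
    lam_opt = torch.optim.Adam([lam_raw], lr=1e-2)
    log_n = torch.log(torch.tensor(float(x2.shape[0])))
    for _ in range(steps):
        lam = torch.sigmoid(lam_raw)
        log_mgf = torch.logsumexp(lam * log_ratio, dim=0) - log_n
        lam_opt.zero_grad()
        log_mgf.backward()                          # minimize the log-MGF over lam
        lam_opt.step()
    with torch.no_grad():
        lam = torch.sigmoid(lam_raw)
        chernoff = -(torch.logsumexp(lam * log_ratio, dim=0) - log_n)
        return chernoff.item(), lam.item()
```

The returned coefficient plays the same role as the Chernoff coefficient that is trained jointly with the discriminator in the DVSC model.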
3.3. The DVSC Model and Training Algorithm
We now revisit the BCOC optimization objective (11). In this formulation, the expected distortion is directly computable using MSE, whereas the mutual information and Chernoff information terms are estimated via discriminator networks. With all components of the objective function now accessible, we propose the DVSC model, as illustrated in Figure 2.
In Figure 2, we employ two deep neural networks (DNNs) as encoder networks, one for each subsource. Two discriminator networks are trained to estimate the mutual information between each subsource and its reconstruction, and a third discriminator network, together with the Chernoff coefficient, is optimized to estimate the Chernoff information. Therefore, for a given distortion constraint D and weight coefficient, the encoders can be optimized according to their respective objectives, and the Lagrange multipliers can be updated accordingly.
In sum, the training process of the DVSC model involves the optimization of five neural networks, two Lagrange multipliers, and a Chernoff coefficient. The optimization process takes four key steps:
- 1. Train the two mutual-information discriminator networks and estimate the mutual information terms;
- 2. Train the Chernoff discriminator network and the Chernoff coefficient, and estimate the Chernoff information;
- 3. Train the two encoder networks;
- 4. Update the two Lagrange multipliers.
These steps are performed iteratively until all network parameters and Lagrange multipliers converge. The detailed DVSC-Opt algorithm is given in Algorithm 1.
The overall per-iteration training complexity of the DVSC-Opt algorithm is on the order of O(Mn), where M is the mini-batch size and n denotes the typical forward/backward cost of a neural network module. Across T iterations, the total training cost becomes O(TMn). Although this introduces some modest overhead, each component in the framework is implemented with a lightweight network architecture. As a result, the overall runtime remains practical and can support real-world deployment.
Algorithm 1: The DVSC-Opt algorithm.
- 1: Input: distortion constraint D, weight coefficient; initialized Lagrange multipliers and Chernoff coefficient; initialized parameters of the two encoder networks and three discriminator networks.
- 2: repeat
- 3: Step 1: Train Discriminator 1 and Discriminator 2.
- 4: Randomly sample mini-batches (batch size M) from the two subsources.
- 5: Encode the sampled mini-batches with the current encoder networks.
- 6: Compute the discriminator loss functions and update the discriminator parameters using the Adam optimizer via backpropagation.
- 7: Step 2: Train Discriminator 3 and the Chernoff coefficient.
- 8: Compute the corresponding loss functions and update the discriminator parameters and the Chernoff coefficient.
- 9: Step 3: Train Encoder 1 and Encoder 2.
- 10: Compute the encoder loss functions and update the encoder parameters.
- 11: Step 4: Train the Lagrange multipliers.
- 12: Compute the corresponding loss functions and update the Lagrange multipliers.
- 13: until convergence
- 14: Output: optimized encoder and discriminator parameters, Lagrange multipliers, and Chernoff coefficient.
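To make the four-step iteration concrete, the following PyTorch-style skeleton sketches one outer iteration of DVSC-Opt. The module names, the way the information estimates are obtained, and the dual update rule are illustrative placeholders standing in for, rather than reproducing, the exact loss functions of Algorithm 1; in particular, each encoder is taken here as the full coding map from a source sample to its reconstruction.

```python
import torch
import torch.nn.functional as F

def dvsc_opt_iteration(x1, x2, enc1, enc2, opt_enc,
                       mi_estimate, chernoff_estimate,
                       beta, D, lam, dual_lr=1e-2):
    """One outer iteration of the four-step DVSC-Opt loop (schematic).

    mi_estimate(x, xhat) and chernoff_estimate(xhat1, xhat2) are assumed to train
    their discriminator networks internally and return differentiable estimates.
    """
    # Steps 1-2: refresh the discriminator-based information estimates.
    xhat1, xhat2 = enc1(x1), enc2(x2)
    rate = mi_estimate(x1, xhat1) + mi_estimate(x2, xhat2)
    chern = chernoff_estimate(xhat1, xhat2)

    # Step 3: update the encoders on the Lagrangian objective
    # (rate terms, minus the weighted Chernoff term, plus distortion penalties).
    d1 = F.mse_loss(xhat1, x1)
    d2 = F.mse_loss(xhat2, x2)
    loss = rate - beta * chern + lam[0] * (d1 - D) + lam[1] * (d2 - D)
    opt_enc.zero_grad()
    loss.backward()
    opt_enc.step()

    # Step 4: dual ascent on the Lagrange multipliers, projected to stay nonnegative.
    lam[0] = max(0.0, lam[0] + dual_lr * (d1.item() - D))
    lam[1] = max(0.0, lam[1] + dual_lr * (d2.item() - D))
    return loss.item(), lam
```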
The DVSC-Opt algorithm provides a semantic coding framework for BCOC systems, aiming to jointly optimize data compression and the preservation of task-critical information. Although several state-of-the-art semantic coding models, such as DeepSC [8], VISTA [13], and the multi-task JSCC framework [14], have achieved notable progress in semantic communication, their problem formulations differ fundamentally from ours. These models typically consider noisy communication channels and focus on general semantic transmission across diverse tasks, whereas DVSC-Opt focuses solely on semantic source coding and is specifically tailored to the BCOC scenario.
Moreover, most current semantic coding models primarily emphasize semantic extraction, often overlooking effective data compression and lacking theoretical interpretability. In contrast, the DVSC model is designed to preserve task-critical information while simultaneously enhancing data efficiency under explicit distortion constraints. By integrating information-theoretic principles with deep learning techniques, the DVSC-Opt algorithm achieves a favorable balance among theoretical grounding, structural explainability, and practical trainability. The effectiveness of the proposed framework will be demonstrated through experiments in the next section.
4. Experiments
To comprehensively evaluate the proposed DVSC-Opt algorithm, we conduct experiments on the CelebA dataset [35] and a binary-class subset of the CIFAR-10 dataset [36]. Experimental results from both datasets demonstrate the effectiveness of the DVSC-Opt algorithm. In the following, we present a detailed analysis based primarily on the CelebA dataset. Experiments are performed on an NVIDIA RTX 3090 GPU, using PyTorch 1.12.1 as the deep learning framework.
4.1. Dataset Description and Preprocessing
The CelebA dataset consists of 202,599 celebrity images, each annotated with 40 binary attributes stored in an attribute file. For our binary classification task, we use the “gender” attribute to categorize images into two classes: female and male.
Before training the DVSC model, several preprocessing steps are required. First, we partition the dataset into two subsets according to the gender attribute, one containing female images and the other containing male images. Each image is then normalized to ensure numerical stability and consistency in training. Finally, the processed data from both subsets are fed into their respective encoders for learning and optimization. The hyperparameter settings for the DVSC-Opt algorithm are listed in Table 1. The network architectures of the encoders and discriminators are listed in Table 2, Table 3 and Table 4.
4.2. Evaluation Metrics
The optimization objective of the DVSC model involves a trade-off among compression rate, reconstruction distortion, and classification performance. When training is completed, these three design objectives can be evaluated as follows.
Reconstruction distortion: As the training of the DVSC model converges, the final distortion satisfies its predefined constraint D. Therefore, a smaller value of D results in a lower reconstruction distortion.
Compression rate: The compression rate is assessed by estimating the mutual information between each subsource and its reconstruction using the trained discriminator networks. A smaller mutual information value indicates a higher degree of compression.
Classification performance: A lightweight two-layer neural network classifier is used to evaluate the task performance on reconstructed images (a minimal sketch is given below). A higher classification accuracy indicates better preservation of task-relevant semantic information during the encoding process.
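A minimal sketch of such a two-layer evaluation classifier is given below; the input resolution and hidden width are illustrative assumptions rather than the exact configuration used in our experiments.

```python
import torch.nn as nn

# Lightweight two-layer classifier applied to flattened reconstructed images
# (64x64 RGB input and a 256-unit hidden layer are assumed here for illustration).
eval_classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
    nn.Linear(256, 2),            # two output classes, e.g., female vs. male
)
```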
Since the weight coefficient determines the relative importance of the mutual information and Chernoff information terms in the objective, adjusting it under a given distortion constraint D directly affects the trade-off between compression efficiency and classification performance. Furthermore, varying D also impacts both the compression rate and the classification performance. Experiments are designed to investigate the influence of both the weight coefficient and D on the overall system performance.
4.3. Experiment Results and Analysis
To validate the convergence of the DVSC-Opt algorithm, we show the training processes of the two Lagrange multipliers over training epochs under a specific setting of the distortion constraint and weight coefficient. As depicted in Figure 3, both multipliers gradually stabilize after 600 epochs, indicating the convergence of training. Additionally, the other parameters involved in the training process, including the encoder and discriminator network parameters and the Chernoff coefficient, demonstrate similar convergence behavior. It is worth noting that all subsequent experimental results and analysis are conducted under the premise that the training of the DVSC model has converged.
4.3.1. Trade-Off Analysis: Influence of the Weight Coefficient with Fixed D
To investigate the trade-off governed by the weight coefficient, we fix the distortion constraint D and systematically adjust the coefficient to examine its effect on the compression rate and classification performance. Additionally, we conduct multiple experiments under varying values of D. The corresponding results are illustrated in Figure 4.
Across all subfigures in Figure 4, a common trend is that as the weight coefficient increases, both the compression rate and the classification accuracy increase. This behavior reflects a shift in the optimization focus, where a larger coefficient places greater emphasis on classification accuracy at the expense of compression efficiency. This trend aligns well with our theoretical analysis. Moreover, as the coefficient grows larger, the rate of improvement gradually diminishes, and eventually both the compression rate and classification accuracy reach a plateau, indicating that the preservation of task-critical information has an inherent upper bound.
Figure 5 presents a qualitative evaluation of the coding schemes for different values of the weight coefficient under a fixed distortion constraint. Notably, when the weight coefficient is zero, the coding algorithm degenerates to the conventional source coding scheme. We can observe that as the coefficient increases, gender-related features become more distinct, thereby leading to an improvement in classification accuracy.
4.3.2. Trade-Off Analysis: Influence of D with a Fixed Weight Coefficient
Similarly, we fix the weight coefficient and examine the relationships between the compression rate and the distortion constraint D, as well as between the classification accuracy and D. The corresponding curves are depicted in Figure 6. A general trend observed across these subfigures is that, for each fixed weight coefficient, both the compression rate and the classification accuracy decrease as D increases. This trend suggests that increasing D allows for more efficient compression; however, it also leads to a more severe loss of task-critical information, ultimately degrading classification performance.
Figure 7 presents a qualitative evaluation of the coding schemes for different values of D under a fixed weight coefficient. As expected, when D increases, the images become progressively blurred, leading to a noticeable loss of gender-related features and consequently a decline in classification accuracy.
4.3.3. Comparison with the Traditional Source Coding Scheme
As previously discussed, setting the weight coefficient to zero corresponds to the conventional source coding scheme, as illustrated in Figure 6d. Compared with Figure 6a–c, Figure 6d achieves the highest compression efficiency but at the cost of the lowest classification accuracy. For instance, under the setting shown in Figure 6a, the coding scheme achieves its peak classification accuracy and outperforms the traditional scheme subject to the same distortion constraint, albeit at the cost of a higher compression rate. From the perspective of the optimization objectives, since the conventional coding scheme does not account for classification performance, its overall effectiveness is inherently lower than that of the semantic coding scheme, despite achieving superior data compression efficiency.
4.4. Extended Evaluation on the CIFAR-10 Dataset
To further evaluate the generalizability of the proposed DVSC-Opt algorithm, we conduct additional experiments on the CIFAR-10 dataset following the same experimental pipeline as described above. The CIFAR-10 dataset contains 60,000 color images evenly distributed across 10 object categories. For this evaluation, we select two semantically similar categories, deer and horse, to perform mixed-source semantic coding and binary classification with the DVSC model. The results of these experiments are reported in Figure 8 and Figure 9.
As shown in Figure 8a, when the distortion constraint is fixed, increasing the weight coefficient leads to increases in both the compression rate and the classification accuracy until the performance reaches a saturation point. This trend corresponds well with the visualization results in the first three rows of Figure 9, where larger coefficient values result in more prominent class-discriminative features. On the other hand, when the weight coefficient is fixed, as illustrated in Figure 8b, increasing D results in a gradual decline in both compression rate and classification accuracy, eventually stabilizing at a lower level. This effect is visually reflected in the last three rows of Figure 9, where higher distortion levels lead to progressively blurred image reconstructions. These two trends are consistent with the experimental findings on the CelebA dataset, confirming that the DVSC-Opt algorithm consistently handles the trade-off between compression rate and binary classification performance across different datasets.
4.5. Summary
Experiments on the CelebA dataset and the CIFAR-10 dataset demonstrate that the DVSC-Opt algorithm can effectively balance compression rate and classification performance while keeping the distortion within the predefined constraint. This result aligns well with our theoretical analysis. Moreover, for a given weight coefficient, increasing the distortion constraint D reduces the compression rate but also degrades the classification accuracy, indicating a trade-off among the three design objectives. From the perspective of the optimization objectives, the DVSC-Opt algorithm subsumes the traditional coding scheme as a special case while offering a more comprehensive and flexible approach.
5. Conclusions
In this paper, we propose a theoretically grounded and explainable deep semantic coding framework tailored for the BCOC scenario. From an information-theoretic perspective, we formulate a semantic coding problem that jointly optimizes compression rate and classification accuracy subject to predefined distortion constraints. Furthermore, we leverage deep learning techniques and variational methods to transform the theoretical optimization problem into a trainable DVSC model and propose the DVSC-Opt algorithm. Experiments on the CelebA dataset and the CIFAR-10 dataset demonstrate that the DVSC-Opt algorithm effectively balances compression efficiency and classification performance while ensuring acceptable reconstruction distortion.
An important future direction is extending our coding framework to multi-class classification tasks by studying mixed multi-source semantic coding schemes. Moreover, future work could explore adaptive optimization strategies that dynamically adjust the weight coefficient based on task complexity, dataset characteristics, or real-time feedback mechanisms.
In conclusion, this study introduces a deep semantic coding framework that is both explainable and mathematically principled, laying a solid foundation for future research in explainable semantic coding and efficient data compression strategies.