
Extended Blahut–Arimoto Algorithm for Semantic Rate-Distortion Function

by Yuxin Han 1,2, Yang Liu 1,2, Yaping Sun 2,4, Kai Niu 1,2,*, Nan Ma 3,2, Shuguang Cui 4,2 and Ping Zhang 3,2

1 Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 Department of Broadband Communication, Pengcheng Laboratory, Shenzhen 518055, China
3 Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
4 School of Science and Engineering (SSE) and the Future Network of Intelligent Institute (FNii), The Chinese University of Hong Kong (Shenzhen), Shenzhen 518172, China
* Author to whom correspondence should be addressed.
Entropy 2025, 27(6), 651; https://doi.org/10.3390/e27060651
Submission received: 29 April 2025 / Revised: 14 June 2025 / Accepted: 16 June 2025 / Published: 18 June 2025
(This article belongs to the Special Issue Semantic Information Theory)

Abstract: Semantic communication has recently gained significant attention in theoretical analysis due to its potential to improve communication efficiency by focusing on meaning rather than exact signal reconstruction. In this paper, we extend the Blahut–Arimoto (BA) algorithm, a fundamental method in classical information theory (CIT) for computing the rate-distortion (RD) function, to semantic communication by proposing the extended Blahut–Arimoto (EBA) algorithm, which iteratively updates transition and reconstruction distributions to calculate the semantic RD function based on synonymous mapping in semantic information theory (SIT). To address scenarios where synonymous mappings are unknown, we develop an optimization framework that combines the EBA algorithm with simulated annealing. Initialized with a syntactic mapping, the framework progressively merges syntactic symbols and identifies the mapping with the maximum synonymous number that satisfies the objective constraints. Furthermore, by considering the semantic knowledge base (SKB) as a specific instance of synonymous mapping, the EBA algorithm provides a theoretical approach for analyzing and predicting the SKB size. Numerical results validate the effectiveness of the EBA algorithm. For Gaussian sources, the semantic RD function decreases with an increasing synonymous number and becomes significantly lower than its classical counterpart. Additionally, analysis of the CUB dataset demonstrates that larger SKB sizes lead to higher semantic communication compression efficiency.

1. Introduction

Recent years have witnessed the rapid development of semantic communication, which has emerged as a promising paradigm for the next-generation communication system [1]. Unlike the traditional communication system that focuses on transmitting symbols, semantic communication aims to recover the message that matches the meaning of the transmitted information [2]. This paradigm enables more efficient data transmission by focusing on transporting and delivering the meaning of messages, potentially reducing communication overhead while maintaining the essential meaning of the information [3,4,5,6].
Classical information theory (CIT), established by Shannon in 1948, has served as the cornerstone for modern digital communication [7]. In CIT, the rate-distortion (RD) theory characterizes the fundamental trade-off between the rate and the distortion in lossy compression systems [8,9,10]. The Blahut–Arimoto (BA) algorithm has been established as a prevailing numerical method in CIT, which is capable of computing the RD function for arbitrary discrete source models [11,12]. Variants and acceleration techniques of the BA algorithm have been developed [13,14,15,16,17], and a mapping approach has been introduced for continuous sources [18].
Recent research suggests that the RD-perception function plays an important role in semantic communication [19,20,21,22,23]. More recently, Niu and Zhang extended CIT to semantic communication, referred to as the semantic information theory (SIT) [24]. By introducing the synonymous mapping, information measures (such as semantic entropy, which is consistent with that used in large language models [25]) and coding theorems for the semantic communication system were established, thus providing a mathematical foundation for guiding the design of the semantic communication system. In their work, the analytical expression of the semantic RD function for Gaussian sources is derived; however, computing the semantic RD function for other source models remains challenging, which motivates us to extend the classical BA algorithm to the semantic communication system.
In this paper, we propose an extension of the BA algorithm based on the SIT, referred to as the extended Blahut–Arimoto (EBA) algorithm, such that the semantic RD function can be directly calculated with any given source model. The main contributions of this paper are summarized as follows.
  • First, based on the synonymous mapping in SIT, the EBA algorithm is proposed by extending the classical BA algorithm to the semantic communication system. Similar to the BA algorithm, the EBA algorithm is also an iterative procedure that converges to the semantic RD function through alternating optimization of the transition and reconstruction distributions. The convergence is guaranteed by the convexity property of the semantic RD function, providing an efficient method for computing the semantic RD function for arbitrary discrete source models.
  • Then, starting from a syntactic mapping, an optimization framework is developed for scenarios with unknown synonymous mappings. The framework combines the EBA algorithm with simulated annealing [26] to progressively merge syntactic symbols and identify the mapping with the maximum synonymous number that satisfies the objective constraints, enabling the discovery of optimal semantic representations that balance compression efficiency and distortion.
  • Furthermore, by considering the semantic knowledge base (SKB) as a specific instance of synonymous mapping, the EBA algorithm provides a theoretical approach for analyzing and predicting the SKB size. Using the CUB dataset [27], experimental results indicate that increasing the SKB size directly improves semantic communication compression efficiency, thereby validating the critical role of SKB in enhancing transmission performance.
The remainder of this paper is organized as follows. Section 2 reviews the semantic RD function in SIT and introduces the classical BA algorithm. Section 3 presents the EBA algorithm and details the optimization framework for finding the synonymous mapping with the maximum synonymous number. Section 4 describes the application of the EBA algorithm to the SKB. Simulation results are provided in Section 5, followed by conclusions in Section 6.

2. Preliminaries

2.1. Notation and Conventions

Throughout this paper, calligraphic letters, such as $\mathcal{X}$ and $\mathcal{Y}$, denote sets, while lowercase letters denote elements of these sets. For a syntactic symbol, its reconstruction is denoted by $\hat{(\cdot)}$ and its associated semantic symbol by $\tilde{(\cdot)}$. The cardinality of a set $\mathcal{X}$ is denoted by $|\mathcal{X}|$. Let $f: \mathcal{X} \to \mathcal{Y}$ denote a mapping from set $\mathcal{X}$ to set $\mathcal{Y}$.

2.2. Semantic RD Function

To facilitate understanding, a brief overview of the related concepts is presented first; detailed definitions can be found in [24]. Consider a syntactic information set $\mathcal{X} = \{x_1, \ldots, x_N\}$ and its corresponding semantic information set $\tilde{\mathcal{X}} = \{\tilde{x}_1, \ldots, \tilde{x}_{\tilde{N}}\}$, where $\tilde{N} \le N$. The synonymous mapping $f_x: \tilde{\mathcal{X}} \to \mathcal{X}$ is a one-to-many mapping that partitions $\mathcal{X}$ into disjoint synonymous sets $\{\mathcal{X}_{i_s}\}$, $i_s = 1, 2, \ldots, \tilde{N}$, where each $\mathcal{X}_{i_s}$ contains syntactic symbols sharing the same semantic meaning and $|\mathcal{X}_{i_s}| = N_{i_s}$. As illustrated in Figure 1, each semantic symbol corresponds to an equivalent set of syntactic symbols, and no two such sets overlap.
Based on the synonymous mapping, the semantic RD function is obtained by minimizing the semantic mutual information between the source and the reconstruction subject to an average semantic distortion constraint. Consider an i.i.d. source $X \in \mathcal{X}$ with its associated semantic source $\tilde{X}$, and let $x_{i_s,l}$ denote the $l$-th syntactic symbol in the $i_s$-th synonymous set. The corresponding source probability distributions are $p_{i_s l} = p(x_{i_s,l})$ and $p_{i_s} = p(\tilde{x}_{i_s}) = p(\mathcal{X}_{i_s}) = \sum_{l=1}^{N_{i_s}} p(x_{i_s,l})$. Let $\hat{x}_{j_s,m}$ denote the $m$-th reconstruction syntactic symbol in the $j_s$-th synonymous set, and let $f_{\hat{x}}: \tilde{\hat{\mathcal{X}}} \to \hat{\mathcal{X}}$ denote the synonymous mapping on the reconstruction side, defined analogously to $f_x$. The test channel is characterized by the transition probability $P_{j_s m, i_s l} = p(\hat{x}_{j_s,m} \mid x_{i_s,l})$ with semantic distortion $\tilde{d}_{i_s j_s} = \tilde{d}_s(\tilde{x}_{i_s}, \tilde{\hat{x}}_{j_s}) = \tilde{d}_s(\mathcal{X}_{i_s}, \hat{\mathcal{X}}_{j_s})$. For the reconstruction, the probability distributions are $q_{j_s m} = p(\hat{x}_{j_s,m}) = \sum_{i_s=1}^{\tilde{N}} \sum_{l} p_{i_s l} P_{j_s m, i_s l}$ and $q_{j_s} = p(\tilde{\hat{x}}_{j_s}) = p(\hat{\mathcal{X}}_{j_s}) = \sum_{m=1}^{N_{j_s}} \sum_{i_s} \sum_{l} p_{i_s l} P_{j_s m, i_s l}$. The semantic RD function is then defined as follows; this problem is convex, as shown in [24]:
$$R_s(D) = \min_{f_x, f_{\hat{x}}} \min_{P \in P_D} \sum_{i_s} \sum_{j_s} \sum_{l} \sum_{m} p_{i_s l} P_{j_s m, i_s l} \log \frac{p_{i_s l} P_{j_s m, i_s l}}{\sum_{l} \sum_{m} p_{i_s l} q_{j_s m}}, \quad (1)$$
where the test channel set $P_D$ is defined as
$$P_D = \left\{ P_{j_s m, i_s l} : \sum_{i_s} \sum_{j_s} \sum_{l} \sum_{m} p_{i_s l} P_{j_s m, i_s l} \, \tilde{d}_{i_s j_s} \le D \right\}. \quad (2)$$

2.3. BA Algorithm

The BA algorithm is an iterative procedure for computing the RD function. Let $p(x)$, $p(\hat{x})$, and $p(\hat{x} \mid x)$ denote the source distribution, reconstruction distribution, and test channel transition probability, respectively. For each fixed Lagrange multiplier $\lambda \in \mathbb{R}^+$, the algorithm minimizes the RD Lagrangian $L_\lambda = I(X; \hat{X}) + \lambda \, \mathbb{E}[d(x, \hat{x})]$, where $I(X; \hat{X})$ represents the mutual information and $d(x, \hat{x})$ represents the distortion measure.
Starting with an initial reconstruction distribution $p(\hat{x})$, each iteration alternates between two steps: first, computing the test channel transition probability
$$p(\hat{x} \mid x) = \frac{p(\hat{x}) \, e^{-\lambda d(x, \hat{x})}}{\sum_{\hat{x}} p(\hat{x}) \, e^{-\lambda d(x, \hat{x})}}, \quad (3)$$
and then updating the reconstruction distribution according to
$$p(\hat{x}) = \sum_{x} p(x) \, p(\hat{x} \mid x). \quad (4)$$
This iterative process continues until the change in the reconstruction distribution between consecutive iterations falls below a predetermined threshold. Geometrically, $-\lambda$ is the slope of the tangent line to the RD curve, and each optimal solution corresponds to a point on the RD curve. By varying $\lambda$, the entire RD curve can be swept out. The convergence of this algorithm was proved by Blahut [12].
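To make the two alternating updates concrete, the following is a minimal NumPy sketch (not from the paper; the function name, tolerance, and the Hamming-distortion example are illustrative choices) that computes one $(R, D)$ point for a fixed $\lambda$:

```python
import numpy as np

def ba_rd_point(p_x, dist, lam, tol=1e-6, max_iter=5000):
    """One point of the classical RD curve for a fixed multiplier lam > 0.

    p_x  : (N,) source distribution p(x)
    dist : (N, M) distortion matrix d(x, x_hat)
    """
    M = dist.shape[1]
    q = np.full(M, 1.0 / M)                   # initial reconstruction distribution
    for _ in range(max_iter):
        A = q[None, :] * np.exp(-lam * dist)  # Eq. (3), before normalization
        P = A / A.sum(axis=1, keepdims=True)  # test channel p(x_hat | x)
        q_new = p_x @ P                       # Eq. (4)
        if np.abs(q_new - q).sum() < tol:     # stop when q barely changes
            q = q_new
            break
        q = q_new
    D = float(np.sum(p_x[:, None] * P * dist))
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(P > 0, P / q[None, :], 1.0)
    R = float(np.sum(p_x[:, None] * P * np.log(ratio)))  # mutual information in nats
    return R, D

# Example: uniform 4-ary source with Hamming distortion; sweeping lam
# traces out the RD curve point by point.
p_x = np.full(4, 0.25)
dist = 1.0 - np.eye(4)
for lam in (0.5, 1.0, 2.0, 4.0):
    print(lam, ba_rd_point(p_x, dist, lam))
```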

3. Extended BA Algorithm

In this section, the EBA algorithm is presented as a systematic approach for computing the semantic RD function. We first establish the theoretical foundations of the EBA algorithm and then develop an optimization framework for finding the synonymous mapping with the maximum synonymous number.

3.1. EBA Algorithm for the Semantic RD Function

Without loss of generality, we consider the case where the synonymous mappings $f_x$ and $f_{\hat{x}}$ are identical. Given a synonymous mapping that partitions the syntactic symbols into disjoint synonymous sets, the semantic RD function can be formulated as the following optimization problem:
$$\min_{P \in P_D} \sum_{i_s} \sum_{j_s} \sum_{l} \sum_{m} p_{i_s l} P_{j_s m, i_s l} \log \frac{p_{i_s l} P_{j_s m, i_s l}}{\sum_{l} \sum_{m} p_{i_s l} q_{j_s m}} \quad (5)$$
subject to the following constraints:
$$P_D = \left\{ P_{j_s m, i_s l} : \begin{aligned} & \sum_{i_s} \sum_{j_s} \sum_{l} \sum_{m} p_{i_s l} P_{j_s m, i_s l} \, \tilde{d}_{i_s j_s} \le D, \\ & \sum_{j_s} \sum_{m} P_{j_s m, i_s l} = 1, \quad i_s = 1, 2, \ldots, \tilde{N}, \; l = 1, 2, \ldots, N_{i_s}, \\ & P_{j_s m, i_s l} \ge 0, \quad \forall i_s, m, j_s, l \end{aligned} \right\}. \quad (6)$$
The EBA algorithm solves this optimization problem through iterative updates of the transition and reconstruction distributions. The main steps are outlined below, with a detailed pseudo-code provided in Algorithm 1.
(i) For a fixed $q_{j_s m}$, the transition distribution $\breve{P}_{j_s m, i_s l}$ is calculated with a multiplier $\lambda \in \mathbb{R}^+$:
$$\breve{P}_{j_s m, i_s l} = \frac{\sum_{m} q_{j_s m} \, e^{-\lambda \tilde{d}_{i_s j_s}}}{\sum_{j_s} \sum_{m} q_{j_s m} \, e^{-\lambda \tilde{d}_{i_s j_s}}}. \quad (7)$$
(ii) For a fixed $P_{j_s m, i_s l}$, the reconstruction distribution $\breve{q}_{j_s m}$ is calculated as follows:
$$\breve{q}_{j_s m} = \sum_{i_s} \sum_{l} p_{i_s l} P_{j_s m, i_s l}. \quad (8)$$
Algorithm 1: EBA algorithm for the semantic RD function.
In the following, a detailed derivation of the EBA algorithm is provided for the semantic RD function. As in CIT, the optimization problem can be solved using the method of Lagrange multipliers, since the convexity of the semantic RD function has been proved in SIT [24]. For a fixed initial distribution $q_{j_s m}$, the problem is to find the optimal distribution $\breve{P}_{j_s m, i_s l}$ that minimizes the objective function under the constraints
$$\sum_{i_s} \sum_{j_s} \sum_{l} \sum_{m} p_{i_s l} P_{j_s m, i_s l} \, \tilde{d}_{i_s j_s} = D \quad (9)$$
and
$$\sum_{j_s} \sum_{m} P_{j_s m, i_s l} = 1, \quad \forall i_s, l. \quad (10)$$
Introducing Lagrange multipliers $\lambda$ and $r_{i_s l}$ corresponding to the above constraints, the objective function is constructed as
$$L(\lambda) = \sum_{i_s} \sum_{j_s} \sum_{l} \sum_{m} p_{i_s l} P_{j_s m, i_s l} \log \frac{p_{i_s l} P_{j_s m, i_s l}}{\sum_{l} \sum_{m} p_{i_s l} q_{j_s m}} + \lambda \left( \sum_{i_s} \sum_{j_s} \sum_{l} \sum_{m} p_{i_s l} P_{j_s m, i_s l} \, \tilde{d}_{i_s j_s} - D \right) + \sum_{i_s} \sum_{l} r_{i_s l} \left( \sum_{j_s} \sum_{m} P_{j_s m, i_s l} - 1 \right). \quad (11)$$
To find the optimal solution, the partial derivative of $L(\lambda)$ with respect to $P_{j_{s1} m_1, i_{s1} l_1}$ is taken and set to zero:
$$\begin{aligned} \frac{\partial L(\lambda)}{\partial P_{j_{s1} m_1, i_{s1} l_1}} &= p_{i_{s1} l_1} \left( \log \frac{p_{i_{s1} l_1} P_{j_{s1} m_1, i_{s1} l_1}}{\sum_{l} p_{i_{s1} l}} + 1 \right) - p_{i_{s1} l_1} \log \sum_{m} q_{j_{s1} m} + \lambda p_{i_{s1} l_1} \tilde{d}_{i_{s1} j_{s1}} + r_{i_{s1} l_1} \\ &= p_{i_{s1} l_1} \log \frac{p_{i_{s1} l_1} P_{j_{s1} m_1, i_{s1} l_1}}{\sum_{l} p_{i_{s1} l} \sum_{m} q_{j_{s1} m}} + p_{i_{s1} l_1} + \lambda p_{i_{s1} l_1} \tilde{d}_{i_{s1} j_{s1}} + r_{i_{s1} l_1} = 0. \end{aligned} \quad (12)$$
Solving for $P_{j_{s1} m_1, i_{s1} l_1}$ yields
$$P_{j_{s1} m_1, i_{s1} l_1} = p_{i_{s1} l_1}^{-1} \left( \sum_{l} p_{i_{s1} l} \right) \left( \sum_{m} q_{j_{s1} m} \right) e^{-\lambda \tilde{d}_{i_{s1} j_{s1}}} \, e^{-p_{i_{s1} l_1}^{-1} r_{i_{s1} l_1} - 1}. \quad (13)$$
By substituting $(i_{s1}, j_{s1}, l_1, m_1) \to (i_s, j_s, l, m)$ and applying the normalization constraint, we derive
$$\breve{P}_{j_s m, i_s l} = \frac{p_{i_s l}^{-1} \sum_{l} p_{i_s l} \sum_{m} q_{j_s m} \, e^{-\lambda \tilde{d}_{i_s j_s}} \, e^{-p_{i_s l}^{-1} r_{i_s l} - 1}}{\sum_{j_s} \sum_{m} p_{i_s l}^{-1} \sum_{l} p_{i_s l} \sum_{m} q_{j_s m} \, e^{-\lambda \tilde{d}_{i_s j_s}} \, e^{-p_{i_s l}^{-1} r_{i_s l} - 1}}. \quad (14)$$
The distribution $\breve{P}_{j_s m, i_s l}$ is calculated for the fixed $q_{j_s m}$, as shown in Equation (7). Subsequently, with the transition probability distribution $\breve{P}_{j_s m, i_s l}$ fixed, the reconstruction distribution $\breve{q}_{j_s m}$ is calculated according to Equation (8). Based on these updated distributions, the distribution error
$$\epsilon^* = \sum_{i_s, l, j_s, m} \left| \breve{P}_{j_s m, i_s l} - P_{j_s m, i_s l} \right| + \sum_{j_s, m} \left| \breve{q}_{j_s m} - q_{j_s m} \right| \quad (15)$$
is calculated. If $\epsilon^*$ is less than the predetermined threshold $\epsilon$, the EBA algorithm is considered to have converged for the current $\lambda$, and the corresponding RD pair $(R(\lambda), D(\lambda))$ is obtained by
$$\begin{aligned} R(\lambda) &= \sum_{i_s} \sum_{j_s} \sum_{l} \sum_{m} p_{i_s l} \breve{P}_{j_s m, i_s l} \log \frac{p_{i_s l} \breve{P}_{j_s m, i_s l}}{\sum_{l} \sum_{m} p_{i_s l} \breve{q}_{j_s m}}, \\ D(\lambda) &= \sum_{i_s} \sum_{j_s} \sum_{l} \sum_{m} p_{i_s l} \breve{P}_{j_s m, i_s l} \, \tilde{d}_{i_s j_s}. \end{aligned} \quad (16)$$
Otherwise, these distributions are updated as the initial distributions for the next iteration, i.e.,
$$P_{j_s m, i_s l} = \breve{P}_{j_s m, i_s l}, \qquad q_{j_s m} = \breve{q}_{j_s m}. \quad (17)$$
The entire semantic RD curve can be obtained by sweeping the value of $\lambda$, where each $(R(\lambda), D(\lambda))$ pair corresponds to a point on the curve. The EBA algorithm incorporates synonymous mappings, enabling it to compute the semantic RD function, whereas the classical BA algorithm operates solely at the syntactic symbol level and does not account for any semantic structure or synonymous relationships.
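As a concrete illustration of the update loop in Equations (7), (8), and (15)–(17), the following NumPy sketch computes one point of the semantic RD curve. It is an illustrative reading, not the authors' implementation: the update of Equation (7) is applied per reconstruction symbol, i.e., $\breve{P}_{j_s m, i_s l} \propto q_{j_s m} e^{-\lambda \tilde{d}_{i_s j_s}}$, and all names and tolerances are assumptions.

```python
import numpy as np

def eba_rd_point(p, set_x, set_xhat, d_sem, lam, tol=1e-4, max_iter=5000):
    """One point of the semantic RD curve for a fixed multiplier lam > 0.

    p        : (N,) syntactic source distribution p_{i_s l}, flattened
    set_x    : (N,) synonymous-set index i_s of each source symbol
    set_xhat : (M,) synonymous-set index j_s of each reconstruction symbol
    d_sem    : (Ns, Ms) semantic distortion between synonymous sets
    """
    M = set_xhat.size
    d_full = d_sem[np.ix_(set_x, set_xhat)]     # expand d~ to the syntactic level
    q = np.full(M, 1.0 / M)
    for _ in range(max_iter):
        A = q[None, :] * np.exp(-lam * d_full)  # Eq. (7), before normalization
        P = A / A.sum(axis=1, keepdims=True)
        q_new = p @ P                           # Eq. (8)
        if np.abs(q_new - q).sum() < tol:       # the q-part of the error in Eq. (15)
            q = q_new
            break
        q = q_new
    # Set-level marginals p_{i_s} and q_{j_s} for the semantic rate of Eq. (16)
    Ns, Ms = d_sem.shape
    p_set = np.bincount(set_x, weights=p, minlength=Ns)
    q_set = np.bincount(set_xhat, weights=q, minlength=Ms)
    joint = p[:, None] * P                      # p_{i_s l} * P_{j_s m, i_s l}
    denom = p_set[set_x][:, None] * q_set[set_xhat][None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        logterm = np.where(joint > 0, np.log(joint / denom), 0.0)
    R = float(np.sum(joint * logterm))          # may be negative, unlike classical RD
    D = float(np.sum(joint * d_full))
    return R, D
```

With a one-to-one syntactic mapping (set_x = np.arange(N)), the set-level marginals coincide with the symbol-level ones and the sketch reduces to the classical BA iteration, consistent with the observation in Section 5 that $L_s = 1$ recovers the classical RD function.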

3.2. EBA Algorithm for the Optimal Synonymous Mapping

When the synonymous mapping is unknown, an optimization framework that combines the EBA algorithm with simulated annealing is introduced to find the optimal synonymous mapping (i.e., the mapping with the maximum synonymous number). To formally characterize the synonymous number, we introduce the following definition.
Definition 1.
The synonymous number $L_s$ is defined as
$$L_s = \sum_{i_s} p_{i_s} N_{i_s}, \quad (18)$$
where $N_{i_s}$ denotes the number of syntactic symbols mapped to the $i_s$-th semantic symbol.
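As a simple numerical illustration of Equation (18): for a mapping with two semantic symbols where $p_1 = 0.6$, $N_1 = 2$ and $p_2 = 0.4$, $N_2 = 1$, the synonymous number is $L_s = 0.6 \times 2 + 0.4 \times 1 = 1.6$; a purely syntactic mapping has every $N_{i_s} = 1$ and hence $L_s = 1$.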
As noted in [24], the semantic RD function is not necessarily non-negative, so specific objective constraints are required to ensure meaningful results. In this paper, the distortion value at $R_s = 0$, which represents the maximum achievable distortion when no information is transmitted, is taken as the constraint. For convenience, we denote the synonymous mapping by $M$. Algorithm 2 describes the optimization process.
Algorithm 2: Optimization framework for finding the optimal synonymous mapping.
The inputs to the optimization framework are the distortion constraint $D'$, the initial temperature $T$, the cooling rate $\alpha$, the temperature threshold $\epsilon_T$, and the maximum number of iterations $N_{max}$. The outputs are the optimal synonymous mapping $M_{opt}$ (which achieves the maximum $L_s$ under the constraint that $D = D'$ when $R_s(D) = 0$) and the corresponding semantic RD function $R_s(D)_{opt}$.
In the proposed optimization framework, a syntactic mapping, in which one semantic symbol corresponds to one syntactic symbol, is taken as the initial synonymous mapping $M$. Given a source distribution $p_{i_s l}$ and a distortion measure $\tilde{d}_{i_s j_s}$, the problem is transformed into calculating the semantic RD function under a given synonymous mapping, which can be directly solved using the EBA algorithm in Algorithm 1. We then set $M_{opt} = M$ and $R_s(D)_{opt} = R_s(D)$.
The iteration number is initialized to zero. While the current temperature $T$ is larger than the threshold $\epsilon_T$ and the number of iterations is less than the maximum limit $N_{max}$, the following steps are executed.
At each iteration, a new semantic mapping $M_{new}$ is generated based on the current mapping and a probability distribution that determines the probability of merging adjacent syntactic symbols into one semantic symbol. The new semantic RD curve $R_s(D)_{new}$ is then computed using the EBA algorithm, and the corresponding $D_0^{new}$ is defined as the value of $D$ at which $R_s(D)_{new} = 0$. If $D_0^{new}$ is less than the constraint $D'$, $M_{new}$ does not meet the optimization objective and is regenerated. Otherwise, it is checked whether the $R_s(D)_{new}$ curve lies below the current $R_s(D)$ curve; if so, both the optimal mapping $M_{opt}$ and the current mapping $M$ are updated. To avoid local optima, even if $M_{new}$ yields a higher semantic RD curve, it may still be accepted with probability
$$P_u = e^{-\frac{D_0^{new} - D'}{T}}. \quad (19)$$
The temperature $T$ and the number of iterations are updated after each iteration.
The algorithm terminates when $T \le \epsilon_T$ or the number of iterations reaches $N_{max}$, returning the optimal mapping $M_{opt}$ and its corresponding semantic RD function $R_s(D)_{opt}$. Through the combination of the EBA algorithm and simulated annealing, the optimal synonymous mapping $M_{opt}$ under the given constraints can be found.
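The outer loop of Algorithm 2 can be sketched as follows. This is a schematic under stated assumptions: the four callables (eba_rd_curve, merge_neighbors, zero_rate_distortion, curve_below) are hypothetical placeholders for the corresponding steps of the framework, not functions defined in the paper.

```python
import numpy as np

def optimize_mapping(M_init, D_prime, eba_rd_curve, merge_neighbors,
                     zero_rate_distortion, curve_below,
                     T=100.0, alpha=0.99, eps_T=1e-6, N_max=1000, rng=None):
    """Schematic of Algorithm 2 (simulated annealing around the EBA algorithm).

    Hypothetical callables:
      eba_rd_curve(M)         -> semantic RD curve R_s(D) for mapping M (Algorithm 1)
      merge_neighbors(M, rng) -> new mapping merging adjacent syntactic symbols
      zero_rate_distortion(R) -> D_0, the distortion at which R_s(D) = 0
      curve_below(R1, R2)     -> True if curve R1 lies below curve R2
    """
    rng = rng or np.random.default_rng()
    M_cur = M_opt = M_init                       # start from the syntactic mapping
    R_cur = R_opt = eba_rd_curve(M_cur)
    n = 0
    while T > eps_T and n < N_max:
        M_new = merge_neighbors(M_cur, rng)      # propose a coarser mapping
        R_new = eba_rd_curve(M_new)
        D0_new = zero_rate_distortion(R_new)
        if D0_new >= D_prime:                    # feasible: D_0 must not fall below D'
            if curve_below(R_new, R_cur):        # better mapping: accept and record
                M_cur, R_cur = M_new, R_new
                M_opt, R_opt = M_new, R_new
            elif rng.random() < np.exp(-(D0_new - D_prime) / T):
                M_cur, R_cur = M_new, R_new      # Eq. (19): occasionally accept worse
        T *= alpha                               # cool down
        n += 1
    return M_opt, R_opt
```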

4. The Application to SKB

The SKB is a key concept in the semantic communication system that facilitates the semantic encoding and decoding process between the transmitter and the receiver [28,29,30,31]. For each mobile intelligent agent, it defines a set of semantic knowledge vectors corresponding to different classes, which can be obtained from Word2Vec or from manually annotated semantic attribute vectors [32]. These vectors not only characterize the attribute features of the corresponding classes but also reflect the semantic relationships among different classes. Specifically, we denote by $\mathcal{W} = \{w_1, w_2, \ldots, w_N\}$ the set of syntactic vectors and by $\mathcal{S} = \{s_1, s_2, \ldots, s_K\}$ the set of semantic vectors, where the semantic vector $s_i$ is referred to as the semantic prototype of the $i$-th class. It is worth noting that both the transmitter and the receiver use the same SKB to ensure the effectiveness of semantic communication.
From the perspective of SIT, the SKB can be regarded as a specific instance of synonymous mapping, where each semantic vector $s$ corresponds to a semantic symbol $\tilde{x}$ defined in Section 2, and syntactic vectors $w$ sharing similar semantic vectors can be regarded as syntactic symbols $x$ mapped to the same semantic symbol $\tilde{x}$. Therefore, for a given source, the semantic RD function can be directly computed using the EBA algorithm by treating the SKB as a synonymous mapping. By varying the SKB size, different semantic RD functions can be obtained. Analyzing the relationship between the semantic RD function and the SKB size allows the semantic communication compression efficiency to be evaluated quantitatively, which has theoretical significance for guiding the design of the semantic communication system.
An example is presented in Figure 2 to illustrate the SKB as a synonymous mapping. The left part shows the syntactic samples, the middle part presents the set of syntactic vectors $\mathcal{W}$, and the right part displays the set of semantic vectors $\mathcal{S}$. For this example, a dataset containing 10 uniformly distributed samples from 5 different bird classes is considered, with 2 samples per class (samples of the same class are circled by dashed lines). These images are first converted into vector representations to obtain their corresponding syntactic vectors; the probability distribution is thus $p(w) = \{0.1, 0.1, \ldots, 0.1\}$. Assuming that the SKB contains semantic attributes of 4 bird classes, these syntactic vectors can be mapped to the corresponding semantic vector space. Specifically, this mapping can be expressed as $\{w_1, w_2\} \to s_1$, $\{w_3, w_4\} \to s_2$, $\{w_5, w_6\} \to s_3$, $\{w_7, w_8\} \to s_4$. However, for the remaining bird class, whose semantic attributes are not included in the SKB, the two syntactic vectors correspond to distinct semantic vectors in the semantic space, i.e., $w_9 \to s_5$ and $w_{10} \to s_6$. Consequently, the probability distribution $p(s) = \{0.2, 0.2, 0.2, 0.2, 0.1, 0.1\}$ is obtained. Based on $p(w)$ and $p(s)$, the semantic RD result can be computed using the EBA algorithm.
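The distributions in this example are easy to reproduce; a short, self-contained snippet (the mapping array simply encodes the assignment described above):

```python
import numpy as np

# 10 uniform syntactic vectors; the SKB covers 4 of the 5 bird classes,
# so w_9 and w_10 fall into their own singleton semantic sets.
p_w = np.full(10, 0.1)
mapping = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 5])  # semantic index of each w_i

p_s = np.bincount(mapping, weights=p_w)
print(p_s)  # [0.2 0.2 0.2 0.2 0.1 0.1], matching p(s) above
```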

5. Experimental Results

In this section, the simulation results of the semantic RD function calculated by the EBA algorithm are presented. The semantic mean squared error (SMSE) distortion is adopted, which is defined as
$$\tilde{d}_{i_s j_s} = \begin{cases} 0, & i_s = j_s, \\ \dfrac{1}{N_{i_s} \cdot N_{j_s}} \displaystyle\sum_{l} \sum_{m} \left( x_{i_s,l} - \hat{x}_{j_s,m} \right)^2, & i_s \ne j_s. \end{cases} \quad (20)$$
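Under the assumption of identical mappings on both sides ($f_x = f_{\hat{x}}$, as in Section 3.1, so the set index spaces align), the SMSE matrix of Equation (20) can be built with a short sketch (the function and argument names are illustrative):

```python
import numpy as np

def smse_matrix(x, set_x, xhat, set_xhat, n_sets):
    """SMSE semantic distortion of Eq. (20) between synonymous sets.

    x, xhat         : 1-D arrays of syntactic values
    set_x, set_xhat : synonymous-set index of each value (aligned index spaces)
    """
    d = np.zeros((n_sets, n_sets))
    for i in range(n_sets):
        for j in range(n_sets):
            if i == j:
                continue                    # zero distortion within the same set
            xi = x[set_x == i]
            xj = xhat[set_xhat == j]
            # average squared difference over all (l, m) pairs of the two sets
            d[i, j] = np.mean((xi[:, None] - xj[None, :]) ** 2)
    return d
```

This distortion matrix can be fed directly to the eba_rd_point sketch above.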
Figure 3a shows the semantic RD function of Gaussian sources with zero mean and variance $\sigma^2 = 1$ computed by the EBA algorithm under the given synonymous mappings $f_x$, where the blue curve represents the syntactic RD function of a Gaussian source. For the Gaussian source, 201 syntactic points are uniformly sampled in the interval $[-10, 10]$, and their corresponding probability distribution $p(x)$ follows the Gaussian distribution. Based on $f_x$, $p(\tilde{x})$ can be easily obtained. Figure 3b,c illustrate two synonymous mappings $f_{x1}$ and $f_{x2}$ with $L_s = 1.0809$ and $L_s = 1.4736$, respectively. With the error threshold $\epsilon = 0.0001$, the corresponding semantic RD functions can be computed using the EBA algorithm. The simulation results show that the semantic RD function decreases as the synonymous number $L_s$ grows and becomes significantly lower than its classical counterpart. Furthermore, it can be observed that when $L_s = 1$, the semantic RD function is identical to the classical RD function.
Given the distortion constraint $D'$, the optimal synonymous mapping can be obtained using Algorithm 2. The simulated annealing parameters are set as follows: initial temperature $T = 100$, maximum iteration number $N_{max} = 1000$, cooling rate $\alpha = 0.99$, and temperature threshold $\epsilon_T = 10^{-6}$. For the aforementioned Gaussian source, 201 syntactic points are uniformly sampled in the interval $[-10, 10]$. The interval merging is performed as follows: based on the probability distribution (a mixture of two Gaussian distributions centered at $-10$ and $10$), an interval endpoint is first located and extended by $\delta$, where $\delta$ ranges from $-0.5$ to $0.5$ with a step size of $0.1$. Then, multiple syntactic points within this extended range are merged into a single semantic representation. Finally, to maintain symmetry, the same merging process is applied to the corresponding symmetric interval. With $D' = 0.5$, the optimization results are presented in Figure 4. Specifically, Figure 4b shows an intermediate synonymous mapping with $L_s = 1.1328$ obtained during the optimization process, while Figure 4c presents the final optimal synonymous mapping with $L_s = 1.3470$. The results clearly show that the optimization algorithm finds the synonymous mapping with the maximum $L_s$ under the given distortion constraint. Figure 4a illustrates the corresponding semantic RD functions, where the blue curve represents the syntactic RD function of a Gaussian source. It should be noted that the optimization problem may admit multiple optimal synonymous mappings under the current constraint setting, and additional constraints would be necessary to ensure uniqueness.
Using the CUB dataset [27], the test set can be regarded as an SKB containing 50 feature vectors; 2 samples are selected from each class, resulting in a total of 100 test samples. To investigate the impact of the SKB size on the semantic RD function, the number of test samples is fixed and three SKB configurations are considered: 10 classes, 20 classes, and the complete 50 classes from the CUB test set. Figure 5 shows the semantic RD results under different SKB sizes, where the blue curve represents the syntactic RD function for transmitting the 100 samples without the assistance of an SKB. The simulation results show that as the SKB size increases, the semantic RD function decreases significantly, indicating improved compression efficiency. Furthermore, the comparison with the syntactic RD function demonstrates the effectiveness of the SKB in improving compression performance. In practical applications, the semantic RD function computed by the EBA algorithm can be used for theoretical analysis to predict the required SKB size and compression limits for specific task objectives.

6. Conclusions

In this paper, we extend the classical BA algorithm to semantic communication based on synonymous mapping and propose the EBA algorithm, which can compute the semantic RD function for a given source. Furthermore, by combining the EBA algorithm with simulated annealing, the optimal synonymous mapping can be obtained under given constraints. From a practical perspective, by considering the SKB as a given synonymous mapping, the EBA algorithm provides an approach for analyzing the SKB size, thereby offering theoretical insights into the trade-off between semantic distortion and compression performance. Future work could explore the application of the EBA algorithm in other scenarios, establishing a more comprehensive theoretical foundation for the semantic communication system.

Author Contributions

Writing—original draft, Y.H.; writing—review and editing, Y.L., Y.S., K.N., N.M., S.C. and P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 62293481, Grant 9246730007, Grant 62471054, Grant 62301471, and Grant 62293482, and by the National Science and Technology Major Project on Mobile Information Networks under Grant No. 2024ZD1300700.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BA    Blahut–Arimoto
CIT   classical information theory
CUB   CUB-200-2011 Birds
EBA   extended Blahut–Arimoto
RD    rate-distortion
SIT   semantic information theory
SKB   semantic knowledge base
SMSE  semantic mean squared error

References

  1. Gündüz, D.; Qin, Z.; Aguerri, I.E.; Dhillon, H.S.; Yang, Z.; Yener, A.; Wong, K.K.; Chae, C.B. Beyond transmitting bits: Context, semantics, and task-oriented communications. IEEE J. Sel. Areas Commun. 2022, 41, 5–41. [Google Scholar] [CrossRef]
  2. Weaver, W. The mathematics of communication. In Communication Theory; Routledge: London, UK, 2017; pp. 27–38. [Google Scholar] [CrossRef]
  3. Farsad, N.; Rao, M.; Goldsmith, A. Deep learning for joint source-channel coding of text. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 2326–2330. [Google Scholar]
  4. Xie, H.; Qin, Z.; Li, G.Y.; Juang, B.H. Deep learning enabled semantic communication systems. IEEE Trans. Signal Process. 2021, 69, 2663–2675. [Google Scholar] [CrossRef]
  5. Bourtsoulatze, E.; Kurka, D.B.; Gündüz, D. Deep joint source-channel coding for wireless image transmission. IEEE Trans. Cogn. Commun. Netw. 2019, 5, 567–579. [Google Scholar] [CrossRef]
  6. Weng, Z.; Qin, Z. Semantic communication systems for speech transmission. IEEE J. Sel. Areas Commun. 2021, 39, 2434–2444. [Google Scholar] [CrossRef]
  7. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  8. Shannon, C.E. Coding theorems for a discrete source with a fidelity criterion. In Institute of Radio Engineers, International Convention Record, Vol. 7, 1959; reprinted in Claude E. Shannon: Collected Papers; IEEE Press: Piscataway, NJ, USA, 1993. [Google Scholar] [CrossRef]
  9. Berger, T. Rate-distortion theory. In Wiley Encyclopedia of Telecommunications; John Wiley & Sons: Hoboken, NJ, USA, 2003. [Google Scholar] [CrossRef]
  10. Cover, T.M. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 1999. [Google Scholar] [CrossRef]
  11. Arimoto, S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inf. Theory 1972, 18, 14–20. [Google Scholar] [CrossRef]
  12. Blahut, R. Computation of channel capacity and rate-distortion functions. IEEE Trans. Inf. Theory 1972, 18, 460–473. [Google Scholar] [CrossRef]
  13. Dupuis, F.; Yu, W.; Willems, F.M. Blahut-Arimoto algorithms for computing channel capacity and rate-distortion with side information. In Proceedings of the International Symposium on Information Theory, ISIT 2004, Chicago, IL, USA, 27 June–2 July 2004; p. 179. [Google Scholar]
  14. Chen, L.; Wu, S.; Ye, W.; Wu, H.; Zhang, W.; Wu, H.; Bai, B. A Constrained BA Algorithm for Rate-Distortion and Distortion-Rate Functions. CSIAM Trans. Appl. Math. 2025, 6, 350–379. [Google Scholar] [CrossRef]
  15. Matz, G.; Duhamel, P. Information geometric formulation and interpretation of accelerated Blahut-Arimoto-type algorithms. In Proceedings of the Information Theory Workshop, San Antonio, TX, USA, 24–29 October 2004; pp. 66–70. [Google Scholar]
  16. Sayir, J. Iterating the Arimoto-Blahut algorithm for faster convergence. In Proceedings of the 2000 IEEE International Symposium on Information Theory (Cat. No. 00CH37060), Sorrento, Italy, 25–30 June 2000; p. 235. [Google Scholar]
  17. Yu, Y. Squeezing the Arimoto–Blahut algorithm for faster convergence. IEEE Trans. Inf. Theory 2010, 56, 3149–3157. [Google Scholar] [CrossRef]
  18. Rose, K. A mapping approach to rate-distortion computation and analysis. IEEE Trans. Inf. Theory 1994, 40, 1939–1952. [Google Scholar] [CrossRef]
  19. Stavrou, P.A.; Kountouris, M. The role of fidelity in goal-oriented semantic communication: A rate distortion approach. IEEE Trans. Commun. 2023, 71, 3918–3931. [Google Scholar] [CrossRef]
  20. Serra, G.; Stavrou, P.A.; Kountouris, M. Alternating Minimization Schemes for Computing Rate-Distortion-Perception Functions with f-Divergence Perception Constraints. arXiv 2024, arXiv:2408.15015. [Google Scholar]
  21. Serra, G.; Stavrou, P.A.; Kountouris, M. Computation of rate-distortion-perception function under f-divergence perception constraints. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; pp. 531–536. [Google Scholar]
  22. Li, D.; Huang, J.; Huang, C.; Qin, X.; Zhang, H.; Zhang, P. Fundamental limitation of semantic communications: Neural estimation for rate-distortion. J. Commun. Inf. Netw. 2023, 8, 303–318. [Google Scholar] [CrossRef]
  23. Liang, Z.; Niu, K.; Wang, C.; Xu, J.; Zhang, P. Synonymous Variational Inference for Perceptual Image Compression. arXiv 2025, arXiv:2505.22438. [Google Scholar]
  24. Niu, K.; Zhang, P. A mathematical theory of semantic communication. J. Commun. 2024, 45, 7–59. [Google Scholar] [CrossRef]
  25. Farquhar, S.; Kossen, J.; Kuhn, L.; Gal, Y. Detecting hallucinations in large language models using semantic entropy. Nature 2024, 630, 625–630. [Google Scholar] [CrossRef]
  26. Rose, K. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proc. IEEE 1998, 86, 2210–2239. [Google Scholar] [CrossRef]
  27. Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset; Tech. Rep. CNS-TR-2010-001; California Institute of Technology: Pasadena, CA, USA, 2011. [Google Scholar]
  28. Ren, J.; Zhang, Z.; Xu, J.; Chen, G.; Sun, Y.; Zhang, P.; Cui, S. Knowledge base enabled semantic communication: A generative perspective. IEEE Wirel. Commun. 2024, 31, 14–22. [Google Scholar] [CrossRef]
  29. Ni, F.; Wang, B.; Li, R.; Zhao, Z.; Zhang, H. Interplay of semantic communication and knowledge learning. In Wireless Semantic Communications: Concepts, Principles and Challenges; John Wiley & Sons: Hoboken, NJ, USA, 2025; pp. 87–108. [Google Scholar]
  30. Hello, N.; Di Lorenzo, P.; Strinati, E.C. Semantic communication enhanced by knowledge graph representation learning. In Proceedings of the 2024 IEEE 25th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Lucca, Italy, 10–13 September 2024; pp. 876–880. [Google Scholar]
  31. Yi, P.; Cao, Y.; Kang, X.; Liang, Y.C. Deep learning-empowered semantic communication systems with a shared knowledge base. IEEE Trans. Wirel. Commun. 2023, 23, 6174–6187. [Google Scholar] [CrossRef]
  32. Sun, Y.; Chen, H.; Xu, X.; Zhang, P.; Cui, S. Semantic knowledge base-enabled zero-shot multi-level feature transmission optimization. IEEE Trans. Wirel. Commun. 2023, 23, 4904–4917. [Google Scholar] [CrossRef]
Figure 1. An example of synonymous mapping between semantic and syntactic information sets.
Figure 2. Illustration of the SKB as a synonymous mapping. The left part shows the syntactic samples, the middle part shows the set of syntactic vectors $\mathcal{W}$, and the right part shows the set of semantic vectors $\mathcal{S}$.
Figure 3. Semantic RD function of Gaussian sources under the given synonymous mappings.
Figure 4. Semantic RD function and optimal synonymous mapping obtained by the optimization algorithm.
Figure 5. Impact of SKB size on the semantic RD function.