3.1. Problem Description
The primary objective of a communication system is to transmit information-bearing messages to the receiver effectively and reliably. Within such systems, the information sequence is subjected to source encoding, channel encoding, and modulation at the transmitter, enabling it to be transmitted through the channel as symbols. At the receiver, the inverse operations are performed to reconstruct the transmitted information. Communication systems typically pursue two optimization goals. Firstly, the reliability objective aims to minimize the discrepancy between the sequence estimated at the receiver and the sequence sent by the transmitter; in technical communication this is evaluated with metrics such as the BER, whereas semantic communication considers semantic differences. Secondly, the efficiency objective aims to minimize the number of transmitted symbols. These two objectives often conflict with each other. Shannon's separation theorem states that, in technical communication systems, they correspond to channel encoding and source encoding, respectively, and can be optimized independently.
This study analyzes the problem of semantic communication from the perspective of optimal decoding, focusing on the process in which an information sequence $X = (x_1, x_2, \ldots, x_n)$, with $x_i \in \mathcal{X}$, is transmitted, where $\mathcal{X}$ represents the set of transmitted information symbols. Initially, $X$ is encoded and modulated at the transmitter using the function described in Equation (1), resulting in $S = (s_1, s_2, \ldots, s_t)$, where $s_i \in \mathcal{S}$ and $\mathcal{S}$ denotes the set of transmitted symbols. Subsequently, after transmission through the channel, the receiver acquires the sequence $O = (o_1, o_2, \ldots, o_t)$. The receiver's task is to estimate $\hat{S}$ from $O$, and $\hat{S}$ is then decoded into $\hat{X}$ using a semantic decoding function specified by Equation (2). The optimization objectives of the communication system are twofold. Firstly, information errors should be minimized by keeping the probability distribution of $\hat{S}$ as close as possible to that of $S$; here, the encoding function in Equation (1) is bijective and reversible, ensuring that recovering $S$ is equivalent to recovering $X$. Secondly, the system aims to minimize the length $t$ of the sequence $S$, thereby improving efficiency.
The relationship between $o_t$ and $s_t$ is shown in Equation (3):

$o_t = h_t \otimes s_t + n_t$  (3)

where $h_t$ and $n_t$ denote the channel impulse response (CSI) and the noise at time $t$, respectively, and $\otimes$ signifies the convolution operator. The process is a typical Markov process, and the optimal decoding and demodulation process involves solving the following:

$\hat{S} = \arg\max_{S} P(S \mid O)$  (4)
In fact, $P(S \mid O)$ is often unavailable; hence, the Bayesian formula in Equation (5) is commonly employed to rewrite it:

$P(S \mid O) = \dfrac{P(O \mid S)\, P(S)}{P(O)}$  (5)

where $P(O \mid S)$ represents the channel conditional transition probability. The design of the communication system assumes that the channel is memoryless, so the following factorization is valid:

$P(O \mid S) = \prod_{i=1}^{t} P(o_i \mid s_i)$  (6)

Therefore, the optimization goal of the optimal decoding system is as follows:

$\hat{S} = \arg\max_{S} \dfrac{P(S) \prod_{i=1}^{t} P(o_i \mid s_i)}{P(O)}$  (7)

In Equation (7), $P(S)$ represents the probability of the transmitted symbol sequence occurring within the entire set of transmitted symbol sequences. For a specific received sequence, $P(O)$ is a fixed value and cannot be optimized. Therefore, the final optimization objective is given by Equation (8):

$\hat{S} = \arg\max_{S} P(S) \prod_{i=1}^{t} P(o_i \mid s_i)$  (8)
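To make the objective in Equation (8) concrete, the following minimal sketch scores a handful of candidate symbol sequences by combining an assumed prior $P(S)$ with per-symbol AWGN likelihoods in the log domain. The candidate set, the uniform prior, and the BPSK alphabet are illustrative assumptions, not part of the proposed system.

```python
import numpy as np

def map_decode(candidates, prior, received, noise_var):
    """Pick the candidate sequence S maximizing P(S) * prod_i P(o_i | s_i)
    for an AWGN channel (Equation (8)); computed in the log domain for stability."""
    best_seq, best_score = None, -np.inf
    for seq in candidates:
        s = np.asarray(seq, dtype=float)
        # log P(S): assumed to be supplied by an external prior model
        log_prior = np.log(prior(seq))
        # sum_i log P(o_i | s_i) for Gaussian noise, additive constants dropped
        log_likelihood = -np.sum((received - s) ** 2) / (2 * noise_var)
        score = log_prior + log_likelihood
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score

# Toy usage with a hypothetical two-symbol BPSK alphabet and a uniform prior
candidates = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]
uniform_prior = lambda seq: 1.0 / len(candidates)
received = np.array([0.9, -0.4])
print(map_decode(candidates, uniform_prior, received, noise_var=0.5))
```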
The problem now revolves around identifying a transmission system in which the receiver can access both $P(S)$ and $P(o_i \mid s_i)$. The length $t$ of $S$ determines the coding rate. If such a system exists, it would be optimal at the given coding rate. The two components of this requirement are analyzed separately in the subsequent sections.
First, $P(o_i \mid s_i)$ represents the probability distribution of the received signal after it has passed through the channel. For an AWGN channel, this distribution is given by Equation (9):

$P(o_i \mid s_i) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\dfrac{|o_i - s_i|^2}{2\sigma^2}\right)$  (9)

where $\sigma^2$ denotes the noise power. Considering fading channels beyond AWGN channels, if the channel impulse response $h_i$ can be accurately estimated at time $i$, then based on Equation (3), $o_i$ follows a normal distribution, as expressed in Equation (10):

$P(o_i \mid s_i) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\dfrac{|o_i - h_i \otimes s_i|^2}{2\sigma^2}\right)$  (10)

When $h_i$ remains constant, the fading channel effectively behaves like an AWGN channel.
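As a small illustration of Equations (9) and (10), the sketch below evaluates the per-symbol log-likelihood for a flat-fading channel, in which the convolution reduces to a per-symbol product with the channel gain. The gain and noise-power values are assumed for the example.

```python
import numpy as np

def symbol_log_likelihood(o, s, h=1.0, sigma2=1.0):
    """log P(o_i | s_i) under Equations (9)/(10): Gaussian noise around h_i * s_i.
    For flat fading the convolution h_i (x) s_i reduces to a per-symbol product;
    h = 1 recovers the AWGN case of Equation (9)."""
    mean = h * s
    return -0.5 * np.log(2 * np.pi * sigma2) - np.abs(o - mean) ** 2 / (2 * sigma2)

# Example: received symbol 0.7 for a BPSK symbol +1 over a channel with gain 0.8
print(symbol_log_likelihood(0.7, 1.0, h=0.8, sigma2=0.25))
```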
The ‘$\otimes$’ in Equation (10) corresponds to channel equalization in classical technical communication systems, where $s_i$ represents the constellation sequence corresponding to the transmitted semantic symbol. In digital communication, the estimated channel response $\hat{h}_i$ is typically used to equalize the received signal $o_i$ by multiplying it by the conjugate transpose of $\hat{h}_i$. Here, however, we instead convolve $h_i$ with $s_i$, which benefits the design of our subsequent system.
Thus, our objective becomes identifying a function that maps the sequence $X$ to the sequence $S$ while accurately modeling the probability of $S$. This probability model must be consistent and computable at both the transmitter and the receiver, and the length of $S$ reflects the coding rate at the transmitter. In technical communication systems, determining the probability distribution of $S$ is challenging, and an equiprobable transmission of information bits is often assumed. LLMs process text into numerical token sequences and predict the probabilities of subsequent tokens in natural language, which closely resembles our problem; such models can effectively model $P(S)$. In subsequent sections, we detail the use of LLMs for the probabilistic modeling of text. Assuming that $P(S)$ is accessible, we now discuss the mapping function.
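As one possible way to obtain such a probability model, the sketch below uses an off-the-shelf causal language model (GPT-2 through the Hugging Face transformers library, chosen purely for illustration) to score a token sequence by summing per-token log-probabilities. This plays the role of the prior $P(S)$ in Equation (8), but it is not the specific model adopted in this work.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative choice of model; any causal LLM exposing token log-probabilities works.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sequence_log_prob(text):
    """Return log P(W) = sum_i log P(w_i | w_<i), serving as the prior P(S)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Shift so that logits at position i-1 predict the token at position i
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    token_log_probs = log_probs.gather(2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum().item()

print(sequence_log_prob("The cat sat on the mat."))
print(sequence_log_prob("The cat sat on the hypotenuse."))  # less plausible, lower score
```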
The goal of the mapping function is to minimize the length $t$ of $S$, thereby maximizing the data transmission rate; this requires the average information carried by each symbol of $S$ to reach its maximum. When the transmitted symbols are independent and equiprobable, the mapping from $X$ to $S$ approaches the entropy limit of compression. This scenario mirrors source coding achieving the entropy limit without channel coding in technical communication [2]. The optimal decoding strategy then reduces to comparing the Euclidean distance between the received signal and the constellation points. Bit errors are irreparable without prior information: once an error occurs at the receiver, the information becomes unrecoverable. However, when the elements of $S$ are correlated, the receiver can correct erroneous symbols using this prior knowledge. Such correlation is analogous to the parity bits used in channel coding within technical communication systems; channel coding fundamentally computes parity bits from information bits using a generator matrix, thereby enabling error recovery through correlation at the receiver. Viterbi decoding, for instance, maximizes the posterior probability of the information bits given the received symbols. Hence, from both technical and semantic perspectives, an ideal communication system should possess the following characteristics:
DeepSC [5] employs a Transformer-based architecture for joint semantic encoding and decoding. The encoder maps the input text $X$ to a fixed-length semantic vector, while the decoder reconstructs the text from noisy channel outputs. Its training objective is to minimize semantic loss (e.g., sentence similarity) rather than bit-level errors, enabling robustness in low-SNR regimes but limiting error-free transmission at high SNRs.
3.2. System Model
In this subsection, the application of LLMs to modeling the problem outlined in the previous subsection is expounded. LLMs undergo an unsupervised pre-training process aimed at predicting the probability of the next token in a training set given the preceding text. Tokens, which represent encoded units of human language beyond individual characters, are derived from extensive natural language data during tokenizer training, a phase akin to data compression [37]. Frequently used methods such as Byte Pair Encoding (BPE) and WordPiece [38] serve as forms of data compression, aligning with the mapping function discussed earlier. The embedding layer applied after tokenization further represents the semantic information of this encoding.
Consider a scenario where both the transmitter and receiver share identical background knowledge from a common knowledge base. If all possible sequences of transmitted information can be enumerated, then the prior probability $P(S)$ becomes computable, resolving the problem stated in the foregoing subsection. The objective of LLM training is to predict token probabilities after the tokenization of text, while dataset construction aims to comprehensively cover all human languages. For text information, the two communicating parties, Alice and Bob, should possess the same knowledge base (i.e., the same distribution of $X$) for effective language interaction. Assume that their language usage is comprehensively collected into a dataset $D$ and that both parties have adequate computational resources; algorithms such as BPE or WordPiece are then used to tokenize the text in $D$. The goal is to train a tokenizer with a vocabulary size $V$. If $m$ bits represent one token, then $m = \lceil \log_2 V \rceil$. The compression effect of tokenization, whereby a UTF-8-encoded sequence $X$ of length $n$ is mapped into a token sequence $W$ of length $t$, should satisfy $t \cdot m \le 8n$, i.e., the tokenized representation requires no more bits than the original UTF-8 encoding.
Assuming that $V$ approaches infinity and the number of training iterations tends to infinity, tokenization achieves maximum data compression: for a dataset $D$, each token in the vocabulary becomes approximately equiprobable, and the encoding of the transmitted information reaches its shortest form. If such a tokenization is used to train an LLM, however, the model may fail to converge, because the predicted token distribution remains constant regardless of context. Therefore, by controlling the size of $V$, one can regulate the degree of compression achieved by tokenization and thereby manage the redundancy of the information carried by the transmitted bits. This effectively controls the coding rate of source–channel encoding [39].
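A brief sketch of this tokenizer-training step is given below, using the Hugging Face tokenizers library as one possible implementation. The corpus file name is a placeholder for the collected dataset $D$, and the chosen vocabulary size $V$ is arbitrary.

```python
import math
from tokenizers import Tokenizer, models, trainers, pre_tokenizers

V = 8192  # chosen vocabulary size; controls the compression rate of tokenization

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=V, special_tokens=["[UNK]"])

# "corpus.txt" is a placeholder for the collected dataset D of possible transmissions
tokenizer.train(files=["corpus.txt"], trainer=trainer)

m = math.ceil(math.log2(tokenizer.get_vocab_size()))   # bits needed per token
text = "semantic communication with large language models"
encoded = tokenizer.encode(text)
print(f"bits per token m = {m}")
print(f"UTF-8 bits: {len(text.encode('utf-8')) * 8}, tokenized bits: {len(encoded.ids) * m}")
```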
Next, the tokens are fed into LLMs for unsupervised pre-training. The model output can be approximated as $P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})$, where $N$ denotes the context length of the LLM. As LLMs evolve, $N$ tends to increase. Given the contextual dependencies inherent in natural language sequences, the tokens $w_i$ are not independent. Thus, the equation addressing the causal system is

$P(W) = \prod_{i=1}^{t} P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})$  (14)

The semantic knowledge base is established through the unsupervised pre-training phase of LLMs via Equation (14), which learns the joint probability distribution $P(W)$ of token sequences from massive text corpora.
The loss function for unsupervised pre-training of LLMs is

$\mathcal{L} = -\sum_{i} \log P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})$  (15)

It is observed that Equation (14) and the loss function in Equation (15) are equivalent: minimizing the loss in Equation (15) maximizes the likelihood in Equation (14). This distribution serves two critical functions: (1) it guides the beam search decoder by ranking candidate sequences according to their linguistic plausibility, and (2) it enables semantic error correction by assigning near-zero probabilities to implausible sequences. The knowledge base is static after pre-training and requires no fine-tuning for deployment. By designing a high-dimensional constellation diagram for $W$ and modulating it, $W$ becomes $S$. It should be noted that, because the vocabulary size is generally large, it is difficult to represent a token with a single modulation symbol; multiple modulation symbols are therefore needed to represent one token, which is what we call a high-dimensional constellation diagram. Since $W$ and $S$ are in one-to-one correspondence, they share identical probability distributions. To train a semantic system for optimal decoding, the loss function is formulated as follows:
The two components of Equation (16) are evidently independent, enabling the two parts of the system to be trained separately. The second component can be expressed as shown in Equation (17), which represents the channel conditional transition probability. Once the channel and the modulation scheme are determined, this term can be specified as in Equation (10). For modulation schemes akin to those of classical communication (e.g., QAM or QPSK), Equation (17) can be applied directly without additional training.
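The sketch below illustrates the idea of a high-dimensional constellation by mapping a single token onto several QPSK symbols. The Gray-mapped QPSK alphabet and the bit ordering are assumptions made for the example, not the exact scheme adopted in the paper.

```python
import numpy as np

# Gray-mapped QPSK constellation (2 bits per symbol); assumed for illustration
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def token_to_symbols(token_id, m):
    """Map one token (m bits) to ceil(m/2) QPSK symbols, i.e., a point in a
    higher-dimensional constellation, since one symbol cannot carry a whole token."""
    bits = [(token_id >> k) & 1 for k in reversed(range(m))]
    if len(bits) % 2:                      # pad to an even number of bits
        bits.append(0)
    pairs = zip(bits[0::2], bits[1::2])
    return np.array([QPSK[p] for p in pairs]) / np.sqrt(2)  # unit average power

# Example: a vocabulary of V = 8192 tokens needs m = 13 bits, i.e., 7 QPSK symbols per token
print(token_to_symbols(token_id=2025, m=13))
```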
Therefore, language sequences can be semantically encoded through the tokenization and training of LLMs, and probabilistic modeling is subsequently performed on these encoded sequences. Achieving zero loss in LLM training implies optimal decoding within this transmission system, and a nonzero loss indicates the deviation from optimal decoding. Notably, through the training of the tokenizer and the LLM, an optimal transmission system can be attained across various coding rates, obviating the need for further LLM fine-tuning. This training process aligns seamlessly with the typical training methodologies of LLMs, requiring no adjustments for semantic communication.
The system structure depicted in Figure 1 illustrates the pre-training process. We constructed the dataset by collecting all possible transmitted contents from both the sender and the receiver. A tokenizer with a vocabulary size of $V$ was trained using algorithms such as BPE or WordPiece. This tokenizer converts the text into tokens, which are later used to train an LLM with the loss function defined in Equation (15). Upon completion of the training, the model serves as the knowledge base for semantic communication, providing the prior distribution of token sequences as shared knowledge to the receiver. At the receiving end, the LLM plays a crucial role in the decoder by offering priors for the transmitted sequence. The method for achieving optimal decoding is discussed in subsequent sections. LLM-SC has the following characteristics:
The system assumes that D comprises all possible content that can be transmitted between the transmitter and receiver. While achieving this in real life poses challenges, current training corpora for LLMs strive to encompass all languages used by humans. Furthermore, this assumption is becoming increasingly feasible as LLM technology evolves rapidly.
The system prioritizes decoding sentences with higher probabilities of occurrence in the real world. Sequences for which $P(S) = 0$ cannot be transmitted, as the receiver would be unable to decode them. Consequently, the system cannot transmit sentences that are impossible in the real world, as such sentences convey no meaningful information. Therefore, the system integrates both technical and semantic information.
3.3. Decoding Algorithm of LLM-SC
After training, a pre-trained LLM can be used to compute the probability in Equation (14) of a candidate transmitted sequence $S$ given a received sequence $O$, i.e., the likelihood that the sequence $S$ was transformed into $O$ by the channel. This decoding process can be applied with any pre-trained LLM, and in this subsection, we delve into the specific decoding algorithm.
According to Equation (8), assuming that the transmitted sequence length does not exceed the maximum context length of the LLM, the optimal decoding strategy is given by Equation (18). The first term in Equation (18) reflects the prior probability distribution of the transmitted tokens, while the second term quantifies the Euclidean distance between the received symbols and the constellation points. Directly exploring all possible sequences and selecting the one with the highest probability, known as maximum likelihood decoding, has complexity $O(V^t)$, where $V$ is the number of possible states (the size of the transmit symbol set) and $t$ is the sequence length, which typically makes it infeasible in engineering practice.
In Equation (19), $N_0$ denotes the noise power spectral density (PSD), typically with $\sigma^2 = N_0/2$. Here, $w_i$ represents the $i$-th transmitted token, which is modulated into $s_i$ and transmitted through the channel. Equation (19) represents a typical hidden Markov model prediction problem, commonly addressed using dynamic programming algorithms that find the path with the maximum probability. The Viterbi algorithm provides optimal sequence decoding for hidden Markov models by dynamically tracking maximal-probability paths; however, its $O(V^2 t)$ complexity is prohibitive for large vocabularies (e.g., $V = 32{,}000$). To mitigate the complexity, beam search, a heuristic algorithm prevalent in NLP, is adapted. Beam search builds a search tree using breadth-first search, sacrificing optimality for efficiency. Its operation is governed by the beam width $K$: at each timestep, it retains the top $K$ sequences by score and expands each of them over the vocabulary to generate $K \times V$ candidates, from which the next $K$ best sequences are selected to continue the expansion.
Beam search transitions into the Viterbi algorithm when $K = V$, and into a greedy decoding algorithm when $K = 1$. It strikes a balance between the optimality of Viterbi and the efficiency of greedy decoding, making it suitable for scenarios where minor errors are acceptable given the expansive search space.
Our decoding approach is modified to use the scoring function described by Equation (20). Specifically, a max heap of size $K$ is maintained to store the highest-scoring sequences. The current $K$ sequences are used to predict the next symbol, the expanded candidates are added to the heap, and the top $K$ sequences are then extracted for the next iteration.
Beam search decoding, whose computational cost grows only linearly with the beam width $K$ and the sequence length $t$, facilitates parallelized computation across the $K$ sequences. Algorithm 1 outlines the workflow, which is designed to demodulate text efficiently. By adjusting $K$, computational efficiency can be balanced against decoding accuracy. Beam search therefore offers a feasible, though suboptimal, approach that leverages the capabilities of LLMs. The specific decoding process is illustrated in Figure 1.
Algorithm 1 details the beam search decoding procedure; its inputs include the received sequence $O$, the beam width $K$, and the pre-trained LLM, and the output $W$ is the highest-scoring token sequence after $t$ iterations.
Algorithm 1: Beam search for text decoding
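The following is a minimal sketch of the beam search decoder summarized in Algorithm 1, assuming hypothetical interfaces llm_log_probs(prefix) for the LLM prior and modulate(token_id) for the token-to-constellation mapping. Top-K selection is shown with heapq.nlargest instead of an explicit max heap; the sketch illustrates the procedure rather than reproducing the exact implementation.

```python
import heapq
import numpy as np

def beam_search_decode(received, num_tokens, K, vocab_size, llm_log_probs, modulate, noise_var):
    """Decode `num_tokens` tokens from the received symbol blocks.
    received[i] holds the channel-output symbols of the i-th token.
    llm_log_probs(prefix) -> length-V array of log P(w | prefix)   (assumed interface)
    modulate(token_id)    -> constellation symbols for that token  (assumed interface)"""
    beams = [(0.0, [])]                                   # (score, token prefix)
    for i in range(num_tokens):
        candidates = []
        for score, prefix in beams:
            log_prior = llm_log_probs(prefix)             # prior term of the score
            for w in range(vocab_size):
                s = modulate(w)                           # Euclidean-distance term
                log_lik = -np.sum(np.abs(received[i] - s) ** 2) / (2 * noise_var)
                candidates.append((score + log_prior[w] + log_lik, prefix + [w]))
        beams = heapq.nlargest(K, candidates, key=lambda c: c[0])  # keep top-K sequences
    return max(beams, key=lambda c: c[0])[1]              # highest-scoring token sequence W
```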
3.4. Performance Metrics
Performance metrics are essential for evaluating proposed methods in communication systems. In end-to-end communication, BER is commonly adopted as the training target by both transmitters and receivers, yet it often overlooks broader communication goals. BER may not accurately reflect the performance of semantic communication systems. Consequently, novel metrics such as Bilingual Evaluation Understudy (BLEU) and Word Error Rate (WER) have been proposed, focusing on word-level similarity between transmitter and receiver outputs. However, these metrics do not fully capture the similarity between entire sentences. To address this gap, metrics utilizing pre-trained models like BERT have emerged for evaluating semantic similarity. This paper selects evaluation metrics that encompass both traditional technical communication systems and emerging semantic communication paradigms, providing a comprehensive assessment of the proposed method.
(1) BLEU: BLEU is a metric commonly used to evaluate the quality of machine-generated text against one or more reference texts. Originally developed for machine translation, it has been adapted to semantic communication systems, where assessing the quality of the reconstructed output is crucial. BLEU calculates a score based on the precision of $n$-grams (contiguous sequences of $n$ items, typically words) between the generated output and the reference texts, quantifying how closely the generated text matches the references in terms of these $n$-grams and thereby providing a numerical assessment of similarity and fluency. For a transmitted sentence $s$ of length $l_s$ and its decoded counterpart $\hat{s}$ of length $l_{\hat{s}}$, the BLEU score is calculated as follows:
where $u_n$ are the weights assigned to the $n$-grams, and $p_n$ denotes the $n$-gram score:
Here, $C_k(\cdot)$ represents the frequency count function for the $k$-th element of the $n$-grams. BLEU captures contextual relationships to some extent but primarily evaluates the superficial similarity of $n$-grams without considering semantic equivalence. Consequently, BLEU may assign a low score even when the generated text is semantically aligned with the reference, owing to differences in expression or the use of synonyms.
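A simplified sketch of the BLEU computation described above is given below, using uniform $n$-gram weights and the brevity penalty; in practice, a library such as NLTK or sacrebleu would typically be used for evaluation.

```python
import math
from collections import Counter

def bleu(reference, candidate, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions (uniform weights u_n)
    combined with the brevity penalty in the log domain."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_p = 0.0
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        log_p += (1.0 / max_n) * math.log(max(clipped, 1e-9) / total)
    brevity = min(1.0 - len(ref) / len(cand), 0.0)   # log brevity penalty
    return math.exp(brevity + log_p)

print(bleu("the cat sat on the mat", "the cat sat on a mat"))
```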
(2) Sentence similarity: To address the issue of polysemous words, Xie et al. introduced a novel evaluation metric based on sentence similarity [5]. Sentence similarity assesses the semantic equivalence between two sentences using a pre-trained model. Such models, exemplified by BERT [23], are natural language processing models trained extensively on diverse corpora. The semantic similarity between the transmitted sentence $s$ and the recovered sentence $\hat{s}$ is computed as follows:
where $B_{\Phi}(\cdot)$, based on BERT, represents a highly parameterized pre-trained model designed to extract semantic information from text.
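One common realization of this metric, sketched below, mean-pools BERT token embeddings and takes their cosine similarity. The specific checkpoint bert-base-uncased is an illustrative assumption and differs from the more elaborate matching functions used in the literature.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# "bert-base-uncased" is an illustrative choice; any BERT-like encoder can play the role of B_Phi
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(sentence):
    """Mean-pooled BERT embedding, one simple realization of B_Phi(.)"""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state        # (1, seq_len, hidden)
    return hidden.mean(dim=1).squeeze(0)

def sentence_similarity(s, s_hat):
    """Cosine similarity between the transmitted and decoded sentences."""
    a, b = embed(s), embed(s_hat)
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

print(sentence_similarity("the weather is nice today", "today the weather is pleasant"))
```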
(3) BER: BER is a widely adopted metric for assessing the performance of communication systems, quantifying the probability that a transmitted bit is received in error. A low BER signifies the system's capability to accurately recover the transmitted symbols. However, bit-level errors often inadequately reflect the performance of semantic transmission systems. For instance, occasional bit errors in every word may yield a negligible BER yet render the received text unintelligible, because hardly any word is received correctly. Conversely, a text with a notably high BER can remain understandable if the crucial semantic elements are correctly deciphered. Since LLM-SC aims at error-free transmission and supports communication that is compatible at both the technical and semantic levels, BER remains a pertinent evaluation metric.
(4) Token Error Rate (TER): The token acts as the fundamental unit of transmission and demodulation in LLM-SC. In addition to the BER, errors at the token level reflect how difficult it is to comprehend a sentence accurately. For instance, a low BER combined with errors uniformly distributed across the $t$ tokens results in a high TER, indicating persistent difficulty in sentence comprehension at the receiver. Conversely, errors concentrated within a single token, despite a potentially high BER, do not impair semantic understanding, so the system remains effective. This metric resembles the WER discussed in [40]; while each token can correspond to a word, tokens encompass a broader range of text units and constitute the basic input units of LLMs.
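A minimal sketch, assuming each token is represented by $m$ bits, of how BER and TER can be computed jointly from the transmitted and decoded token sequences; the token values are illustrative.

```python
def bit_error_rate(tx_tokens, rx_tokens, m):
    """BER over the m-bit binary representations of the transmitted and decoded tokens."""
    errors = sum(bin(a ^ b).count("1") for a, b in zip(tx_tokens, rx_tokens))
    return errors / (len(tx_tokens) * m)

def token_error_rate(tx_tokens, rx_tokens):
    """TER: fraction of token positions decoded incorrectly."""
    wrong = sum(a != b for a, b in zip(tx_tokens, rx_tokens))
    return wrong / len(tx_tokens)

tx = [17, 1024, 309, 4821]
rx = [17, 1025, 309, 4821]          # one token differs by a single bit
print(bit_error_rate(tx, rx, m=13), token_error_rate(tx, rx))  # low BER, TER = 0.25
```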