Article

High-Efficiency Lossy Source Coding Based on Multi-Layer Perceptron Neural Network

1 Navigation College, Jimei University, Xiamen 361021, China
2 School of Ocean Information Engineering, Jimei University, Xiamen 361021, China
3 Department of Information and Communication Engineering, Xiamen University, Xiamen 361005, China
* Author to whom correspondence should be addressed.
Entropy 2025, 27(10), 1065; https://doi.org/10.3390/e27101065
Submission received: 16 September 2025 / Revised: 4 October 2025 / Accepted: 13 October 2025 / Published: 14 October 2025
(This article belongs to the Special Issue Next-Generation Channel Coding: Theory and Applications)

Abstract

With the rapid growth of data volume in sensor networks, lossy source coding systems are essential for achieving high-efficiency data compression with low distortion under limited transmission bandwidth. However, conventional compression algorithms rely on a two-stage framework with high computational complexity and frequently struggle to balance compression performance with generalization ability. To address these issues, an end-to-end lossy compression method is proposed in this paper. The approach integrates an enhanced belief propagation algorithm with a multi-layer perceptron neural network, introducing a novel joint optimization architecture described as "encoding-structured encoding-decoding". In addition, a quantization module incorporating random perturbation and the straight-through estimator is designed to address the non-differentiability of the quantization process. Simulation results demonstrate that the proposed system significantly improves compression performance while offering superior generalization and reconstruction quality. Furthermore, the designed neural architecture is both simple and efficient, reducing system complexity and enhancing feasibility for practical deployment.

1. Introduction

With the rapid proliferation of applications such as environmental monitoring [1], industrial applications [2], and smart cities [3], sensor networks are generating data at an unprecedented scale, yet they operate under stringent constraints in edge computing capacity, communication bandwidth, and node energy budgets [4,5,6,7]. This fundamental mismatch between data volume and available resources intensifies the demand for high–throughput [8] and low–latency [9] data processing under limited computational capabilities. Furthermore, it necessitates the development of efficient transmission strategies that accommodate narrow bandwidth and multi–hop relaying [10], while avoiding congestion and excessive energy drain [11]. These challenges underscore the critical need for efficient data acquisition, in–situ processing, and low–overhead communication protocols in the development of scalable and sustainable sensor networks [12,13,14].
To address these constraints, data compression is often applied at sensor nodes to significantly reduce the bit rate within controllable distortion limits [15,16,17]. This approach effectively decreases link occupancy, end-to-end latency, and transmission energy consumption, while also alleviating back-end storage and processing burdens [18,19,20]. Among various compression strategies, lossy source coding based on the rate-distortion trade-off offers a compelling framework for joint node- and link-level optimization by exchanging minor perceptual or statistical losses for substantial compression gains [21].
Existing lossy source coding systems can be categorized into three main approaches. The first category relies on sparse-graph codes, which establish an operational duality between lossy compression and channel decoding [12]. Studies have shown that low-density parity-check (LDPC) structures, combined with belief propagation (BP) and its variants, can approach the theoretical rate-distortion bound for binary symmetric sources. The second category employs structured schemes based on protograph LDPC (P-LDPC) codes, which were initially validated on binary sources and later extended to noisy environments and continuous Gaussian sources [17]. These methods facilitate hardware reuse between source and channel coding blocks, thereby reducing overall system complexity. The third category integrates sparse graphs with neural networks in an end-to-end fashion, mapping sources into pseudo-codeword spaces for structured quantization and reconstruction [21]. However, existing implementations employ shallow perceptrons, which limits both modeling capacity and generalization ability.
Hence, several critical challenges persist in current lossy source coding systems. The classical BP often requires numerous iterations during encoding and quantization, resulting in high computational complexity and latency that strain real–time processing capabilities and energy budgets [22]. Furthermore, the conventional two–stage “quantization–compression” pipelines are prone to error accumulation and bit–importance imbalance, leading to additional distortion and reduced coding efficiency [21]. Although the multi–level coding (MLC) can partially mitigate information inequality, it introduces higher architectural complexity and complicates rate allocation while still failing to achieve full equalization [23]. Moreover, the end–to–end learning–based approaches often rely on shallow neural architectures that lack explicit reconstruction mechanisms, thereby limiting their compression accuracy and generalization ability [24,25]. These limitations collectively exacerbate the constraints on delay, energy consumption, and link robustness in sensor network applications [26,27,28].
To address these issues, a novel lossy source coding system featuring an end-to-end "encoding-structured encoding-decoding" architecture is proposed. A multi-layer perceptron (MLP) is employed for nonlinear feature transformation and explicit reconstruction. Furthermore, an enhanced belief propagation (EBP) quantization module incorporating random perturbation and a straight-through estimator (STE) is introduced to enable differentiable training. In the coding module, a tunable P-LDPC structure offers flexible support for a wide range of compression rates. In contrast to conventional two-stage pipelines and shallow learnable baselines, the proposed system achieves unidirectional end-to-end information flow, integrates an explicit trainable reconstruction, and delivers lower distortion with improved generalization, all while maintaining practical computational complexity and enhanced rate agility.
The remainder of this paper is structured as follows: Section 2 introduces the overall system framework by elaborating on the three core components of the system, including the encoding module, the structured encoding (quantization) module, and the decoding module. Section 3 discusses essential techniques for end–to–end training, including strategies for differentiable gradient propagation, and provides a comprehensive complexity analysis. Section 4 presents the experimental setup, comparative results, and evaluations of performance and generalization capability. Finally, Section 5 concludes the paper and suggests potential applications and future research directions.

2. System Model

For clarity, the principal symbols and their meanings used in this section are summarized in Table 1. The proposed compression system integrates neural networks with P–LDPC code as shown in Figure 1. The system is composed of three main components, including the encoding, structured encoding, and decoding modules. Both the encoding and decoding modules are implemented using the MLP–based deep neural networks for nonlinear feature extraction and approximate reconstruction of the input data. The structured encoding module utilizes a quantization method based on the BP algorithm to discretize the continuous features output by the encoding module under the P–LDPC structural constraints to produce binary codewords.
In this system, the encoder comprises a cascaded structure formed by the encoding module and the quantization, while the decoder consists of a dedicated decoding module. The workflow begins with a Gaussian source sequence being transformed by a deep neural network within the encoding module to produce feature representations. These features are subsequently mapped into structured binary sequences via a BP–based quantization. Finally, the binary codes are approximately reconstructed by the neural network–based decoding module to recover the original signal. In contrast to conventional schemes, the proposed architecture eliminates feedback connections between the transformation and the quantization, adopting instead a unidirectional end–to–end information flow. This design enhances both model simplicity and inference efficiency.
The proposed system introduces key improvements over traditional frameworks. While existing algorithms perform well within LDPC structures, their direct application to Gaussian source compression often leads to high reconstruction distortion. To mitigate this, a deep MLP-based encoding module is integrated with a structured BP quantization, significantly enhancing representational capacity and reducing distortion. The architecture is fully end-to-end, allowing the encoder and decoder to adaptively reconstruct the original signal without prior assumptions by leveraging the strong approximation capabilities of deep networks. A differentiable approximation is incorporated within the BP quantization to overcome the non-differentiability of quantization, which would otherwise block gradient propagation; this enables effective gradient flow so that the encoder can be optimized with respect to the reconstruction loss. Furthermore, the system supports a wider range of compression rates by adjusting the P-LDPC matrix structure and parameters, which offers a versatile and efficient solution for lossy compression across diverse scenarios.
Referring to Figure 1, a memoryless Gaussian source sequence $\mathbf{s}$ of length $n$ is first fed into the encoding module. The MLP-based encoding maps the input sequence into a continuous feature vector $\mathbf{z}$ of the same length. Subsequently, a BP-based structured encoding module performs a projection on $\mathbf{z}$ to generate a binary codeword $\mathbf{q}$ of length $m \le n$ that satisfies the predefined P-LDPC constraints, where $m, n \in \mathbb{Z}^{+}$ and $\mathbb{Z}^{+}$ denotes the set of positive integers. After that, the quantized result $\mathbf{q}$ serves as the output of the encoder and is directly passed into the decoding module. The decoding module is also implemented by the MLP, aiming to reconstruct an approximate estimate $\hat{\mathbf{s}}$ of the original source $\mathbf{s}$ from $\mathbf{q}$.
It is noteworthy that the proposed system employs a fully end–to–end and unidirectional information flow, eliminating any feedback connections from the quantization output. This design results in a more streamlined and efficient processing pipeline. The structure and functionality of each module are described in detail in the following sections.

2.1. Encoding Module

The encoder is composed of a cascaded encoding module and structured encoding module for realizing the transformation and quantization, respectively, as illustrated in Figure 2. For a memoryless Gaussian source sequence $\mathbf{s} = \{s_1, s_2, \ldots, s_n\}$ of length $n$, the feature extraction is first performed by the transformation module. This module employs the MLP to carry out deep nonlinear mapping of the input signal, resulting in a continuous feature representation $\mathbf{z} = \{z_1, z_2, \ldots, z_n\}$. The continuous feature vector $\mathbf{z}$ is then fed into the structured quantization module, where $\mathbf{z}$ is quantized by the BP algorithm into a binary codeword $\mathbf{q} = \{q_1, q_2, \ldots, q_n\}$ that conforms to the structural constraints of the P-LDPC code. Here, the variables satisfy $s_i, z_i, q_i \in \mathbb{R}$ with $i \in \mathbb{Z}^{+}$, where $\mathbb{R}$ denotes the set of real numbers.
The internal structure of the MLP is shown in Table 2. The parameter $n$ denotes the length of the quantized binary codeword, and $d$ represents the dimension of the reconstructed signal, where $d \in \mathbb{Z}^{+}$. Different from traditional multi-stage architectures with feedback cascades, the proposed system employs a single end-to-end encoding pathway that operates without feedback from the quantization output or concatenation with the original input. Each signal undergoes transformation and quantization only once, significantly simplifying the encoder structure and enhancing both training and inference efficiency. Leveraging the powerful nonlinear transformation capability of the deep MLP, the transformation module produces highly expressive feature representations that effectively facilitate subsequent structured quantization and reconstruction.
Subsequently, the source sequence $\mathbf{s}$ is directly fed into the MLP neural network within the transformation module for the nonlinear transformation. The MLP employed in this work consists of multiple fully connected layers, as illustrated in Figure 2. Each linear layer is followed by batch normalization (BN) and a LeakyReLU activation, while the output layer uses a Tanh activation to constrain the output within a bounded range. Taking an input $\mathbf{s} \in \mathbb{R}^{p \times n}$ as an example, where $p \in \mathbb{Z}^{+}$ denotes the batch size and $n$ is the dimensionality of the input data, the transformation process of the MLP can be formulated as
$$\mathbf{z} = \mathrm{Tanh}\big(f_L(\cdots f_2(f_1(\mathbf{s})))\big),$$
and the LeakyReLU function is defined as
$$\mathrm{LeakyReLU}(x) = \begin{cases} x, & \text{if } x \ge 0, \\ \alpha x, & \text{otherwise}, \end{cases}$$
where $\alpha$ is a small positive constant, for instance $\alpha = 0.01$, and each intermediate function $f_i(\cdot)$ is defined as
$$f_i(x) = \mathrm{LeakyReLU}\big(\mathrm{BatchNorm}(x W_i + b_i)\big),$$
where $W_i$ and $b_i$ denote the weight matrix and bias vector of the $i$-th fully connected layer, respectively.
Based on the MLP architecture, the transformation module effectively captures complex nonlinear features within the Gaussian source sequence and produces continuous feature representations that are well–suited for subsequent structured quantization. To further enhance network performance, the Xavier initialization method is adopted for parameter initialization. This strategy helps mitigate issues of gradient vanishing or explosion during the early training phase, thereby improving both training stability and the generalization capability of the model.
In summary, the transformation module significantly enhances the nonlinear processing capability for memoryless Gaussian source sequences through its deep architecture, combined use of activation functions and batch normalization, and appropriate parameter initialization strategies. As a result, it generates more structurally compatible features, which are better adapted to subsequent structured quantization.
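To make the layer structure of Table 2 concrete, the following is a minimal PyTorch sketch of the transformation module. The class name, the default negative slope, and the way the widths are assembled are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class EncoderMLP(nn.Module):
    """Transformation module: maps a d-dimensional source block to an n-dimensional feature vector.

    Layer widths follow Table 2: d -> 2(d+n) -> 4(d+n) -> 8(d+n) -> n, with BN + LeakyReLU
    after each hidden linear layer and a Tanh bound on the output.
    """

    def __init__(self, d: int, n: int, alpha: float = 0.01):
        super().__init__()
        widths = [d, 2 * (d + n), 4 * (d + n), 8 * (d + n)]
        layers = []
        for w_in, w_out in zip(widths[:-1], widths[1:]):
            layers += [nn.Linear(w_in, w_out),
                       nn.BatchNorm1d(w_out),
                       nn.LeakyReLU(negative_slope=alpha)]
        layers += [nn.Linear(widths[-1], n), nn.Tanh()]  # bounded features for the quantizer
        self.net = nn.Sequential(*layers)
        self.apply(self._init_weights)

    @staticmethod
    def _init_weights(m: nn.Module) -> None:
        # Xavier initialization for weights, zero biases (Section 2.1).
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            nn.init.zeros_(m.bias)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        # s: (batch, d) Gaussian source block; returns z: (batch, n) in (-1, 1).
        return self.net(s)
```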

2.2. Structured Encoding Module

The existing source compression systems typically employ the LDPC–based BP algorithm. However, when directly applied to different kinds of sources, the standard BP algorithm often leads to considerable distortion amplification and high computational complexity. To mitigate these limitations, the proposed system introduces a novel quantization scheme that combines deep neural networks with the BP algorithm, thereby enabling more efficient and robust compression of Gaussian sources.
Specifically, the system first employs a neural network encoder to transform the input continuous Gaussian source sequence into continuous feature representations. To enhance the robustness of quantization, random perturbations are introduced into these features. The perturbed features are subsequently quantized using the BP algorithm to generate discrete output symbols. To facilitate end-to-end training, the quantization module incorporates the STE technique, which approximates gradients through the otherwise non-differentiable quantization operation.
The proposed quantization module integrates neural encoding with BP–based structured quantization within an end–to–end trainable framework. Gaussian source sequences are first encoded into continuous features, perturbed by uniform noise to improve robustness, and then quantized through the BP message passing. A perturbation–enhanced STE is employed to approximate gradients, enabling effective end–to–end network training. This hybrid design effectively combines the structural advantages of the BP with the expressive power of deep neural networks, significantly improving compression performance and system adaptability.
The procedure of the proposed quantization process is outlined in Algorithm 1. The system first applies a neural network encoder $E(\cdot)$ to transform the input Gaussian source sequence $\mathbf{x}$ into a continuous feature vector $\mathbf{z} = E(\mathbf{x})$. To enhance robustness, a uniformly distributed random perturbation $\mathbf{u} \sim \mathcal{U}(-0.5, 0.5)$ is added to the continuous feature vector $\mathbf{z}$, resulting in a perturbed feature vector $\tilde{\mathbf{z}} = \mathbf{z} + \mathbf{u}$. The symbols used in Algorithm 1 are defined as follows:
$\mathbf{x} \in \mathbb{R}^d$: input Gaussian source sequence;
$\mathbf{z} \in \mathbb{R}^n$: feature vector from the transformation module;
$E(\cdot)$: neural network encoder;
$Q_{\mathrm{BP}}(\cdot)$: BP-based quantization;
$\mathbf{u} \sim \mathcal{U}(-0.5, 0.5)$, $\mathbf{u} \in \mathbb{R}^n$: random perturbation;
$a_d$: step size for gradient perturbation;
$N_{\mathrm{iter}}$: maximum number of backpropagation steps during training;
$BP_{\mathrm{iter}}$: maximum number of BP iterations;
$\mathbf{q}_{\mathrm{raw}}, \mathbf{q} \in \mathbb{R}^n$: binary sequences from the BP quantization.
Algorithm 1 Quantization: BP neural network based on the STE
Input: $\mathbf{x}$, $E(\cdot)$, $Q_{\mathrm{BP}}(\cdot)$, $a_d$, $BP_{\mathrm{iter}}$, $N_{\mathrm{iter}}$
Output: $\mathbf{q}$
1: function BP_Quantization($\mathbf{x}$, $E(\cdot)$, $Q_{\mathrm{BP}}(\cdot)$, $a_d$, $BP_{\mathrm{iter}}$, $N_{\mathrm{iter}}$)
2:   $\mathbf{z} \leftarrow E(\mathbf{x})$
3:   $\mathbf{u} \leftarrow \mathrm{Uniform}(-0.5, 0.5)$
4:   $\tilde{\mathbf{z}} \leftarrow \mathbf{z} + \mathbf{u}$
5:   $\mathbf{q}_{\mathrm{raw}} \leftarrow Q_{\mathrm{BP}}(\tilde{\mathbf{z}}, BP_{\mathrm{iter}})$
6:   $\mathbf{q} \leftarrow \mathbf{q}_{\mathrm{raw}} - \mathbf{u}$
7:   if In_Training_Mode then
8:     $\mathbf{z}^{+} \leftarrow \mathbf{z} + a_d$
9:     $\mathbf{z}^{-} \leftarrow \mathbf{z} - a_d$
10:    $\mathbf{q}^{+} \leftarrow Q_{\mathrm{BP}}(\mathbf{z}^{+}, BP_{\mathrm{iter}})$
11:    $\mathbf{q}^{-} \leftarrow Q_{\mathrm{BP}}(\mathbf{z}^{-}, BP_{\mathrm{iter}})$
12:    $\nabla \leftarrow \mathbf{q}^{+} - \mathbf{q}^{-}$
13:    Update_Encoder_Parameters($\nabla$)
14:  end if
15:  return $\mathbf{q}$
16: end function
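The steps of Algorithm 1 can be realized, for example, as a custom autograd function in PyTorch. The sketch below is a minimal illustration under the assumption that a function `bp_quantize` implementing $Q_{\mathrm{BP}}(\cdot)$ of Algorithm 2 is available; the class and function names are hypothetical, and the element-wise product in the backward pass is one simple way to apply the surrogate gradient of steps 8-13.

```python
import torch

def bp_quantize(z_tilde: torch.Tensor, bp_iters: int) -> torch.Tensor:
    """Placeholder for the BP-based quantizer Q_BP(.) of Algorithm 2 (assumed to be provided)."""
    raise NotImplementedError

class STEBPQuantizer(torch.autograd.Function):
    """Perturbed BP quantization with a symmetric-difference straight-through gradient.

    The forward pass follows steps 2-6 of Algorithm 1; the backward pass realizes the
    surrogate gradient q+ - q- of steps 8-13 as an element-wise factor.
    """

    @staticmethod
    def forward(ctx, z: torch.Tensor, a_d: float = 0.5, bp_iters: int = 10) -> torch.Tensor:
        u = torch.rand_like(z) - 0.5            # u ~ U(-0.5, 0.5)
        q = bp_quantize(z + u, bp_iters) - u    # perturb, quantize, then correct
        ctx.save_for_backward(z)
        ctx.a_d, ctx.bp_iters = a_d, bp_iters
        return q

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        (z,) = ctx.saved_tensors
        # Symmetric finite difference of the quantizer, used as a diagonal surrogate Jacobian.
        diff = bp_quantize(z + ctx.a_d, ctx.bp_iters) - bp_quantize(z - ctx.a_d, ctx.bp_iters)
        return grad_output * diff, None, None
```

In training, the encoder output would be passed through `STEBPQuantizer.apply(z)`, so that reconstruction-loss gradients can reach the encoder parameters despite the discrete quantization step.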
Subsequently, the perturbed feature $\mathbf{y}$ is input into the quantization module $Q_{\mathrm{BP}}(\cdot)$, where multi-step BP is performed to obtain an initial discrete quantization result as
$$\mathbf{q}_{\mathrm{raw}} = Q_{\mathrm{BP}}(\mathbf{y}),$$
where $\mathbf{y}$ indicates the continuous feature, and the system outputs the result after perturbation correction as
$$\mathbf{q} = \mathbf{q}_{\mathrm{raw}} - \mathbf{u},$$
where the final output $\mathbf{q}$ serves as the result of the current quantization round. The detailed procedure of the BP-based quantization subroutine $Q_{\mathrm{BP}}(\cdot)$ is presented in Algorithm 2, and the variable definitions are listed as follows:
$\mathbf{y} \in \mathbb{R}^n$: continuous feature to be quantized (e.g., $\tilde{\mathbf{z}}$);
$BP_{\mathrm{iter}}$: maximum number of BP iterations;
$\mathbf{H} \in \{0,1\}^{m \times n}$: parity-check matrix of the LDPC code;
$\mathrm{BP\_Initialize}(\cdot)$: initialization function for BP messages;
$\mathrm{BP\_Iterate}(\cdot)$: single iteration of BP message updating;
$\mathrm{BP\_MakeDecision}(\cdot)$: final decision function (e.g., sign);
$\mathbf{q}_{\mathrm{raw}} \in \{0,1\}^n$: discrete output symbols from the BP quantization.
Algorithm 2 BP quantization subroutine
Input: $\mathbf{y}$, $BP_{\mathrm{iter}}$, $\mathbf{H}$
Output: $\mathbf{q}_{\mathrm{raw}}$
1: function BP_Quantization($\mathbf{y}$, $BP_{\mathrm{iter}}$, $\mathbf{H}$)
2:   Step 1: Initialization
3:     $llr \leftarrow \mathbf{y}$
4:     $BP\_state \leftarrow \mathrm{BP\_Initialize}(llr, \mathbf{H})$
5:   Step 2: Belief Propagation Iterations
6:     for $1$ to $BP_{\mathrm{iter}}$ do
7:       $BP\_state \leftarrow \mathrm{BP\_Iterate}(BP\_state, \mathbf{H})$
8:     end for
9:   Step 3: Final Decision
10:    $\mathbf{q}_{\mathrm{raw}} \leftarrow \mathrm{BP\_MakeDecision}(BP\_state)$
11:    return $\mathbf{q}_{\mathrm{raw}}$
12: end function
In Algorithm 2, the initial internal BP message state is denoted as $BP\_state_0$. In each round of BP iteration, messages are exchanged between check nodes $c$ and variable nodes $v$, and the postprocessing applied to the quantized output is summarized in Algorithm 3; in this way, the training loop of the proposed system is completed. The message passing between check nodes and variable nodes is formulated by the following equations:
$$m_{c \to v} = 2\tanh^{-1}\left( \prod_{v' \in V(c) \setminus v} \tanh\left(\frac{m_{v' \to c}}{2}\right) \right),$$
$$m_{v \to c} = llr_v + \sum_{c' \in C(v) \setminus c} m_{c' \to v},$$
where $m_{c \to v}$ and $m_{v \to c}$ are the check-to-variable and variable-to-check messages, respectively. The symbol $v'$ denotes another variable node connected to check node $c$, and $V(c) \setminus v$ indicates the set of all variable nodes connected to $c$ except $v$. Similarly, $C(v) \setminus c$ represents the set of all check nodes connected to $v$ except $c$. The term $m_{c' \to v}$ represents the message sent from check node $c'$ to variable node $v$ in the previous iteration.
After several iterations of the BP, the marginal probability estimation of variable node $v$ is given by
$$LLR_v = llr_v + \sum_{c \in C(v)} m_{c \to v}.$$
A final decision is then applied to obtain the quantized output as
$$q_{\mathrm{raw},v} = \mathrm{sign}(LLR_v),$$
where $\mathrm{sign}(\cdot)$ denotes the sign function. All variable node decisions are aggregated to form $\mathbf{q}_{\mathrm{raw}}$, which serves as the output of the current BP-based quantization process.
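For illustration, the following dense-matrix NumPy sketch implements the update rules of Equations (7)-(10) for a small parity-check matrix. It is a didactic reference rather than the optimized sparse implementation used in the experiments; the clipping thresholds and the $\{-1, +1\}$ decision convention are assumptions.

```python
import numpy as np

def bp_quantize_numpy(llr: np.ndarray, H: np.ndarray, bp_iters: int = 10) -> np.ndarray:
    """Dense-matrix sketch of the BP quantization subroutine (Eqs. (7)-(10)).

    llr: length-n input LLRs (here, the perturbed features); H: m x n parity-check matrix.
    Returns the hard decisions sign(LLR_v) in {-1, 0, +1}; mapping to {0, 1} is left out.
    """
    mask = H.astype(bool)
    m_vc = np.where(mask, llr[None, :], 0.0)   # variable-to-check messages, initialized with llr
    m_cv = np.zeros_like(m_vc)                 # check-to-variable messages

    for _ in range(bp_iters):
        # Check-node update: m_cv = 2 * atanh( prod_{v' != v} tanh(m_v'c / 2) ), Eq. (7)
        t = np.where(mask, np.tanh(np.clip(m_vc, -20.0, 20.0) / 2.0), 1.0)
        prod_all = np.prod(t, axis=1, keepdims=True)
        extrinsic = np.where(mask, prod_all / np.where(np.abs(t) < 1e-12, 1e-12, t), 0.0)
        m_cv = np.where(mask, 2.0 * np.arctanh(np.clip(extrinsic, -0.999999, 0.999999)), 0.0)

        # Variable-node update: m_vc = llr_v + sum_{c' != c} m_c'v, Eq. (8)
        total = m_cv.sum(axis=0)
        m_vc = np.where(mask, llr[None, :] + total[None, :] - m_cv, 0.0)

    # Marginal LLRs and hard decision, Eqs. (9)-(10)
    LLR = llr + m_cv.sum(axis=0)
    return np.sign(LLR)
```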
Algorithm 3 presents the MLP-based postprocessing that maps the quantized output to the reconstructed signal. The variables are defined as follows:
$\mathbf{q}_{\mathrm{raw}} \in \mathbb{R}^n$: quantized input to the decoder (from the BP quantization);
$\hat{\mathbf{s}} \in \mathbb{R}^d$: reconstructed signal from the decoder;
$W_i, b_i$: weight and bias of layer $i$;
$h_i$: output of layer $i$ before activation;
$\mathrm{BatchNorm}(x)$: batch normalization layer;
$\mathrm{LeakyReLU}(x)$: activation function ($\alpha = 0.01$);
$\mathrm{Tanh}(x)$: hyperbolic tangent activation;
$\mathrm{Dropout}(x, p)$: randomly drops a fraction $p = 0.3$ of $x$;
Initialization: Xavier for $W_i$, zeros for $b_i$.
Algorithm 3 Postprocessing of quantization
Input: $\mathbf{q}_{\mathrm{raw}} \in \mathbb{R}^n$
Output: $\hat{\mathbf{s}} \in \mathbb{R}^d$
1: Layer 1: Linear + BN + LeakyReLU
2:   $h_1 \leftarrow \mathbf{q}_{\mathrm{raw}} W_1 + b_1$
3:   $h_{1,\mathrm{BN}} \leftarrow \mathrm{BatchNorm}(h_1)$
4:   $h_{1,\mathrm{ACT}} \leftarrow \mathrm{LeakyReLU}(h_{1,\mathrm{BN}})$
5: Layer 2: Linear + BN + LeakyReLU
6:   $h_2 \leftarrow h_{1,\mathrm{ACT}} W_2 + b_2$
7:   $h_{2,\mathrm{BN}} \leftarrow \mathrm{BatchNorm}(h_2)$
8:   $h_{2,\mathrm{ACT}} \leftarrow \mathrm{LeakyReLU}(h_{2,\mathrm{BN}})$
9: Layer 3: Linear + BN + LeakyReLU
10:  $h_3 \leftarrow h_{2,\mathrm{ACT}} W_3 + b_3$
11:  $h_{3,\mathrm{BN}} \leftarrow \mathrm{BatchNorm}(h_3)$
12:  $h_{3,\mathrm{ACT}} \leftarrow \mathrm{LeakyReLU}(h_{3,\mathrm{BN}})$
13: Layer 4: Linear + Tanh
14:  $h_4 \leftarrow h_{3,\mathrm{ACT}} W_4 + b_4$
15:  $\hat{\mathbf{s}}_{\mathrm{raw}} \leftarrow \mathrm{Tanh}(h_4)$
16: Dropout (applied during training)
17:  $\hat{\mathbf{s}} \leftarrow \mathrm{Dropout}(\hat{\mathbf{s}}_{\mathrm{raw}}, p = 0.3)$
18: return $\hat{\mathbf{s}}$
During neural network training, to address the non–differentiability of the quantization operation, the following STE is adopted to approximate the gradient of the quantization function as
$$\frac{\partial \mathbf{q}}{\partial \mathbf{z}} \approx Q_{\mathrm{BP}}(\mathbf{z} + a_d) - Q_{\mathrm{BP}}(\mathbf{z} - a_d),$$
where $a_d$ denotes the gradient perturbation amplitude. The symmetric-difference step is set to $a_d = \frac{1}{2}$, which balances the gradient signal-to-noise ratio under the added uniform noise and prevents saturation in the STE. Comparable performance is observed for $a_d \in [0.5, 0.8]$, and the default $a_d = \frac{1}{2}$ works robustly across all settings.

2.3. Decoding Module

The decoding module reconstructs an approximate version of the original Gaussian source signal from the binary codeword sequence output by the encoder-side quantization module, as illustrated in Figure 3. Given a codeword sequence of length $n$, denoted as $\mathbf{q} = \{q_1, q_2, \ldots, q_n\}$, the reconstruction module first maps it back into the continuous domain to produce a signal $\hat{\mathbf{s}} = \{\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_n\}$, which has the same dimensionality as the original input. The internal structure of the MLP in the decoding module is shown in Table 3. Here, $n$ denotes the length of the quantized binary codeword, $d$ represents the dimension of the reconstructed signal, and the variables satisfy $q_i \in \{0,1\}$ and $\hat{s}_i \in \mathbb{R}$.
The Output row of Table 3 indicates that dropout with rate 0.3 is applied only to the final output during training and is disabled during inference. The input codeword $\mathbf{q}$ is produced by the quantization module and possesses the same dimensionality as the original input signal. Although the quantization is structurally designed based on the P-LDPC code, yielding a high compression rate, it inherently lacks continuous expressiveness. To overcome this limitation, the reconstruction module performs a nonlinear inverse mapping that converts the binary codeword back into a continuous-valued signal.
Common activation functions present distinct trade–offs: ReLU–family activations are non–saturating and promote sparsity but risk dead neurons; GELU provides smooth, stochastic gating at a higher computational cost; and Tanh, while saturating, naturally bounds outputs to [ 1 , 1 ] . Based on these properties, we employ LeakyReLU ( α = 0.01 ) in hidden layers to mitigate the dead neuron issue and ensure stable gradient flow, and tanh at the output layer to constrain the pre–quantization signal. This architecture was empirically found to offer the best trade–off between convergence stability and rate–distortion performance.
As shown in Figure 3, the MLP in the decoding module mirrors that of the transformation module. The reconstruction comprises a series of fully connected layers. Each linear layer is followed by batch normalization and a LeakyReLU activation function. The output layer uses a Tanh activation to constrain the amplitude of the reconstructed signal within the range $(-1, 1)$. Hence, numerical instability is prevented, and the output is aligned with the distribution of the original Gaussian source.
Assuming a batch size of $p$ and input codewords $\mathbf{q} \in \mathbb{R}^{p \times n}$, the reconstruction process is expressed as
$$\hat{\mathbf{s}} = \mathrm{Tanh}\big(g_L(\cdots g_2(g_1(\mathbf{q})))\big),$$
where the mapping function at the $i$-th layer is denoted as $g_i(\cdot)$ with
$$g_i(x) = \mathrm{LeakyReLU}\big(\mathrm{BatchNorm}(x W_i + b_i)\big),$$
where $W_i$ and $b_i$ indicate the weight matrix and bias vector of the $i$-th layer, respectively.
By incorporating batch normalization layers, the model dynamically normalizes the mean and variance of each layer’s outputs during training, which accelerates convergence and enhances training stability. The LeakyReLU activation function preserves the sparsity properties of ReLU while alleviating the dying neuron issue, thereby strengthening the nonlinear representational capacity. The output layer employs a tanh activation function to confine the reconstructed signal within the appropriate numerical range, consistent with the statistical properties of the Gaussian source. Hence, the reconstruction quality is improved.
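A minimal PyTorch sketch of the decoding module, following the layer widths of Table 3, is given below. The class name and defaults are illustrative, and Xavier initialization (as in the encoder sketch of Section 2.1) is omitted for brevity.

```python
import torch
import torch.nn as nn

class DecoderMLP(nn.Module):
    """Decoding module: reconstructs a d-dimensional signal from an n-bit codeword (Table 3, Algorithm 3)."""

    def __init__(self, n: int, d: int, alpha: float = 0.01, p_drop: float = 0.3):
        super().__init__()
        widths = [n, 8 * (d + n), 4 * (d + n), 2 * (d + n)]
        layers = []
        for w_in, w_out in zip(widths[:-1], widths[1:]):
            layers += [nn.Linear(w_in, w_out),
                       nn.BatchNorm1d(w_out),
                       nn.LeakyReLU(negative_slope=alpha)]
        layers += [nn.Linear(widths[-1], d), nn.Tanh(),
                   nn.Dropout(p=p_drop)]   # output dropout is active only in training mode
        self.net = nn.Sequential(*layers)

    def forward(self, q: torch.Tensor) -> torch.Tensor:
        # q: (batch, n) codeword; returns the reconstruction in (-1, 1), shape (batch, d).
        return self.net(q)
```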
The entire neural network is trained by minimizing the mean squared error (MSE) between the original Gaussian source s and the reconstructed signal s , which is formulated as follows:
$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{pn} \sum_{i=1}^{p} \sum_{j=1}^{n} \left( s_{ij} - \hat{s}_{ij} \right)^2,$$
where $\mathcal{L}_{\mathrm{MSE}}$ denotes the mean squared error, $p$ represents the number of samples processed in parallel (i.e., the batch size), $n$ is the dimensionality of each sample, corresponding to the length of the Gaussian source sequence, $s_{ij}$ is the $j$-th element of the $i$-th original sample, and $\hat{s}_{ij}$ is the corresponding reconstructed value produced by the decoding module. The term $(s_{ij} - \hat{s}_{ij})^2$ represents the squared reconstruction error for the $j$-th component of the $i$-th sample. The double summation $\sum_{i=1}^{p}\sum_{j=1}^{n}$ aggregates the squared error over all elements of all samples. Finally, the normalization factor $\frac{1}{pn}$ computes the average error over the entire batch.
The MSE quantifies the pointwise average difference between the network output $\hat{\mathbf{s}}$ and the ground-truth signal $\mathbf{s}$, where a lower MSE indicates better reconstruction capability and higher compression fidelity. In contrast to traditional decoders that rely on lookup tables or shallow neural networks, the reconstruction module proposed in this work demonstrates significantly enhanced feature restoration and information recovery performance. Without requiring access to the original signal or intermediate continuous features from the encoder, the decoder reconstructs the signal directly from a single-path binary codeword. This approach substantially simplifies the decoding process, improves decoding efficiency, and enhances the practicality of the end-to-end compression system.

3. System Optimization and Technical Details

3.1. Gradient Backpropagation

In this section, the non–differentiability issue is investigated in the joint training of the BP–based quantizer and the neural network. Since the quantization module implemented through the BP algorithm is embedded as a layer within the neural network, it should supply informative gradients during backpropagation. These gradients are necessary to update the parameters of preceding network layers and to minimize the overall loss function. However, the quantization function is inherently discontinuous and non–differentiable. Direct gradient computation results in values that are zero almost everywhere. This leads to gradient truncation in the backward pass, which prevents effective parameter updates and hinders end–to–end optimization.
To address the aforementioned challenge, the STE strategy is employed to approximate the gradients of the quantization function, thereby facilitating effective gradient backpropagation throughout the network. Specifically, random perturbation is introduced to the continuous feature vector before quantization. The resulting perturbed outputs are then used to approximate the derivative of the otherwise non–differentiable function through a finite–difference method. This approximation process, detailed in Algorithm 3, adheres to the principles of the STE and enables stable gradient propagation across the non–differentiable quantization layer during training.
Let $\mathbf{z}$ be a continuous input feature vector. By introducing a uniformly distributed random perturbation $\mathbf{u} \sim \mathcal{U}(-0.5, 0.5)$, the perturbed feature vector is obtained as $\tilde{\mathbf{z}} = \mathbf{z} + \mathbf{u}$. Then, the derivative of the quantization function $Q(\cdot)$ in the expectation sense is expressed as
$$\frac{d}{dz}\,\mathbb{E}\left[ Q(z + u) \right] = \frac{d}{dz}\int_{-t/2}^{t/2} Q(z + v)\, dv = Q(z + t/2) - Q(z - t/2),$$
where $Q(\cdot)$ denotes the quantization function based on the BP algorithm, and $\mathbb{E}[\cdot]$ represents the expectation operator. The perturbation is uniformly distributed over the quantization step size $t$, i.e., $u \sim \mathcal{U}(-t/2, t/2)$. The difference form in the above expression provides a numerically meaningful approximation of the gradient of the non-differentiable function, thereby enabling the propagation of informative gradient signals and supporting effective training of the neural network.
The STE configuration employs a uniform perturbation $\mathbf{u} \sim \mathcal{U}(-0.5, 0.5)$ and a symmetric-difference step size of $a_d = 1/2$. The hidden layers use the LeakyReLU activation with a negative slope of $\alpha = 0.01$; all weights are initialized with the Xavier-uniform method and all biases are set to zero. These settings stabilize optimization and are compatible with the STE defined in Equations (14) and (15). In the implementation, the system adopts a fixed perturbation step size, and the approximate gradient of the quantization function is computed using the symmetric difference form as
$$\frac{\partial Q(\mathbf{z})}{\partial \mathbf{z}} \approx Q(\mathbf{z} + a_d) - Q(\mathbf{z} - a_d),$$
where the fixed perturbation step size is given as $a_d = \frac{1}{2}$.
The proposed gradient approximation facilitates stable, end–to–end quantization training by effectively mitigating the vanishing gradient problem. This method seamlessly integrates the quantization into the learning process, enabling accurate gradient propagation and bridging the train–inference gap while preserving standard BP. Consequently, it substantially improves both training stability and final quantization accuracy.
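As a quick numerical sanity check of Equation (14), the following sketch compares a Monte Carlo estimate of $\frac{d}{dz}\mathbb{E}[Q(z+u)]$ with the symmetric difference $Q(z + t/2) - Q(z - t/2)$, using a scalar sign quantizer as a stand-in for $Q_{\mathrm{BP}}(\cdot)$; the specific values of $z$, $t$, and the sample size are arbitrary choices for the example (with $t = 1$ the uniform density factor is unity).

```python
import numpy as np

rng = np.random.default_rng(0)
Q = np.sign                       # scalar stand-in for the quantizer Q(.)
z, t, eps = 0.2, 1.0, 1e-3        # evaluation point, perturbation range, finite-difference step

u = rng.uniform(-t / 2, t / 2, size=1_000_000)   # u ~ U(-t/2, t/2)

def smoothed(x: float) -> float:
    """Monte Carlo estimate of E[Q(x + u)]."""
    return float(np.mean(Q(x + u)))

mc_grad = (smoothed(z + eps) - smoothed(z - eps)) / (2 * eps)   # d/dz E[Q(z + u)]
closed_form = Q(z + t / 2) - Q(z - t / 2)                       # right-hand side of Eq. (14)
print(mc_grad, closed_form)       # both are close to 2.0 for |z| < t/2
```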
Unless otherwise specified, models are trained for 4000 iterations with the Adam optimizer (learning rate of $1 \times 10^{-4}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$), a batch size of 1000, and gradient clipping at $\|\cdot\|_2 \le 1.0$. The STE employs a uniform perturbation $\mathbf{u} \sim \mathcal{U}(-0.5, 0.5)$, a symmetric-difference step $a_d = 1/2$, LeakyReLU activations ($\alpha = 0.01$), and Xavier/He initialization with zero biases. Evaluation is performed on a fixed set of 10,000 i.i.d. Gaussian samples, with results reported as the mean and $\pm 1$ standard deviation over five independent runs.
The proposed framework is designed as an end–to–end trainable network, with the original continuous input sequence serving as the training target. The loss function is defined as the MSE between the input and the reconstructed signal. Optimization is carried out using the Adam algorithm with a learning rate of 0.0001 and a batch size of 1000. The model undergoes 4000 training iterations to ensure stable convergence. A detailed summary of the training configuration is provided in Table 4.
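Putting the pieces together, a compact training loop consistent with Table 4 could look as follows. It assumes the `EncoderMLP`, `DecoderMLP`, and `STEBPQuantizer` sketches introduced earlier together with a concrete `bp_quantize` implementation (e.g., wrapping the NumPy subroutine above); the source dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hyperparameters from Table 4 and Section 3.1; d and n are illustrative assumptions.
d, n, batch, iters, lr = 100, 100, 1000, 4000, 1e-4
enc, dec = EncoderMLP(d, n), DecoderMLP(n, d)
params = list(enc.parameters()) + list(dec.parameters())
opt = torch.optim.Adam(params, lr=lr, betas=(0.9, 0.999))
mse = nn.MSELoss()

for step in range(iters):
    s = torch.randn(batch, d)              # memoryless standard Gaussian source
    z = enc(s)                             # continuous features
    q = STEBPQuantizer.apply(z)            # perturbed BP quantization with the STE gradient
    s_hat = dec(q)                         # reconstruction
    loss = mse(s_hat, s)

    opt.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)   # ||.||_2 <= 1.0
    opt.step()
```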

3.2. Complexity Analysis

The overall training complexity of the proposed end–to–end compression framework is decomposed into three major components, including the encoder network, the structured quantization module, and the decoder network.
In the encoder module, the encoder consists of L enc fully connected layers, each with a maximum hidden width h max , enc . For a batch size of B, the time complexity of the encoder per training step is O ( B · L enc · h max , enc 2 ) and the corresponding space complexity is O ( L enc · h max , enc 2 ) .
In the quantization module, there are subcomponents containing the additive noise injection and the BP algorithm. The additive perturbation step introduces uniform noise into each n-dimensional encoder output, resulting in a time complexity of O ( B · n ) . The BP process operates on a P–LDPC matrix containing E non–zero entries and executes I iterations per decoding, leading to a time complexity of O ( B · E · I ) .
In the decoder module, the structure mirrors that of the encoder, consisting of L dec fully connected layers with a maximum width of h max , dec . The time and space complexities per forward pass are O ( B · L dec · h max , dec 2 ) and O ( L dec · h max , dec 2 ) , respectively.
The total computational complexity integrates all components. The overall time complexity per training iteration is O B · L enc · h max , enc 2 + E · I + L dec · h max , dec 2 , and the space complexity is O L enc · h max , enc 2 + L dec · h max , dec 2 .
This analysis demonstrates that although the structured quantization module incurs additional computational cost through the BP, the process remains computationally tractable owing to the sparse structure of the LDPC matrix. Furthermore, the computational demands of the encoder and decoder modules scale quadratically with network width yet linearly with batch size and network depth, thereby facilitating efficient training on GPU architectures.
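To give a feel for the relative magnitudes of these terms, the short calculation below plugs assumed values into the complexity expressions above; every numerical value is an assumption made only for illustration.

```python
# Illustrative evaluation of the per-iteration complexity terms in Section 3.2.
B = 1000                   # batch size
L_enc, h_enc = 4, 1600     # encoder depth and maximum hidden width, e.g., 8*(d+n) with d = n = 100
L_dec, h_dec = 4, 1600     # decoder depth and maximum hidden width
E, I = 300, 10             # nonzeros in the P-LDPC parity-check matrix and BP iterations

encoder_ops = B * L_enc * h_enc ** 2    # O(B * L_enc * h_max,enc^2)
bp_ops = B * E * I                      # O(B * E * I)
decoder_ops = B * L_dec * h_dec ** 2    # O(B * L_dec * h_max,dec^2)

print(f"encoder {encoder_ops:.2e}, BP {bp_ops:.2e}, decoder {decoder_ops:.2e}")
# The sparse BP term is several orders of magnitude below the dense MLP terms for these values.
```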

4. Simulation Results and Analyses

To evaluate the effectiveness of the proposed end–to–end neural network integrated with P–LDPC codes for lossy compression, a comprehensive set of comparative experiments is conducted. All experiments are performed under identical datasets and training configurations to ensure fairness. Training loss curves are recorded throughout the optimization process to assess the performance and generalization capability.
In this section, a standard Gaussian source X N ( 0 , 1 ) is used as a representative example. The MSE is used as the distortion measure, and the theoretical lower bound of the distortion–rate function is given by [29]
$$D(R) = \sigma^2 \cdot 2^{-2R},$$
where D is the distortion, R is the compression rate in bits per dimension, and σ 2 = 1 for the standard Gaussian source.
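The bound in Equation (17) is straightforward to evaluate; the snippet below computes it for a few example rates of a unit-variance Gaussian source (the chosen rates are arbitrary).

```python
import numpy as np

def distortion_rate_bound(R, sigma2: float = 1.0):
    """Shannon distortion-rate lower bound D(R) = sigma^2 * 2^(-2R) for a Gaussian source."""
    return sigma2 * 2.0 ** (-2.0 * np.asarray(R))

rates = [0.2, 0.4, 0.6, 0.8]            # bits per dimension (arbitrary examples)
print(distortion_rate_bound(rates))     # approx. [0.758, 0.574, 0.435, 0.330]
```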
The parity-check matrix of the proposed P-LDPC codes is constructed and extended using the progressive edge growth (PEG) algorithm [30], designed to maximize the girth of the factor graph while preserving sparsity, thereby enhancing decoding performance. In the implemented system, the compression rate $R$ is determined by the encoding matrix $\mathbf{H} \in \mathbb{F}_2^{(1-R)n \times n}$ and is defined as
$$R = \frac{n - m}{n},$$
where $n$ is the codeword length, $m$ is the number of parity-check equations, and $\mathbf{H}$ is a sparse parity-check matrix of dimension $m \times n$.
A higher compression rate R corresponds to reduced redundancy, thereby improving compression efficiency at the potential cost of increased decoding difficulty under low–distortion constraints. To evaluate the system across varying compression strengths, the row–to–column ratio of the PEG–generated matrix H is adjusted, enabling flexible control over the rate R.
The extension factor $k$ determines the scale of the protograph, defining a parity-check matrix of size $(k m_0) \times (k n_0)$. Scaling up $k$ increases computational complexity due to growth in both the number of edges $E$ and the number of BP iterations $I$. Therefore, reducing $k$ lowers the system runtime. This reduction also tends to yield lower distortion for a fixed-capacity MLP, as it operates on a lower-dimensional target, aligning with the trends observed in Figure 4. Conversely, this scaling increases the block length $n$ and the number of check nodes $m$, thereby influencing both the encoding complexity and the resulting distortion.
In Figure 4, the distortion-rate performance based on the benchmark P-LDPC code AR3A [31] is compared with the RMD-based system [21], the conventional BP-based system [32], and the proposed EBP-based system. The benchmark code is extended by factors of 5 and 20, respectively. It is clear that the proposed system reduces distortion by more than 0.05 at the same rate. In addition, a smaller extension of the P-LDPC code yields lower distortion, which indicates that a lower dimension decreases both the output distortion and the coding runtime. In this case, the complexity of the proposed EBP-based system is effectively reduced.
In Figure 5, the distortion-rate performance of the BP, RMD, and EBP algorithms is compared. The simulations are conducted using three benchmark P-LDPC codes proposed in [31], including AR3A, ARA, and AR4JA. It is evident that the AR3A code achieves lower distortion, approaching the theoretical distortion-rate bound more closely. Among the different methods, the EBP algorithm using the AR3A code demonstrates better performance than the conventional approaches, indicating that the EBP is more efficient than both the RMD and the BP in terms of rate-distortion performance.
The parameters n and m for the evaluated codes are given as follows. The codes ARA0.4 and AR4JA0.4 both correspond to n = 25 and m = 15 , while the codes ARA0.82 and AR4JA0.73 correspond to n = 85 , m = 15 and n = 55 , m = 15 , respectively. These configurations demonstrate the variation in codeword length n and number of parity–check equations m across different code rates and structures.
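As a quick consistency check, the rates implied by these (n, m) pairs can be recovered from R = (n - m)/n:

```python
# Cross-checking the rates implied by the (n, m) pairs above via R = (n - m) / n.
configs = {"ARA0.4 / AR4JA0.4": (25, 15), "ARA0.82": (85, 15), "AR4JA0.73": (55, 15)}
for name, (n, m) in configs.items():
    print(f"{name}: R = {(n - m) / n:.2f}")
# ARA0.4 / AR4JA0.4: R = 0.40, ARA0.82: R = 0.82, AR4JA0.73: R = 0.73
```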
As illustrated in Figure 6, the proposed EBP algorithm exhibits superior convergence and lower MSE loss compared to both conventional BP and the RMD algorithm during testing. The EBP reaches a stable test loss of approximately 0.69 after 4000 iterations, demonstrating a smoother convergence trajectory. In contrast, the conventional BP and RMD stabilize around 0.76 and 0.73, respectively.
Furthermore, the curves represent the mean over five independent runs, with error bars indicating the standard deviation. The EBP achieves an obvious improvement of nearly 0.12 over the conventional methods. The proposed EBP algorithm shows markedly smaller error bars than the benchmarks, signifying greater robustness to random initialization. While the BP exhibits significant variability, especially initially, the RMD algorithm demonstrates intermediate stability. These results confirm that the proposed EBP method achieves superior performance, offering faster convergence, a lower final test loss, and enhanced stability compared to the BP and RMD algorithms, as evidenced by its smaller error bars and smoother convergence trajectory.
Overall, the proposed system substantially improves the quantization accuracy and reconstruction performance by integrating deep neural transformations, random perturbations, and the STE strategy. Even on data with varying statistics, it consistently achieves lower reconstruction distortion, demonstrating that the proposed method exhibits not only enhanced fitting capability but also significantly stronger generalization in end-to-end lossy compression of Gaussian sources. These attributes contribute to more reliable performance in practical deployment scenarios.
In a representative setting (code length n = 100 , batch size B = 1 ), the inference latency was measured on a single GPU, as shown in Table 5. While the core BP kernel per iteration is identical for both BP and EBP, the proposed EBP incurs only a negligible overhead from an additional encoder/decoder forward pass (approximately 0.10 ms per sample). This results in a total latency that is only 1.03 times that of standard BP at I = 10 iterations.

5. Conclusions

This paper introduces an end-to-end lossy source coding system built on a multi-layer perceptron neural network combined with P-LDPC-based structured quantization. By integrating deep nonlinear transformations, stochastic perturbations, and a straight-through estimator, the proposed approach effectively resolves the non-differentiability issue in the source quantization. Simulations show that the proposed system surpasses conventional methods in distortion-rate performance and generalization, while also converging faster and achieving lower distortion compared to classical P-LDPC code baselines. The framework maintains manageable complexity through reduced matrix dimensions and decoding iterations, offering an efficient and practical solution for source compression in sensor networks.

Author Contributions

Conceptualization, Y.W. and D.S.; methodology, Y.W.; software, Y.W.; validation, Y.W., W.C. and L.S.; formal analysis, Y.W.; investigation, Y.W.; resources, D.S.; data curation, Y.W.; writing—original draft preparation, Y.W.; writing—review and editing, D.S.; visualization, Z.X.; supervision, L.W.; project administration, D.S.; funding acquisition, D.S. and Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Natural Science Foundation of China under Grant 62501252, in part by the Fujian Provincial Natural Science Foundation of China under Grants 2024J01101 and 2024J01120, in part by the Fujian Province Young and Middle-aged Teacher Education Research Project under Grant JAT231044, in part by the Key Science and Technology Project of Fujian Province under Grant 2024H6030, in part by the National Science Foundation of Xiamen, China under Grants 3502Z202372013 and 3502Z202473053, in part by the Open Project of the Key Laboratory of Underwater Acoustic Communication and Marine Information Technology (Xiamen University) of the Ministry of Education, China, under Grant UAC202304, and in part by the Jimei University Startup Research Project under Grant ZQ2024058.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Okafor, N.; Ingle, R.; Matthew, U.; Saunders, M.; Delaney, D. Assessing and Improving IoT Sensor Data Quality in Environmental Monitoring Networks: A Focus on Peatlands. IEEE Internet Things J. 2024, 11, 40727–40742.
2. Zeng, Z.; Xie, C.; Tao, W.; Zhu, Y.; Cai, H. Knowledge-Graph-Based IoTs Entity Discovery Middleware for Nonsmart Sensor. IEEE Trans. Ind. Inform. 2024, 20, 2551–2563.
3. Xie, Z.; Wang, Y.; Xu, Y.; Chen, P.; Fang, Y. Polarization-Aided Multi-User Spatial Modulation. IEEE Trans. Veh. Technol. 2025, 74, 12148–12159.
4. Xie, Z.; Chen, P.; Fang, Y.; Chen, Q. Polarization-Aided Coding for Nonorthogonal Multiple Access. IEEE Internet Things J. 2024, 11, 27894–27903.
5. Song, D.; Ren, J.; Wang, L.; Chen, G. Designing a Common DP-LDPC Code Pair for Variable On-Body Channels. IEEE Trans. Wirel. Commun. 2022, 21, 9596–9609.
6. Song, D.; Wang, L.; Xu, Z.; Chen, G. Joint Code Rate Compatible Design of DP-LDPC Code Pairs for Joint Source Channel Coding Over Implant-to-External Channel. IEEE Trans. Wirel. Commun. 2022, 21, 5526–5540.
7. Mao, W.; Zhao, Z.; Chang, Z.; Min, G.; Gao, W. Energy-Efficient Industrial Internet of Things: Overview and Open Issues. IEEE Trans. Ind. Inform. 2021, 17, 7225–7237.
8. Li, Y.; Chen, G.; Chang, H.-C.; Chen, Q.; Truong, T.-K. An Improved Decoding Algorithm of the (71, 36, 11) Quadratic Residue Code without Determining Unknown Syndromes. IEEE Trans. Commun. 2015, 63, 4607–4614.
9. Zou, L.; Yan, H.; Dong, J.; Li, Y.; Chen, P.; Lau, F.C.M. On Construction of Low-Density Parity-Check Codes for Ultra-Reliable and Low Latency Communications. IEEE Trans. Commun. 2024, 72, 5290–5301.
10. Chen, P.; Xie, Z.; Fang, Y.; Chen, Z.; Mumtaz, S.; Rodrigues, J.J.P.C. Physical-Layer Network Coding: An Efficient Technique for Wireless Communications. IEEE Netw. 2020, 34, 270–276.
11. Song, D.; Xu, Z.; Sun, Y.; Gao, Z.; Wu, F.; Han, Y.; Liao, Y. Joint Semantic-Physical Neural Network for Underwater Communication. IEEE Internet Things Mag. 2025, 1–6.
12. Song, D.; Ren, J.; Wang, L.; Wu, H.; Chen, J.; Chen, G. Two-Component GMM Source Coding and Optimization. IEEE Trans. Intell. Transp. Syst. 2025, 1–14.
13. Fang, Y.; Bu, Y.; Chen, P.; Lau, F.C.M.; Otaibi, S.A. Irregular-Mapped Protograph LDPC-Coded Modulation: A Bandwidth-Efficient Solution for 6G-Enabled Mobile Networks. IEEE Trans. Intell. Transp. Syst. 2023, 24, 2060–2073.
14. Frey, M.; Bjelaković, I.; Gastpar, M.C.; Zhu, J. Simultaneous Computation and Communication Over MAC. IEEE Trans. Inf. Theory 2025, 71, 3644–3665.
15. Minani, J.B.; Sabir, F.; Moha, N.; Guéhéneuc, Y.-G.; Masuda, T. TISSEA: A Framework for Testing IoT Systems Based on Technical Software Engineering Aspects. IEEE Internet Things J. 2025, 1–18.
16. Lebonvallet, G.; Salazar-Zendeja, L.A.; Hnaien, F.; Snoussi, H.; Nélain, B. Perceiver IO Model for Efficient Vibration Signal Compression and Anomaly Detection. IEEE Trans. Instrum. Meas. 2025, 74, 2544912.
17. Song, D.; Ren, J.; Wang, L.; Chen, G. Gaussian Source Coding Based on P-LDPC Code. IEEE Trans. Commun. 2023, 71, 1970–1981.
18. Muramatsu, J. Channel Coding and Lossy Source Coding Using a Generator of Constrained Random Numbers. IEEE Trans. Inf. Theory 2014, 60, 2667–2686.
19. Jiang, X.; Xue, S.; Tang, J.; Huang, P.; Zeng, G. Low-Complexity Adaptive Reconciliation Protocol for Continuous-Variable Quantum Key Distribution. Quantum Sci. Technol. 2024, 9, 25008.
20. Jiang, X.; Wang, N.; Jin, J.; Feng, Y.; Hai, H.; Dai, J.; Huang, P. A Machine Learning Based Optimization Approach for Continuous-Variable Quantum Key Distribution. Adv. Quantum Technol. 2025.
21. Ren, J.; Song, D.; Wu, H.; Wang, L. Lossy P-LDPC Codes for Compressing General Sources Using Neural Networks. Entropy 2023, 25, 252.
22. Regalia, P. A Modified Belief Propagation Algorithm for Code Word Quantization. IEEE Trans. Commun. 2009, 57, 3513–3517.
23. Deng, H.; Song, D.; Miao, M.; Wang, L. Design of Lossy Compression of the Gaussian Source with Protograph LDPC Codes. In Proceedings of the 15th International Conference on Signal Processing and Communication Systems (ICSPCS), Sydney, Australia, 13–15 December 2021; pp. 1–6.
24. Gupta, A.; Verdu, S. Operational Duality between Lossy Compression and Channel Coding: Channel Decoders as Lossy Compressors. In Proceedings of the 2009 Information Theory and Applications Workshop, San Diego, CA, USA, 8–13 February 2009; pp. 119–123.
25. Golmohammadi, A.; Mitchell, D.G.M.; Kliewer, J.; Costello, D.J. Encoding of Spatially Coupled LDGM Codes for Lossy Source Compression. IEEE Trans. Commun. 2018, 66, 5691–5703.
26. Chen, Y.; Lin, H.; Zeng, H.; Fang, Y.; Xie, Z.; Lau, F.C.M. Optimization of Hierarchical-Modulated and LDPC-Coded BICM With Physical-Layer Network Coding. IEEE Internet Things J. 2024, 11, 35036–35047.
27. Liu, S.; Wu, X.; Fang, Y.; Chen, J.; Chen, C.; Zhou, L. Performance Analysis and Enhancement of P-LDPC Codes for Lossy Compression of Binary Sources. IEEE Trans. Commun. 2025.
28. Fang, Y.; Yang, J.; Ma, D.; Yang, M.; Xu, Z.; Chen, X. Integrated Sensing and Backscatter Communication with Movable Antennas: State-of-the-Art Survey and A Novel Inverse Scattering Framework. IEEE Trans. Netw. Sci. Eng. 2025, 1–20.
29. Cover, T.M. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 1999.
30. Hu, X.-Y.; Eleftheriou, E.; Arnold, D.-M. Regular and Irregular Progressive Edge-Growth Tanner Graphs. IEEE Trans. Inf. Theory 2005, 51, 386–398.
31. Divsalar, D.; Dolinar, S.; Jones, C.R.; Andrews, K. Capacity-Approaching Protograph Codes. IEEE J. Sel. Areas Commun. 2009, 27, 876–888.
32. Deng, H.; Song, D.; Xu, Z.; Sun, Y.; Wang, L. Belief Propagation Optimization for Lossy Compression Based on Gaussian Source. Sensors 2023, 23, 8805.
Figure 1. The proposed lossy source coding system based on a joint optimization architecture.
Figure 2. Detailed architecture of the proposed encoder integrating an MLP and a structured quantization module.
Figure 3. Detailed architecture of the proposed decoder.
Figure 4. Distortion-rate based on the AR3A code with varying extension times and decoders.
Figure 5. Distortion-rate performance based on the AR3A, AR4JA, and ARA codes.
Figure 6. Loss-iterations comparisons among three methods.
Table 1. Notations used throughout the paper.
Symbol | Meaning
$n$ | Codeword length (bits)
$m$ | Number of parity-check equations
$R = (n - m)/n$ | Compression rate (bits/dim)
$d$ | Signal dimension (input/output)
$\mathbf{q} \in \{0,1\}^n$ | Quantized binary codeword
$\mathbf{z} \in \mathbb{R}^n$ | Encoder feature vector
$\mathbf{u} \sim \mathcal{U}(-0.5, 0.5)$ | Uniform perturbation for robustness/STE
$a_d$ | Symmetric-difference step in STE (default $a_d = 1/2$)
$E, I$ | Nonzeros in the parity-check matrix, number of BP iterations
$L_{\mathrm{enc/dec}}, h_{\mathrm{max,enc/dec}}$ | Number of layers and maximum hidden width of the encoder or decoder
Table 2. Internal structure of the MLP in the encoding module.
Layer | Type | Input Dim. | Output Dim. | Activation
Input | - | $d$ | $d$ | -
Layer 1 | Linear + BN | $d$ | $2(d+n)$ | LeakyReLU
Layer 2 | Linear + BN | $2(d+n)$ | $4(d+n)$ | LeakyReLU
Layer 3 | Linear + BN | $4(d+n)$ | $8(d+n)$ | LeakyReLU
Layer 4 | Linear | $8(d+n)$ | $n$ | Tanh
Output | - | $n$ | $n$ | -
Table 3. Internal structure of the MLP in the decoding module.
Layer | Type | Input Dim. | Output Dim. | Activation
Input | - | $n$ | $n$ | -
Layer 1 | Linear + BN | $n$ | $8(d+n)$ | LeakyReLU
Layer 2 | Linear + BN | $8(d+n)$ | $4(d+n)$ | LeakyReLU
Layer 3 | Linear + BN | $4(d+n)$ | $2(d+n)$ | LeakyReLU
Layer 4 | Linear | $2(d+n)$ | $d$ | Tanh
Output | Dropout | - | $d$ | -
Table 4. Training hyperparameters of the proposed network.
Hyperparameter | Value
Loss Function | MSE
Optimizer | Adam
Learning Rate | 0.0001
Batch Size | 1000
Maximum Iterations | 4000
Gradient Approximation | STE
Table 5. The inference latency comparisons.
Method | Iterations I | BP Time/Iter | MLP Fwd. | Total/Sample
BP-only | 10 | 0.35 ms | - | 3.50 ms
EBP | 10 | 0.35 ms | 0.10 ms | 3.60 ms
Relative ratio (EBP/BP): 1.03×

Share and Cite

MDPI and ACS Style

Wang, Y.; Chen, W.; Song, L.; Xu, Z.; Song, D.; Wang, L. High-Efficiency Lossy Source Coding Based on Multi-Layer Perceptron Neural Network. Entropy 2025, 27, 1065. https://doi.org/10.3390/e27101065
