Article

Capacity-Enhanced Li-Fi Transmission Using Autoencoder-Based Latent Representation: Performance Analysis Under Practical Optical Links

by Serin Kim 1, Yong-Yuk Won 2,* and Jiwon Park 1

1 Department of Artificial Intelligence Convergence, Myongji University, 116 Myongji-ro, Cheoin-gu, Yongin 17058, Republic of Korea
2 Department of Electronic Engineering, Myongji University, 116 Myongji-ro, Cheoin-gu, Yongin 17058, Republic of Korea
* Author to whom correspondence should be addressed.
Photonics 2026, 13(4), 356; https://doi.org/10.3390/photonics13040356
Submission received: 5 March 2026 / Revised: 5 April 2026 / Accepted: 6 April 2026 / Published: 8 April 2026

Abstract

Visible light communication (VLC)-based Li-Fi systems suffer from limited transmission capacity owing to the restricted modulation bandwidth of LEDs. In this study, a latent representation-based NRZ-OOK Li-Fi transmission framework that exploits the statistical feature distribution of the latent space is proposed to improve transmission efficiency without expanding the physical bandwidth. An autoencoder is employed to transform input images into low-dimensional latent vectors, which are then quantized and modulated for transmission. At the receiver, hard decision and inverse quantization are performed, and the image is reconstructed through a trained decoder by leveraging the distribution characteristics of the latent representation. An effective transmission capacity gain G_cap is defined to quantify, as a function of the latent dimension, the amount of representable information relative to the original data under the same physical link resources, achieving up to a 49-fold data representation efficiency. Experimental results over practical optical links (0.5–1.5 m) showed that larger latent dimensions maintained higher reconstruction PSNR under short-range conditions, whereas smaller latent dimensions exhibited higher robustness under channel degradation, demonstrating a performance inversion phenomenon. Furthermore, it was confirmed that the dominant factor governing reconstruction performance shifts from the representational capability of the data to error accumulation characteristics depending on the channel condition. These results suggest that the latent representation-based transmission framework is an effective Li-Fi strategy that can jointly address transmission efficiency and channel robustness through information representation optimization in bandwidth-limited environments.

1. Introduction

In the 6G era, the exponential growth of data traffic necessitates wireless communication systems offering ultra-high data rates and low latency [1]. Light Fidelity (Li-Fi), utilizing the unlicensed visible light spectrum, has emerged as a compelling alternative to RF-based systems due to its immunity to electromagnetic interference and high spatial security [2,3,4]. Furthermore, since it can directly leverage existing LED lighting infrastructure, Li-Fi is considered a cost-effective solution for deployment and operation in various indoor settings, including homes, offices, hospitals, and public facilities [5,6,7]. Despite its potential, the transmission capacity of Li-Fi is fundamentally limited by the physical constraints of LEDs, such as the slow response of the phosphor layer, which restricts the modulation bandwidth to below 1 MHz [8,9]. Conventional efforts to enhance capacity have focused on hardware modifications, such as RGB-LEDs or GaN-based micro-LEDs, or signal processing techniques like DC bias optimization and linear equalization. However, these approaches often require complex hardware changes or lack adaptability to dynamic channel noise and power fluctuations in practical indoor environments [10,11,12,13].
Table 1 summarizes and compares conventional approaches proposed for transmission capacity enhancement in visible light communication (VLC) systems with the deep learning-based approach presented in this work. RGB-LED-based white light sources with improved phosphor structures and gallium nitride (GaN)-based blue micro-LED arrays have achieved capacity enhancement by alleviating the modulation bandwidth and response speed limitations imposed by the phosphor characteristics of conventional transmitters [10,11]. However, these approaches require modifications to existing lighting hardware, which limits their practicality in current indoor infrastructures.
Conventional LED-based Li-Fi systems deployed in indoor environments suffer from signal distortion and inter-symbol interference (ISI) caused by the nonlinear characteristics of LEDs. In addition, multipath propagation and obstacles may lead to distortion of the received signal and attenuation of optical power. Consequently, hardware-oriented performance improvements alone are insufficient to fully address the transmission capacity degradation caused by channel noise and signal distortion in practical Li-Fi links, thereby necessitating signal-processing-based approaches that are compatible with existing lighting infrastructure. To mitigate these issues, prior studies have optimized the DC bias of LEDs to increase transmission rates within the same bandwidth, thereby enhancing channel capacity [12]. Furthermore, to compensate for signal distortion and high-frequency attenuation resulting from the limited frequency response of phosphor-coated white LEDs, linear post-equalizers have been applied at the receiver, achieving NRZ-based transmission at 100 Mb/s and improving transmission capacity [13]. Nevertheless, DC bias optimization and linear equalization techniques rely on fixed models and linear compensation structures, which limit their adaptability to environments characterized by increased noise and reduced received power in practical Li-Fi links.
To address these challenges, this study proposes an autoencoder-based latent representation encoding framework that is compatible with existing LED infrastructure without hardware modification. By learning an optimized low-dimensional representation of the input data, the proposed system can transmit compressed feature information that is robust to channel noise. In addition, the transmission capacity enhancement in this study does not imply an increase in the physical channel capacity, but rather an improvement in the information representation efficiency of the data under the same transmission resources. For example, when a latent representation is utilized within the bit resources required to transmit a single image, up to approximately 49 times higher information representation efficiency can be achieved compared to the original image.
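As an arithmetic sanity check on the 49-fold figure, the gain can be computed directly. The sketch below is illustrative (the function name and interface are ours, not from the paper), assuming a 28 × 28 × 8-bit image and an 8-bit quantized latent vector:

```python
# Effective capacity gain: original image bits over latent-representation
# bits under the same link resources. z_dim = 16 with 8-bit quantization
# gives 6272 / 128 = 49, matching the "approximately 49 times" figure.
def capacity_gain(height, width, pixel_bits, z_dim, quant_bits):
    original_bits = height * width * pixel_bits   # 28*28*8 = 6272 for MNIST
    latent_bits = z_dim * quant_bits
    return original_bits / latent_bits

print(capacity_gain(28, 28, 8, 16, 8))   # 49.0
```

For the largest latent dimension used later in the paper (z_dim = 128), the same formula gives a gain of 6.125, which is why the gain and the reconstruction quality trade off against each other.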
Existing studies on autoencoder-based communication systems have been reported, and the main differences from this study are summarized in Table 2 [9]. Conventional approaches are based on an encoder–channel–decoder structure and aim to learn signal representations that are robust to channel conditions by training the entire system, including an AWGN channel, in an end-to-end manner. In this process, random messages are converted into one-hot vectors and mapped into high-dimensional continuous-valued vectors to construct constellations containing redundancy, while at the receiver, the problem is formulated as a classification task with the objective of minimizing BER.
In contrast, this study proposes a structure that compresses image data into low-dimensional latent representations and transmits them in order to address the reduction in transmission capacity caused by the limited bandwidth of LED-based VLC systems. After learning the representation through an autoencoder, quantization and NRZ-OOK modulation are applied to construct a practical communication system, and end-to-end optimization, including the channel, is not performed. Therefore, this study is formulated as a reconstruction problem rather than a conventional classification problem, and proposes a framework that simultaneously considers data representation efficiency and reconstruction performance. The main objective of this study is to improve the effective transmission capacity under the same bit resources by utilizing autoencoder-based latent representations, and to analyze the trade-off with reconstruction performance under different channel conditions. To this end, the effectiveness of the proposed method is evaluated by experimentally verifying the relationship between latent dimension and PSNR under various transmission distance conditions.

2. Autoencoder-Based Latent Representation Sampling Scheme

In Li-Fi environments, the physical bandwidth limitation of phosphor-coated white LEDs—typically below 1 MHz—imposes constraints on achieving transmission rates higher than 1 Mb/s when employing NRZ–OOK signaling [8]. To transmit a greater amount of information within the same time duration under such bandwidth constraints, this study proposes a signal-processing-based feature sampling scheme. Specifically, a convolutional autoencoder is utilized to encode the essential features of the transmission data into a latent representation, which is subsequently converted into an IM/DD (Intensity Modulation/Direct Detection)-based NRZ–OOK transmission signal.

2.1. Autoencoder Architecture

Figure 1 illustrates the overall architecture of the autoencoder employed in this study. The autoencoder (AE) is an unsupervised neural network composed of an encoder and a decoder, designed to extract meaningful features from input data and encode them into a low-dimensional latent representation. Based on this latent representation, the network reconstructs an output that closely resembles the original input. The encoder maps the input signal into a latent space, where the latent representation is defined as the output of the encoder. The latent representation is a vector whose dimensionality, the number of latent dimensions z_dim, is specified by the user and determines the dimensionality of the latent space. The decoder takes the latent representation as input and progressively expands it through a multilayer neural network structure to restore the spatial and structural characteristics of the original input signal. During training, the decoder learns the nonlinear mapping relationship between the latent representation and the output signal. As a result, it can generate reconstructed outputs that approximate the original signal even when the latent representation contains channel noise and distortion.
In this study, the encoder consists of a total of six layers, including four CNN-based convolutional layers, a flatten layer, and a fully connected (dense) layer. The convolutional layers progressively extract spatial and structural features from the MNIST input images, and the flatten layer converts the extracted two-dimensional feature maps into a one-dimensional vector. The fully connected layer maps the flattened feature vector into a latent representation with a dimension of z_dim. This low-dimensional latent representation reduces the amount of transmission data compared to the original data, thereby contributing to improved data representation efficiency in bandwidth-limited environments. In addition, the encoder is designed to learn the statistical characteristics (mean and variance) of each latent dimension, enabling the formation of noise-robust representations during the downsampling and reconstruction processes. Meanwhile, to reflect the noise occurring in Li-Fi systems, additive white Gaussian noise (AWGN) is added to the IM/DD-based NRZ–OOK transmission waveform generated from the quantized latent representation. The decoder is trained based on such noise-corrupted signals, allowing stable reconstruction even in practical channel environments.
A total of 60,000 images from the MNIST dataset, each consisting of a 28 × 28 × 1 single-channel grayscale image, were used as training inputs. All images were normalized to the range of [0, 1]. In the following, the encoder and decoder architecture of the autoencoder employed in this study, along with the role of each layer and the overall training structure, are described step by step.
1. Low-Level Feature Extraction
The first and second convolutional layers are designed to extract low-level features from the input image. The first layer employs a 3 × 3 kernel with ‘same’ padding and a Sigmoid activation function, transforming the input image into 32-channel feature maps. This layer learns fundamental visual features such as edges and corners. The second layer applies a ReLU activation function with a stride of (2, 2), reducing the spatial resolution by half (14 × 14) while increasing the number of channels to 64, thereby extracting intermediate-level visual patterns.
2. High-Level Feature Extraction
The third and fourth convolutional layers are configured to maintain 64 channels, like the second layer, while learning more abstract and high-level features. Through this process, the network progressively abstracts and represents structural and morphological information related to digit shapes.
3. Latent Representation Generation
Finally, the feature maps generated by the convolutional layers are transformed into a one-dimensional vector through a flattening operation. This vector is then mapped to a latent representation of predefined dimensionality using a Dense layer. The dimensional size of the latent representation determines the length of the data to be transmitted in the subsequent transmission process and serves as a key criterion for evaluating the transmission capacity gain relative to the original data.
4. Latent Representation Reconstruction
The latent representation generated by the encoder is quantized and modulated using IM/DD-based NRZ–OOK, and AWGN is added to the modulated data to represent the noise introduced during transmission. For the received signal, hard decision-based downsampling and inverse quantization are performed to recover the latent representation. In addition, an additional reconstruction process is carried out by exploiting the distribution characteristics of the learned latent representation.
The recovered latent representation is then used as the input to the decoder, which is constructed based on a structure symmetric to that of the encoder and progressively reconstructs the latent representation into the original input image dimension. The decoder is designed to mitigate reconstruction errors under channel degradation conditions by learning the nonlinear mapping relationship between the noise-corrupted latent representation and the original image.
5. Training Method and Loss Function
To effectively capture the pixel-wise probabilistic differences between the original and reconstructed images, Binary Cross-Entropy (BCE) is employed as the loss function. Since the input images are single-channel grayscale images normalized to the range [0, 1], applying a sigmoid activation function at the output layer of the decoder allows each pixel value to be interpreted as a probabilistic activation, making it well-suited for image reconstruction learning.
L_{BCE}(x, \hat{x}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ x_i \log(\hat{x}_i) + (1 - x_i) \log(1 - \hat{x}_i) \right]
Equation (1) represents the Binary Cross-Entropy (BCE) loss function used in this study. Here, x_i and x̂_i denote the i-th pixel values of the original and reconstructed images, respectively, and N represents the total number of pixels. The loss function defined in Equation (1) is designed to minimize the pixel-wise discrepancy between the original and reconstructed images. Its objective is to jointly train and iteratively update the parameters of both the encoder and decoder.
6. Optimization Method and Model Performance Evaluation
To optimize the parameters of the autoencoder model, the Adam (Adaptive Moment Estimation) optimizer is employed, and the model is trained for a total of 10 epochs. Adam is an adaptive learning rate optimization method that simultaneously considers the first and second moments, providing stable convergence characteristics even in neural networks with noisy inputs and nonlinear structures. Due to these properties, efficient parameter optimization is achieved in the autoencoder structure of this study, where differences in latent representation dimensions and layer-wise parameter scales exist, without the need for additional learning rate tuning. After training, reconstructed images are generated for the test data, and the model reconstruction performance is quantitatively evaluated using BER (Bit Error Rate) and PSNR (Peak Signal-to-Noise Ratio) metrics.
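The BCE loss of Equation (1) can be sketched numerically as follows. `bce_loss` is our own illustrative helper; the `eps` guard against log(0) is an implementation detail, not part of Equation (1):

```python
import numpy as np

def bce_loss(x, x_hat, eps=1e-12):
    """Pixel-wise binary cross-entropy of Eq. (1); eps guards log(0)."""
    x = np.asarray(x, dtype=float).ravel()
    x_hat = np.clip(np.asarray(x_hat, dtype=float).ravel(), eps, 1.0 - eps)
    return -np.mean(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

# A perfect reconstruction drives the loss to (near) zero.
print(bce_loss([0.0, 1.0], [0.0, 1.0]))
```

Since both the targets and the sigmoid outputs lie in [0, 1], each pixel behaves like an independent Bernoulli probability, which is why BCE is preferred over MSE for this reconstruction task.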

2.2. Transmission Data Generation and Modulation Process

Since the generated latent representations must undergo quantization for transmission, they are first scaled based on their statistical properties (mean and variance) and predefined clipping ranges. This statistics-based scaling serves as a preprocessing step to mitigate information loss caused by quantization. The decoder is accordingly designed to learn a stable reconstruction mapping for latent representations that reflect these statistical characteristics. To modulate the latent representations extracted from the original data into NRZ–OOK signals, both quantization and sampling processes are required. If the dimensionality of the latent representation is excessively reduced, significant information loss may occur, leading to degradation in signal reconstruction performance. Therefore, selecting an appropriate latent dimensionality for the Li-Fi transmission environment is a critical factor that simultaneously determines transmission capacity and reconstruction accuracy. In general, digital communication system quantization approximates continuous signals with discrete values, thereby reducing sensitivity to noise. Sampling, on the other hand, enables signal reconstruction within limited resolution constraints, maintaining a balance between transmission reliability and signal resolution.
Meanwhile, to convert the real-valued latent vectors in the encoded latent space into binary signals suitable for NRZ–OOK modulation through a quantization process, a scaling procedure that accounts for the distribution characteristics of the latent representations must be performed in advance. In this section, the quantization and sampling processes are mathematically defined, and the procedure for generating NRZ–OOK-based Li-Fi transmission data according to parameter settings is described.
z = [z_1, z_2, \ldots, z_{z_{\mathrm{dim}}}]
Equation (2) represents the z_dim-dimensional latent representation z extracted through the encoder. Each component z_i denotes the i-th latent variable constituting the latent representation and is a real-valued element of the vector. Since the distribution characteristics differ across dimensions and the value range of the latent representation is not inherently bounded, directly applying quantization may result in significant quantization errors. Therefore, a scaling process that accounts for the statistical distribution characteristics of the latent representation is required prior to quantization.
z_i^{\mathrm{clip}} = \mathrm{clip}(z_i,\; \mu_i - K\sigma_i,\; \mu_i + K\sigma_i)
Equation (3) represents the clipped latent representation for each dimension z_i extracted by the encoder, where clipping is applied based on the mean μ_i and standard deviation σ_i estimated from the training dataset. Here, i = 1, 2, …, z_dim denotes the dimensional index of the latent representation. The parameters μ_i and σ_i are computed from the overall latent representation distribution of the training data and reflect the statistical characteristics of each latent variable. The clipping coefficient K is set to 3 to mitigate quantization errors that may arise from extreme values (outliers) in the distribution.
z_i^{\mathrm{norm}} = \frac{z_i^{\mathrm{clip}} - (\mu_i - K\sigma_i)}{2K\sigma_i}, \quad z_i^{\mathrm{norm}} \in [0, 1]
Equation (4) represents the result of linearly scaling the clipped latent variable z_i^clip to the range [0, 1] based on the dimension-wise statistical parameters μ_i and σ_i.
Q_i = \left\lfloor z_i^{\mathrm{norm}} \cdot L \right\rfloor, \quad Q_i \in \{0, 1, \ldots, L-1\}
Equation (5) describes the process of converting the normalized latent variable z_i^norm, which lies within the range [0, 1], into a uniformly quantized value Q_i with L levels. In this study, the quantization level L is set to 256.
b_i = \mathrm{Binary}(Q_i, N), \quad N = \log_2 L
Equation (6) represents the conversion of the quantized value Q_i into an N-bit binary vector b_i. Here, N denotes the number of bits required to represent the L-level quantized value. The binary vectors b_i generated for i = 1, …, z_dim are sequentially concatenated to construct the overall transmission bit stream B.
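The transmitter-side chain of Equations (3)–(6) can be sketched as follows, using the paper's settings K = 3 and L = 256; the function name and interface are ours:

```python
import numpy as np

def latent_to_bits(z, mu, sigma, K=3, L=256):
    """Clip (Eq. 3), normalize (Eq. 4), quantize (Eq. 5), binarize (Eq. 6).
    mu and sigma are the per-dimension training-set statistics."""
    z, mu, sigma = (np.asarray(a, dtype=float) for a in (z, mu, sigma))
    z_clip = np.clip(z, mu - K * sigma, mu + K * sigma)
    z_norm = (z_clip - (mu - K * sigma)) / (2 * K * sigma)
    Q = np.minimum((z_norm * L).astype(int), L - 1)   # L levels: 0 .. L-1
    N = int(np.log2(L))                               # bits per latent value
    bits = [int(b) for q in Q.tolist() for b in format(q, f"0{N}b")]
    return np.array(bits), Q

bits, Q = latent_to_bits([0.0, 3.0], mu=[0.0, 0.0], sigma=[1.0, 1.0])
print(Q.tolist())   # [128, 255]: z = 0 is mid-range, z = +3 sigma saturates
```

Note how a latent value at the mean lands in the middle of the quantizer range, while values at the ±3σ clipping boundary saturate at the extreme levels, which is exactly the outlier behaviour the clipping coefficient is meant to bound.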
B_{\mathrm{tx}} = P \,\|\, B
To enable synchronization detection of the generated transmission bit stream B , a 32-bit preamble sequence P was inserted at the beginning of the stream. The preamble P was designed by repeating the pattern “0011” eight times. This structure was adopted to mitigate LED drift and fluctuations in average optical output that may occur when identical bits persist for an extended duration. The final transmission bit stream is constructed by concatenating the preamble and the data bit sequence, as expressed in Equation (7).
X = \mathrm{repeat}(B_{\mathrm{tx}}, s)
Equation (8) describes the process of applying the number of samples per bit, s, to each bit of the transmission bit stream B_tx = [B_1, B_2, …, B_{z_dim·N+32}], where the 32 preamble bits are included in the stream length. In this process, each B_i is replicated into s identical samples, resulting in an expanded bit stream X that is s times longer than B_tx. In this study, s is fixed to 2. The generated X is used as the digital signal for transmission and is subsequently modulated into an NRZ–OOK waveform to convert it into an optical signal suitable for application in the Li-Fi system.
NRZ–OOK is a simple modulation scheme widely used in digital communication systems, where a logical ‘1’ is represented by the optical source in the ON state and a logical ‘0’ by the OFF state. Since this scheme does not include a return-to-zero interval during transmission, it allows more information to be conveyed within the same time duration [14]. NRZ–OOK exhibits high compatibility with the simple ON/OFF control mechanism of LEDs, and due to its relatively straightforward circuit implementation and low-power operation, it is effectively utilized in Li-Fi systems. Under the physical constraints of optical source control inherent in Li-Fi environments, NRZ–OOK is regarded as a practical modulation scheme owing to its high implementation and operational efficiency compared to more complex modulation techniques. In this study, the transmission data X generated in Equation (8) were modulated using the NRZ–OOK scheme. The resulting waveform was then transmitted to the Arbitrary Function Generator (AFG) via a USB interface using the AFG’s Arbitrary Waveform Tool.
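The framing steps of Equations (7) and (8) can be sketched as follows; `build_waveform` is our own illustrative helper using the paper's "0011" × 8 preamble and s = 2:

```python
import numpy as np

PREAMBLE = np.tile([0, 0, 1, 1], 8)   # "0011" repeated eight times (32 bits)

def build_waveform(data_bits, s=2):
    """Prepend the preamble (Eq. 7), then repeat each bit s times (Eq. 8);
    the resulting 0/1 samples directly set the NRZ-OOK LED level."""
    b_tx = np.concatenate([PREAMBLE, np.asarray(data_bits, dtype=int)])
    return np.repeat(b_tx, s)

x = build_waveform([1, 0, 1], s=2)
print(len(x))   # (32 + 3) * 2 = 70 samples
```

Because the "0011" pattern alternates every two bits, no level persists longer than 2s samples inside the preamble, which is the property the text invokes to limit LED drift in the average optical output.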

2.3. Received Data Reconstruction and Post-Processing

To clarify the receiver processing flow, the overall reconstruction pipeline is summarized as follows. The received analog signal is first normalized and converted into binary data through a two-stage hard-decision process. Subsequently, the detected bit stream is used to recover the quantized latent representation via inverse quantization and statistical rescaling before being fed into the decoder.
In practical Li-Fi systems, the received signal is inevitably corrupted by noise due to various transmission conditions such as optical attenuation and ambient light interference. As a result, the transmitted NRZ–OOK signal is received not as an ideal binary waveform but as a continuous analog signal vector. This distortion may affect the accuracy of signal detection and reconstruction at the receiver. To recover the received continuous signal into digital binary data, a hard-decision-based downsampling and binarization process is required. The received signal is first normalized to the range of [0, 1], after which a two-stage hard-decision-based reconstruction procedure is performed.
\hat{b} = \begin{cases} 1, & r \geq \gamma \\ 0, & r < \gamma \end{cases}
In Equation (9), r denotes the average of the normalized received sample values, and γ represents the decision threshold. In the first hard-decision-based reconstruction stage, five upsampled received samples captured by the oscilloscope are integrated into a single symbol by defining r as the mean of these five samples. Subsequently, a decision threshold of γ = 0.4 is applied to perform the hard decision. Under ideal conditions, the decision threshold for hard decision detection may be set to 0.5. However, in practical Li-Fi environments, the symbol distribution was not perfectly symmetric due to DC offset and asymmetric noise characteristics. When the threshold was set to 0.5 in the experiment, an increased frequency of misclassification was observed, particularly in regions that should have been detected as ‘0’ but were incorrectly classified as ‘1’. Accordingly, the threshold value that minimizes the bit error rate (BER) was experimentally determined, and the lowest error rate was observed at 0.4. Therefore, this value was adopted in the proposed system.
In the first step, the preamble is detected from the downsampled bit stream to reorder the bit sequence of the transmitted data. Preamble detection is performed by computing the correlation between the received bit sequence and a predefined preamble pattern (P), and determining the point at which the maximum correlation value occurs as the starting point of the frame. Through this process, bit offsets that may occur during asynchronous reception are corrected. In addition, it was observed that, in some received data, false detection of the preamble led to unstable frame alignment, which can affect reconstruction performance. Therefore, in this study, the frame alignment state is verified based on the preamble detection results, and if necessary, additional alignment correction is performed to ensure the accuracy of signal reconstruction.
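The correlation-based preamble search described above can be sketched as follows; the helper name and the convention of returning the payload start index are ours:

```python
import numpy as np

PREAMBLE = np.tile([0, 0, 1, 1], 8)   # same 32-bit pattern as the transmitter

def find_frame_start(rx_bits, preamble=PREAMBLE):
    """Slide the known preamble along the received bit stream and take the
    offset with the highest bit agreement (maximum correlation); the payload
    is assumed to start immediately after the detected preamble."""
    rx = np.asarray(rx_bits, dtype=int)
    P = len(preamble)
    scores = [int(np.sum(rx[i:i + P] == preamble))
              for i in range(len(rx) - P + 1)]
    return int(np.argmax(scores)) + P

# Three stray bits, then the preamble, then a 5-bit payload
rx = np.concatenate([[1, 0, 1], PREAMBLE, [1, 1, 0, 0, 1]])
print(find_frame_start(rx))   # 35 = offset 3 + 32 preamble bits
```

Since the "0011" pattern is periodic with period 4, partially shifted alignments still score highly; this is one plausible source of the false preamble detections mentioned above, motivating the additional alignment verification step.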
In the second hard-decision-based reconstruction stage, two upsampled samples corresponding to a single transmitted bit were downsampled into one symbol by applying the same averaging operation, defining r as the mean of the two samples. In this stage, the decision threshold was set to γ = 0.55 based on the criterion of minimizing the BER.
\hat{z}_i^{\mathrm{norm}} = \hat{Q}_i / L
Finally, the downsampled received bit stream b̂ is regrouped in units of N bits to recover the quantized integer values Q̂_i. The normalized latent variable ẑ_i^norm is then obtained through the dequantization process defined in Equation (10). Here, L denotes the quantization level, which is set to 256 during the transmission data generation process.
\hat{z}_i^{\mathrm{clip}} = \hat{z}_i^{\mathrm{norm}} \cdot (2K\sigma_i) + (\mu_i - K\sigma_i)
Subsequently, linear inverse mapping is performed according to Equation (11) to restore the scale prior to clipping. This process corresponds to the inverse operation of the normalization and clipping procedures applied at the transmitter and is intended to statistically align the decoder input distribution with the latent distribution observed during training. Through this step, scale distortion introduced by quantization is compensated, and the statistical stability of the decoder input distribution is ensured, thereby preventing degradation in reconstruction performance.
\hat{z} = [\hat{z}_1^{\mathrm{clip}}, \hat{z}_2^{\mathrm{clip}}, \ldots, \hat{z}_{z_{\mathrm{dim}}}^{\mathrm{clip}}]
Equation (12) represents the reconstructed latent variables assembled into vector form across all dimensions, corresponding to the final restored latent vector fed into the decoder. The reconstructed latent vector ẑ is subsequently passed through the trained decoder to reconstruct the original image.
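The receiver-side inverse of the quantization chain, i.e., Equations (10)–(12), can be sketched as follows (function name and interface are ours; K = 3 and L = 256 as in the text):

```python
import numpy as np

def bits_to_latent(bits, mu, sigma, K=3, L=256):
    """Regroup N-bit words into Q_i, dequantize (Eq. 10), and undo the
    clipping-range scaling (Eq. 11) to assemble the decoder input (Eq. 12)."""
    N = int(np.log2(L))
    words = np.asarray(bits, dtype=int).reshape(-1, N)
    Q = words.dot(1 << np.arange(N - 1, -1, -1))   # binary word -> integer
    z_norm = Q / L
    mu, sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)
    return z_norm * (2 * K * sigma) + (mu - K * sigma)

# Level 128 maps back to the centre of the clipping range (mu = 0 here)
z_hat = bits_to_latent([1, 0, 0, 0, 0, 0, 0, 0], mu=[0.0], sigma=[1.0])
print(z_hat.tolist())   # [0.0]
```

Feeding the same μ_i and σ_i used at the transmitter is what statistically aligns the decoder input with the latent distribution seen during training, as described above.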

2.4. Reconstruction Performance Evaluation Metric

To quantitatively evaluate the quality of images reconstructed from the received signals in the proposed autoencoder-based Li-Fi system, the PSNR metric was employed. PSNR provides a quantitative measure of the similarity between the original and reconstructed images and is widely used for assessing the performance of image transmission systems [15]. PSNR is defined based on the maximum possible signal value and the pixel-wise difference between the reconstructed and original images. Its computation involves the Mean Squared Error (MSE) between the two images. A higher PSNR value indicates that the reconstructed image is more similar to the original image, and it is commonly adopted as a standard metric for evaluating image transmission quality.
\mathrm{MSE} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ I(i,j) - K(i,j) \right]^2
Equation (13) defines the computation of the MSE between the reconstructed image and the original image. Here, I(i, j) and K(i, j) denote the pixel values of the original and reconstructed images, respectively, while m and n represent the height and width of the image.
\mathrm{PSNR} = 10 \cdot \log_{10}\!\left( \mathrm{MAX}_I^2 / \mathrm{MSE} \right)
Equation (14) defines the PSNR based on the computed MSE value and the maximum pixel value of the image. MAX_I denotes the maximum possible pixel value in the image [16].
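Equations (13) and (14) translate directly into code; the sketch below uses max_val = 1.0 for the [0, 1]-normalized images of this study (the function name is ours):

```python
import numpy as np

def psnr(original, reconstructed, max_val=1.0):
    """PSNR from Eqs. (13)-(14); max_val = 1.0 for [0, 1]-normalized images."""
    diff = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')   # identical images have unbounded PSNR
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((28, 28))
b = np.full((28, 28), 0.1)   # uniform 0.1 error -> MSE = 0.01 -> 20 dB
print(psnr(a, b))   # 20.0
```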

2.5. Comparison with Conventional Image-Based VLC Transmission Methods

In conventional VLC-based image transmission systems, images are typically compressed using standardized source coding techniques such as JPEG [ITU-T T.81] based on fixed linear transformations (DCT), and the resulting bitstream is modulated and transmitted through an optical channel [17]. Distortion and noise caused by the channel are compensated at the receiver using signal processing techniques such as equalization. In contrast, in this study, the input image is transformed into a low-dimensional latent representation through a learned nonlinear transformation using a CNN-based encoder, and then transmitted. This constitutes an integrated framework that considers transmission efficiency from the data representation stage. The proposed method improves data representation efficiency by compressing and representing the essential features of the image in the latent space. In addition, channel-induced distortion and noise are mitigated at the receiver using the statistical characteristics of the latent representation and the trained decoder, and the recovered latent representation is ultimately reconstructed into the original image. This approach differs from conventional compression based on fixed linear transformations (DCT) and enables more effective data compression.
In this section, the theoretical principles of the autoencoder structure and the data latent representation sampling and reconstruction methods, which are key components for improving communication efficiency and reliability in Li-Fi systems, are described along with corresponding equations. Furthermore, the PSNR metric used to quantitatively evaluate the quality of reconstructed images is explained in a mathematical form, and a comparison between conventional VLC methods and the proposed autoencoder-based latent representation communication method is presented. Based on this theoretical foundation, the next section describes in detail the transmitter and receiver system structure and methodology implementing the proposed scheme.

3. System Design and Research Method

3.1. System Overview

Figure 2 illustrates the overall configuration of the Li-Fi system implementing the proposed autoencoder-based latent representation sampling technique in an LED transmission link environment, while Figure 3 presents a photograph of the actual experimental transmission setup. The proposed Li-Fi system consists of an autoencoder-based signal processing module for transmitting and reconstructing latent-represented images, a power supply, an Arbitrary Function Generator (AFG) that converts digital data into voltage-level electrical signals, a white LED that modulates the electrical signal into an optical signal for transmission, a photodetector (PD) that converts the received optical signal back into an electrical signal, and an oscilloscope that analyzes the received signal. The AFG and the LED are electrically connected, while the LED and the PD are connected via an optical fiber. The output signal from the PD is fed into the oscilloscope for received signal analysis. In this study, the MNIST handwritten digit dataset was used as the input image source. MNIST consists of digit images ranging from 0 to 9, where each image is a 28 × 28-pixel grayscale single-channel image. Each pixel is represented as an 8-bit unsigned integer (uint8), resulting in an original image size of 784 (pixels) × 8 (bits) = 6272 bits per image. Although MNIST has a relatively simple structure, it is widely used to evaluate feature extraction and reconstruction performance in image-based models. Therefore, it is well-suited for validating the effectiveness of the proposed autoencoder-based transmission framework. In this experiment, 10 images were randomly selected in advance for each digit class, and the same image set was consistently used across different transmission distances and latent dimensional configurations.
According to Figure 2, the configuration of the proposed scheme is described as follows. A trained autoencoder consisting of four CNN layers in both the encoder and decoder was employed. The latent representation z, composed of features extracted from the input image by the encoder, is mapped into transmission data through quantization, NRZ-OOK modulation, and sampling. The generated transmission waveform is fed into the AFG and transmitted as an optical signal through a white LED. The transmitted optical signal is concentrated using a convex lens and directed to the PD. After being converted into an electrical signal by the PD, it is delivered to the oscilloscope and output as the received signal. The received signal undergoes hard-decision-based downsampling, preamble detection, dequantization, and linear inverse mapping using the statistical parameters of the latent variables (μ_i, σ_i) to reconstruct the restored latent vector ẑ. Subsequently, ẑ is fed into the decoder to reconstruct the original image.
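The transmit-side chain described above (quantization, NRZ-OOK bit mapping, and sampling) can be sketched as follows. The ±3σ clipping range, the MSB-first bit order, and the function names are our illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def quantize_latent(z, mu, sigma, n_bits=8):
    """Map each latent value to an 8-bit integer code using the latent
    statistics (mu, sigma); the +/-3-sigma clipping range is an
    illustrative assumption."""
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    z_clip = np.clip(z, lo, hi)
    levels = 2 ** n_bits - 1
    return np.round((z_clip - lo) / (hi - lo) * levels).astype(np.uint16)

def to_nrz_ook(codes, n_bits=8, samples_per_bit=2):
    """Serialize quantized codes MSB-first and repeat each bit to form
    the NRZ-OOK waveform samples (2 samples per bit, as in Table 3)."""
    bits = ((codes[:, None] >> np.arange(n_bits - 1, -1, -1)) & 1).ravel()
    return np.repeat(bits.astype(float), samples_per_bit)

z = np.array([0.1, -0.5, 1.2, 0.0])  # a toy 4-dimensional latent vector
codes = quantize_latent(z, mu=0.0, sigma=1.0)
wave = to_nrz_ook(codes)
print(wave.size)  # 4 dims x 8 bits x 2 samples = 64 waveform samples
```

In the actual system this waveform is loaded into the AFG, which drives the white LED.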

3.2. Experimental Environment and Parameter Settings

3.2.1. Experimental Setup

The experiments were conducted under typical indoor lighting conditions without additional optical shielding, maintaining the ambient illumination environment. The AFG used in the experiment was the AFG-2100 manufactured by GW Instek (Taipei, Taiwan). As the optical transmitter, a white LED (LUW W5AM) from OSRAM (Munich, Germany) was employed. On the receiver side, a C5331 PD module from Hamamatsu (Hamamatsu, Japan) was used, and the received signals were measured using a Tektronix DPO7104 oscilloscope (Beaverton, OR, USA).
Table 3 summarizes the experimental parameters of the proposed autoencoder-based Li-Fi system. Among the digital transmission parameters, the latent dimensionality z_dim was set to 16, 32, 64, and 128 to comparatively analyze the changes in information representation efficiency and reconstruction performance across latent space dimensions. The quantization level was fixed at 256 (8 bits). This configuration was adopted to minimize the influence of quantization error, thereby enabling an independent analysis of the relationship between latent dimensionality and reconstruction performance. The number of samples per bit was set to 2, and with an AFG sampling frequency of 20 MHz, the resulting bit rate was 10 Mb/s. During signal recovery, the received signal was downsampled using hard-decision thresholds of γ1 = 0.4 and γ2 = 0.55.
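The paper does not spell out how the two thresholds γ1 and γ2 are combined; one plausible reading is a hysteresis pair, where the mean of the two samples per bit decides '1' above γ2, '0' below γ1, and holds the previous decision inside the ambiguous band. The sketch below follows that assumption.

```python
import numpy as np

def hard_decision(rx, samples_per_bit=2, g1=0.4, g2=0.55):
    """Downsample the normalized received waveform to bits using the two
    thresholds gamma1/gamma2. Treating them as a hysteresis pair (hold
    the previous bit inside the ambiguous band) is our assumption."""
    means = rx.reshape(-1, samples_per_bit).mean(axis=1)
    bits, prev = [], 0
    for m in means:
        if m > g2:
            prev = 1
        elif m < g1:
            prev = 0
        # g1 <= m <= g2: keep the previous decision
        bits.append(prev)
    return np.array(bits, dtype=np.uint8)

rx = np.array([0.9, 0.8, 0.1, 0.2, 0.5, 0.5, 0.95, 0.9])
print(hard_decision(rx))  # third bit falls in the band and holds the '0'
```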
At the transmitter, reflecting the limited response characteristics of the illumination LED, the effective analog bandwidth of the LED is observed to be below approximately 1 MHz, which in turn limits its effective modulation rate to the same range. The sampling rates set at the AFG are 1 MHz and 20 MHz, yielding bit transmission rates of 0.5 Mb/s and 10 Mb/s, respectively. Given the roughly 1 MHz LED bandwidth, the 0.5 Mb/s condition corresponds to transmission within the channel bandwidth, whereas the 10 Mb/s condition exceeds it and can be interpreted as an environment where inter-symbol interference (ISI) occurs. The corresponding apparent spectral efficiencies are 0.5 bit/s/Hz and 10 bit/s/Hz, respectively. In this study, the apparent spectral efficiency is defined as R_b/B, where R_b denotes the bit transmission rate and B denotes the effective bandwidth of the LED. This metric is a configuration-based ratio that does not reflect channel noise, distortion, or BER performance, and therefore does not represent the actually achievable spectral efficiency. Consequently, it is not directly comparable to Shannon capacity and is used here only as an auxiliary indicator of the relative efficiency of transmission settings under the experimental conditions. The LED bias voltage is set to 3 V, and the LED is driven by an NRZ-OOK signal with a 10 Vpp AFG output amplitude and 0 V offset.
In the channel and optical configuration, the transmission distance between the LED and the photodiode (PD) is set to 0.5 m, 1 m, and 1.5 m to emulate short- to mid-range indoor optical link environments. This is intended to quantitatively analyze the effect of reduced received optical power and channel degradation with increasing distance on reconstruction performance. The LED at the transmitter and the PD at the receiver are coaxially aligned to match their optical axes, and the transceivers are arranged in a facing configuration to maintain a line-of-sight (LOS) condition. Through this setup, additional optical loss due to misalignment is minimized, allowing only the channel characteristics corresponding to distance variation to be reflected. In addition, no additional optical filters, such as blue filters, are used, and a convex lens is applied at the receiver to focus the incident optical signal onto the PD.
At the receiver, the PD supply voltage was set to 5 V, and the received signal was measured using an oscilloscope at a sampling frequency of 100 MHz; for each frame, 20,000 samples were acquired to ensure that the entire bit sequence was sufficiently captured.

3.2.2. Computational Complexity and Processing Time

In this study, the transmission and reception process in a Li-Fi environment is implemented through simulation, and the encoder and decoder are trained for 10 epochs for each latent dimension. During the training process, the input image is transformed into a latent representation through the encoder, and the corresponding latent vector passes through the channel after undergoing quantization, NRZ-OOK modulation, sampling, and AWGN addition. Subsequently, the received signal is demodulated through hard decision-based downsampling and inverse quantization, and a normalization-based reconstruction process is performed using the statistical characteristics of the encoder output latent variables. The recovered latent representation is used as the input to the decoder and is ultimately reconstructed into the original image. This process forms a structure in which the autoencoder is trained to reconstruct the original image from the latent representation corrupted by the channel environment.
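The channel stage of this training loop (quantization, NRZ-OOK modulation, AWGN, hard decision, and inverse quantization) can be sketched as a single round-trip function, with the autoencoder itself omitted. The ±3σ quantization range, unit signal amplitude, and 0.5 decision threshold are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

def channel_round_trip(z, mu, sigma, snr_db, n_bits=8, spb=2, seed=0):
    """One channel-in-the-loop pass: quantize the latent vector, map it
    to an NRZ-OOK waveform, add AWGN, hard-decide, and dequantize back
    to z_hat. The +/-3-sigma range and unit-amplitude signalling are
    illustrative assumptions."""
    rng = np.random.default_rng(seed)
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    levels = 2 ** n_bits - 1
    codes = np.round((np.clip(z, lo, hi) - lo) / (hi - lo) * levels).astype(int)
    # serialize MSB-first, 2 samples per bit
    bits = ((codes[:, None] >> np.arange(n_bits - 1, -1, -1)) & 1).ravel()
    wave = np.repeat(bits.astype(float), spb)
    wave = wave + rng.normal(0.0, 10 ** (-snr_db / 20), wave.size)  # AWGN
    rx_bits = (wave.reshape(-1, spb).mean(axis=1) > 0.5).astype(int)
    weights = 1 << np.arange(n_bits - 1, -1, -1)
    rx_codes = rx_bits.reshape(-1, n_bits) @ weights
    return rx_codes / levels * (hi - lo) + lo

z = np.array([0.2, -1.0, 1.5])
z_hat = channel_round_trip(z, mu=0.0, sigma=1.0, snr_db=30)
print(np.max(np.abs(z - z_hat)) < 0.02)  # high SNR: only quantization error remains
```

During training, z_hat rather than z is fed to the decoder, so the autoencoder learns to reconstruct images from channel-corrupted latent representations.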
Table 4 presents the training time according to the latent dimension and the average processing time (inference time) when the trained model is applied to the actual VLC process. The inference time is defined as the total processing time required to reconstruct the original image from the received signal, including demodulation, inverse quantization, normalization-based reconstruction, and decoder operations. The detailed computational structure and the applied optimization methods are described in Section 2.1. Training time shows a tendency to gradually increase as the latent dimension increases due to the growth in computational complexity.

4. Results

4.1. Transmission Efficiency and Reference Reconstruction Performance

G_cap = B_orig / B_tx = (784 × 8) / (z_dim × 8) = 784 / z_dim        (15)
In this study, transmission capacity enhancement is defined as how much the information representation efficiency increases relative to the original data under the same physical Li-Fi link conditions. To quantify this enhancement, the effective transmission capacity gain G_cap is defined according to Equation (15). This metric indicates how much more efficiently information can be represented relative to the original data under the same transmission resources. Here, B_orig = 784 × 8 denotes the total number of bits in a single original MNIST image, and B_tx = z_dim × 8 represents the total number of bits required to transmit the latent representation. Accordingly, when the latent dimension z_dim is set to 16, 32, 64, and 128, the effective transmission capacity gain is 49×, 24.5×, 12.25×, and 6.125×, respectively. These results indicate that, by significantly reducing the number of transmission bits required for a single image through data compression, more information can be delivered within the same transmission resources. In particular, the proposed method achieves up to approximately a 49-fold improvement in information representation efficiency under the same physical link bandwidth, suggesting that a substantially higher level of information delivery is possible under the same bit budget.
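Equation (15) is simple enough to verify directly; the values below reproduce the gains quoted in the text.

```python
B_ORIG = 28 * 28 * 8  # 6272 bits per original MNIST image
# G_cap = B_orig / B_tx = 784 / z_dim, per Equation (15)
gains = {z_dim: B_ORIG / (z_dim * 8) for z_dim in (16, 32, 64, 128)}
print(gains)  # {16: 49.0, 32: 24.5, 64: 12.25, 128: 6.125}
```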
Table 5 summarizes the effective transmission capacity gain, the transmission size relative to the original image, and the reference PSNR excluding the optical channel effect according to the latent dimension for digit ‘5’, which corresponds to a medium level of structural complexity among the MNIST digit classes. In the case of structurally simple digits such as ‘1’, the reconstruction PSNR tends to be relatively high, whereas for digits such as ‘8’ with complex closed-loop structures, the reconstruction difficulty increases, resulting in lower PSNR. Accordingly, in this study, digit ‘5’, which can represent the average reconstruction characteristics of all digit classes, is selected as the analysis target.
The reference PSNR is a performance metric obtained by reconstructing the transmitted bitstream without passing through the optical channel via inverse quantization and the trained decoder, representing the baseline reconstruction performance in the absence of channel distortion. This serves as a baseline for quantitatively comparing the effects of channel degradation according to increasing transmission distance.
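The PSNR values reported here follow the standard definition for 8-bit images (introduced mathematically in Section 2); a minimal implementation is shown below, with a toy single-pixel error as the example input.

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """PSNR in dB between an original and a reconstructed 8-bit image:
    10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no distortion
    return 10 * np.log10(max_val ** 2 / mse)

ref = np.zeros((28, 28), dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 16  # a single corrupted pixel in a 784-pixel image
print(round(psnr(ref, noisy), 1))
```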
To quantitatively evaluate how much higher transmission efficiency the proposed method can achieve compared to conventional image compression while maintaining the same reconstruction performance, the number of transmission bits between the JPEG-based compression method and the proposed autoencoder-based latent representation compression method is compared under the same reference PSNR condition. In this study, the latent representation is quantized to 8 bits per dimension, and thus, the AE bit size is calculated as latent dimension × 8. In contrast, the JPEG bit size refers to the number of bits resulting from JPEG compression configured to achieve the same reconstruction quality as the reference PSNR obtained for each latent dimension. In addition, Gain over JPEG indicates how much higher data representation efficiency the proposed method achieves compared to conventional JPEG under the same reconstruction performance condition. For example, when the latent dimension is 16, a total of 128 bits achieves approximately 24 dB PSNR, which corresponds to about 25.9 times higher compression efficiency compared to JPEG compression achieving the same PSNR level. These results clearly demonstrate that the proposed autoencoder-based compression method can achieve significantly higher data representation efficiency and effective transmission capacity compared to conventional JPEG-based methods under the same reconstruction performance conditions.
As the latent dimension decreases, the effective transmission capacity increases, while the transmission size relative to the original image decreases. On the other hand, as the latent dimension decreases, the dimensionality of feature information that the encoder can represent becomes limited, resulting in a reduction in the amount of information that can be reconstructed. Consequently, the reference PSNR shows a gradual decreasing trend. This indicates that reducing the dimension of the latent representation to improve transmission efficiency may lead to a certain level of degradation in reconstruction quality, reflecting the fundamental trade-off between transmission efficiency and reconstruction performance.

4.2. Impact of Sampling Rate on System Performance

Figure 4 is presented to explain the reason for setting a modulation rate of 20 MHz, which exceeds the LED bandwidth in this study. Figure 4a shows the PSNR of reconstructed images as a function of transmission distance for latent dimensions z_dim = 16 and 128 at sampling rates of 1 MHz and 20 MHz, respectively. Under both conditions, when the transmission distance is 0.5 m, reconstruction performance comparable to the reference PSNR is achieved, and in this region the representational capability of the latent representation is the dominant factor determining reconstruction performance.
However, as the transmission distance increases to 1.5 m, the larger number of transmitted bits at the higher latent dimension (z_dim = 128) amplifies the error accumulation effect, degrading reconstruction performance, i.e., a performance inversion phenomenon. This can be interpreted as an increased probability of error accumulation under channel degradation when the number of transmitted bits grows. In particular, under the 1.5 m condition at 20 MHz, a significant decrease in PSNR is observed, attributable to the combined effects of ISI caused by high-speed transmission exceeding the LED modulation bandwidth and increased noise due to the reduction in received signal power with distance.
In contrast, when the transmission distance is 1 m or less, the 20 MHz setting exhibits superior reconstruction performance across all latent dimension conditions. The reason for this can be confirmed from Figure 4b. Figure 4b shows the BER performance according to the sampling rate when the transmission distance is 1 m. The BER is calculated by applying a hard decision process to the received signal and comparing the recovered binary bits with the originally transmitted data. As a result, the 20 MHz condition shows lower BER than the 1 MHz condition for all latent dimensions, indicating more reliable signal detection.
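The BER computation described here reduces to a bit-wise comparison between the hard-decided sequence and the transmitted one; a minimal sketch with illustrative bit patterns:

```python
import numpy as np

def ber(tx_bits, rx_bits):
    """Bit error rate: fraction of recovered bits that differ from the
    originally transmitted bits."""
    tx, rx = np.asarray(tx_bits), np.asarray(rx_bits)
    return float(np.mean(tx != rx))

tx = [1, 0, 1, 1, 0, 0, 1, 0]
rx = [1, 0, 0, 1, 0, 1, 1, 0]
print(ber(tx, rx))  # 2 errors in 8 bits -> 0.25
```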
Under channel conditions where the difference in reconstruction performance is not significant, the 20 MHz (10 Mb/s) setting provides approximately 20 times higher bit transmission rate than the 1 MHz (0.5 Mb/s) setting, allowing more information to be transmitted under the same physical link resources. Therefore, under such conditions, the 20 MHz setting becomes a more advantageous choice in terms of information transmission efficiency in bandwidth-limited Li-Fi environments. Accordingly, in this study, the sampling rate is set to 20 MHz, and performance analysis is conducted under this condition with respect to transmission distance and latent dimension variations.

4.3. Distance-Dependent Performance Under Enhanced Transmission Capacity

In this section, under the effective transmission capacity gain conditions based on the latent dimension defined in Section 4.1 and the sampling rate of 20 MHz selected in Section 4.2, the variation in reconstruction performance with increasing transmission distance is analyzed when passing through a practical optical channel. The experiments are conducted under the same optical transmission and reception environment, and the reconstruction PSNR is measured by varying the transmission distance to 0.5 m, 1.0 m, and 1.5 m for each latent dimension.
Figure 5 illustrates the variation in reconstruction PSNR as a function of transmission distance for different latent dimensions. A gradual decline in PSNR is observed across all latent dimensions as the transmission distance increases. At a short distance of 0.5 m, the reconstruction performance for all latent dimensions converges to their respective reference PSNR values, indicating that the received signal strength is sufficient to maintain a nearly error-free link.
However, as the distance increases to 1.5 m, a distinct performance inversion phenomenon is observed. While the configuration with z_dim = 128 (6.125× capacity gain) provides the highest fidelity under ideal conditions, it exhibits the most significant PSNR degradation under channel impairment. In contrast, the case of z_dim = 16 (49× capacity gain) demonstrates superior robustness, maintaining relatively stable reconstruction quality. This behavior is primarily attributed to the accumulation of bit errors during the dequantization and decoding processes. A larger latent dimension requires the transmission of a greater number of bits; consequently, for a given bit error rate (BER), the probability of multiple bit errors occurring within a single latent vector increases. These errors propagate through the nonlinear decoder, leading to severe structural distortion in the reconstructed image. These results confirm that reducing the latent dimensionality not only enhances transmission capacity but also serves as an effective strategy to mitigate error propagation in power-limited Li-Fi channels.
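The scaling argument above can be made concrete: assuming independent bit errors at rate p, the probability that a single latent vector of z_dim 8-bit values contains at least one error grows with its length. The BER value below is illustrative, not a measured result.

```python
def vector_error_prob(z_dim, p, bits_per_dim=8):
    """P(at least one bit error in one latent vector), assuming
    independent bit errors at rate p."""
    return 1 - (1 - p) ** (bits_per_dim * z_dim)

p = 1e-4  # illustrative BER, not taken from the experiments
for z_dim in (16, 128):
    # the 128-dim vector (1024 bits) is roughly 8x more likely to be corrupted
    print(z_dim, round(vector_error_prob(z_dim, p), 4))
```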
Therefore, the experimental results confirm that reducing the latent dimensionality can achieve up to a 49-fold increase in transmission capacity while simultaneously mitigating the impact of error accumulation under channel-degraded conditions. This finding suggests that the latent representation-based transmission scheme not only improves information representation efficiency but also serves as an effective transmission strategy in practical optical channel environments.

4.4. BER Performance Across Latent Dimensions and Transmission Distance

In this section, to explain the reconstruction PSNR results observed for different latent dimensions in Section 4.3, the BER characteristics according to transmission distance variations are analyzed together.
Figure 6 shows the BER performance as a function of transmission distance for different latent dimensions under the condition where the sampling rate is set to 20 MHz, corresponding to a bit transmission rate of 10 Mb/s. The BER is calculated by comparing the recovered bit sequence, obtained through threshold-based hard decision applied to the received analog signal for all digit classes, with the originally transmitted bit sequence.
As the transmission distance increases, the BER rises due to channel attenuation and noise. At short distances (0.5 m and 1 m), relatively low BER is maintained even at the highest latent dimension (z_dim = 128), indicating that under favorable channel conditions the probability of error remains limited despite the increased number of transmitted bits. However, at a distance of 1.5 m, a sharp increase in BER is observed for high latent dimensions. In particular, when the latent dimension is 128, the BER increases significantly, indicating that high-dimensional latent representations are not reliably transmitted under channel degradation. These results are consistent with the previously observed performance inversion phenomenon caused by error accumulation, demonstrating that lower latent dimensions exhibit more robust transmission performance in degraded channel environments.
Table 6 presents the BER and throughput according to transmission distance and latent dimension based on the data in Figure 6. In this study, throughput refers to the effective information delivery, calculated by reflecting the ratio of successfully received bits without error to the total number of transmitted bits.
At short transmission distances (0.5 m), even when a high latent dimension (z_dim = 128) is used, the impact on transmission efficiency is limited because the BER remains very low. In contrast, under severely degraded channel conditions (1.5 m), a high latent dimension leads to higher BER and a corresponding reduction in throughput, quantitatively confirming that low-dimensional latent representations are more robust under channel degradation.
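Under the definition used for Table 6, effective information delivery can be approximated as the bit rate scaled by the error-free fraction; the BER values below are illustrative, not the measured ones.

```python
def throughput_mbps(bit_rate_mbps, ber):
    """Effective information delivery: bit rate scaled by the fraction
    of bits received without error (the definition used for Table 6)."""
    return bit_rate_mbps * (1 - ber)

print(throughput_mbps(10, 0.0))   # error-free short-range link: 10.0 Mb/s
print(throughput_mbps(10, 0.05))  # illustrative 5% BER: 9.5 Mb/s
```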

4.5. Class-Dependent Reconstruction Performance Across Latent Dimensions

In this section, in relation to the BER results presented in Section 4.4, the class-wise reconstruction PSNR is compared for latent dimensions of 16 and 128 at transmission distances of 0.5 m and 1.5 m. This analysis focuses on the interaction between image structural complexity and channel conditions in determining reconstruction performance. Figure 7 provides a comparative analysis of PSNR across different digit classes to evaluate the impact of image structural complexity on reconstruction performance.
Under the 0.5 m condition (Figure 7a), reconstruction quality is primarily governed by the representational capacity of the latent space. Digit classes with high structural complexity, such as '8', which contains intersecting loops, consistently exhibit lower PSNR than simpler structures like '1'. In this regime, the configuration with the larger latent dimension (z_dim = 128) outperforms the smaller one (z_dim = 16) across all digit classes by preserving finer morphological details. The representative reconstructed images also confirm that the overall shapes are well preserved for both latent dimensions; however, the case with z_dim = 128 exhibits clearer contours and more distinct structural details than the case with z_dim = 16.
However, in the channel-degraded environment at 1.5 m (Figure 7b), the determinant of performance shifts from representational capacity to error accumulation characteristics. A universal performance reversal occurs in which the configuration with z_dim = 16 achieves higher PSNR than z_dim = 128 for all digit classes. The representative reconstructed images support this observation: while the case with z_dim = 128 shows noticeable boundary distortions and loss of fine structural details, the configuration with z_dim = 16 maintains a relatively stable overall shape despite its limited detail representation. Notably, the PSNR variation across digits becomes less pronounced for z_dim = 16, suggesting that lower-dimensional latent representations are less sensitive to the structural complexity of the input data under low-SNR conditions. These findings suggest that, for robust Li-Fi transmission, the latent dimensionality must be jointly optimized with the prevailing channel conditions to balance feature preservation and error resilience.
Overall, the experimental results confirm that differences in the number of transmitted bits, determined by latent dimensionality, can alter the dominant factor governing reconstruction performance depending on channel conditions. Under channel-degraded environments, the suppression of error accumulation may become more critical to reconstruction performance than higher representational capacity. This finding suggests that the design of latent dimensionality for transmission capacity enhancement should be jointly optimized in consideration of the prevailing channel conditions.
These results demonstrate that the proposed latent representation-based transmission scheme effectively improves transmission efficiency while maintaining robust reconstruction performance under various channel conditions, highlighting its practical applicability in bandwidth-limited Li-Fi systems.

5. Discussion

The performance inversion phenomenon observed under different channel conditions in this study is not merely a result of specific experimental settings, but rather indicates a fundamental structural trade-off inherent in latent representation-based transmission systems. This implies that transmission efficiency is determined not simply by the bit rate, but by the balance between the representational capability of the data and the error robustness of the channel. From this perspective, the generalization capability of the proposed framework and its scalability to other data types are discussed as follows:
1. Analysis of BER Characteristics under Different Sampling Rates
In this system, a phenomenon is observed in which the BER at a sampling rate of 20 MHz is lower than that at 1 MHz. The cause of this phenomenon originates from the signal distortion characteristics occurring at a low sampling rate (1 MHz, 0.5 Mb/s). Under the 1 MHz condition, there exist intervals where the same bit is maintained for a long duration, during which DC imbalance and baseline wander cause a drift phenomenon in which the signal baseline gradually changes. As a result, errors occur in which a bit that should be recovered as ‘0’ is intermittently detected as ‘1’. Unlike ISI, which is influenced by neighboring bits, these errors occur in regions where the same bit persists, making them difficult to compensate through averaging effects during the downsampling process, and consequently acting as a major factor increasing the bit error rate (BER). In contrast, under the 20 MHz condition, although ISI exists due to the bandwidth limitation of the LED, the drift phenomenon is relatively reduced, enabling more stable signal detection based on the average value. As a result, overall lower BER performance is observed.
2. Impact of Error Characteristics on Reconstruction Performance
According to the reconstruction framework of this system, the characteristics of errors that cause BER can also have a significant impact on reconstruction performance. Errors caused by the influence of neighboring bits, such as ISI, tend to exhibit a certain level of correlation, and such structured distortions may be partially compensated through the normalization-based reconstruction process and the trained decoder, which leverage the statistical distribution of latent variables.
In contrast, errors caused by baseline drift occur independently in specific intervals and take the form of unstructured errors, which may alter the statistical distribution of the latent representation itself. In such cases, the mismatch between the learned data distribution of the decoder and the distorted input may increase, making it difficult to effectively mitigate errors during the reconstruction process. This difference suggests that even at the same BER level, the impact on reconstruction performance (PSNR) can vary depending on the type of error.
3. Trade-off between Representation Capacity and Error Accumulation
The performance inversion phenomenon observed in this study demonstrates that a structural trade-off exists between representational capability and channel error robustness in latent representation-based transmission systems. As the latent dimension increases, the representational capability of the input data improves, enabling superior reconstruction performance. However, under channel degradation conditions, as the number of transmitted bits increases, the impact of error accumulation becomes more pronounced. Due to this characteristic, in certain conditions, a larger latent dimension may instead lead to degraded reconstruction performance, resulting in a performance inversion phenomenon.
This indicates that the selection of the latent dimension cannot be determined solely based on improving representational capability, but must also consider the characteristics of error accumulation depending on channel conditions. Therefore, the optimal latent dimension may vary according to the channel environment and should be determined based on a balance between representational capability and error robustness.
4. Applicability to Other Data Types
The proposed framework employs an autoencoder-based structure that learns the statistical characteristics of input data, and therefore has the potential to be extended to various data types beyond images. In particular, it can be applied to data with spatial or temporal correlations, such as video data or time-series signals.
However, in the case of data with limited structural characteristics, such as random bit sequences, performance degradation may occur due to the lack of learnable patterns. Therefore, further studies are required to investigate the applicability of the proposed framework to various data types.
5. Generalization to More Complex Datasets
This study is conducted based on a dataset with relatively simple structures, such as MNIST, and the generalization capability to more complex datasets is an important consideration.
Although MNIST and CIFAR-10 differ in terms of structural complexity, they share the common characteristic of being image data with spatial correlations. Datasets such as CIFAR-10 require higher representational capability as they include color information and diverse textures; however, the performance inversion phenomenon observed in this study is not dependent on the simplicity of a specific dataset, but rather originates from the structural characteristics arising from the interaction between latent dimension and channel noise.
In other words, as the latent dimension increases, the representational capability improves, but at the same time, the number of transmitted bits increases, leading to greater error accumulation under noisy conditions, which may degrade reconstruction performance under certain conditions. This trade-off can be interpreted as a general characteristic that may appear regardless of the data type. However, as the complexity of the data increases, the required representational capability changes, and thus the threshold conditions under which performance inversion occurs may also vary. Accordingly, further validation on diverse datasets is required in future work.
6. Implications for Transmission Capacity Enhancement
The latent representation-based transmission scheme proposed in this study demonstrates that, by reducing the number of transmitted bits per image under the same physical bandwidth conditions, up to a 49-fold improvement in information representation efficiency can be achieved compared to conventional pixel-based image representation. While conventional image transmission methods convert input data into bitstreams through JPEG compression for transmission, the proposed framework transforms the key features of the data into compressed latent representations using an autoencoder and transmits them. As a result, similar levels of reconstruction performance can be achieved with fewer transmitted bits under the same channel conditions, thereby improving the efficiency of information delivery per unit bandwidth.
Furthermore, the performance inversion phenomenon observed in this study suggests that not only reducing the number of transmitted bits, but also selecting an optimal level of representation according to channel conditions, is an important factor for improving transmission efficiency. In other words, the latent dimension should not be treated as a fixed value, but rather be adaptively determined based on the channel environment. This demonstrates that, in bandwidth-limited Li-Fi systems, transmission efficiency can be improved not only by increasing the bit transmission rate, but also through the optimization of data representation.

6. Conclusions

In this study, to overcome the limitation of transmission efficiency caused by the physical bandwidth constraint of LEDs in Li-Fi systems, an autoencoder (AE)-based latent representation transmission framework is proposed and experimentally verified. By leveraging the feature extraction capability of a CNN-based encoder, it is confirmed that the information representation efficiency can be significantly improved within the same bandwidth without hardware modification or additional physical bandwidth expansion.
In addition, it is observed that, in latent representation-based transmission systems, the dominant factor determining reconstruction performance shifts from representational capability to error accumulation depending on channel conditions. In particular, in short transmission distance environments, the difference in reconstruction performance according to sampling rate is not significant, while a higher sampling rate provides the advantage of transmitting more information under the same resources.
Therefore, adaptively setting the latent dimension and transmission conditions according to the channel environment is important for maximizing transmission efficiency, indicating that efficient information delivery can be achieved through data representation optimization in bandwidth-limited environments. These findings are supported by the experimental results presented in Section 4, which validate the effectiveness of the proposed approach under practical optical link conditions.
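As a concrete illustration of such channel-adaptive selection — a hypothetical policy, not an algorithm proposed in the paper — one simple rule is to choose the largest latent dimension whose estimated BER remains below a target; the BER values below mimic the 1.5 m row of Table 6:

```python
# Hypothetical channel-adaptive latent-dimension policy (illustrative only).
# measured_ber maps each candidate z_dim to a BER estimate, e.g. obtained
# from preamble/pilot symbols; the function name is an assumption.

def select_z_dim(measured_ber: dict, ber_target: float) -> int:
    """Largest z_dim meeting the BER target; fall back to the smallest."""
    feasible = [z for z, ber in sorted(measured_ber.items()) if ber <= ber_target]
    return feasible[-1] if feasible else min(measured_ber)

# BER values shaped like the 1.5 m measurements in Table 6.
ber_at_1p5m = {16: 0.0288, 128: 0.0624}
print(select_z_dim(ber_at_1p5m, ber_target=0.05))  # favors z_dim = 16 here
```

Under good channel conditions (low BER across all candidates), the same rule selects the largest dimension, matching the performance-inversion behavior described above.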
The key findings and academic contributions of this research are summarized as follows:
  • Significant Capacity Enhancement: By defining the effective transmission capacity gain (Gcap), we confirmed that the proposed scheme achieves up to a 49-fold increase in information representation efficiency compared to conventional image transmission methods.
  • Discovery of Performance Inversion: The experimental results over practical optical links (0.5–1.5 m) revealed a critical trade-off between latent dimensionality and channel robustness. While higher-dimensional latent spaces ( z d i m = 128) offer superior reconstruction fidelity in high SNR environments, lower-dimensional representations ( z d i m = 16) exhibit higher resilience under severe channel degradation.
  • Shift in Performance Determinants: We identified that the dominant factor governing reconstruction quality shifts from representational capacity (structural complexity of the data) to error accumulation characteristics (number of transmitted bits) as the channel condition deteriorates.
  • Feasibility for Future Communication Paradigms: This work aligns with the emerging field of semantic communications, where the focus of transmission shifts from bit-level accuracy to meaningful feature delivery. The proposed framework provides a robust foundation for intelligent wireless systems that must operate within constrained physical resources.
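The transmission chain summarized above (quantization, bit serialization, NRZ-OOK mapping, hard decision, inverse quantization) can be sketched as a minimal noiseless round trip. This is a simplified illustration, not the authors' implementation: it assumes latent values normalized to [0, 1] and uses a single 0.5 decision threshold, whereas the experiment uses dual hard-decision thresholds of 0.4 and 0.55 (Table 3).

```python
import numpy as np

# Minimal round-trip sketch of the latent transmission chain: 8-bit uniform
# quantization -> bit serialization -> NRZ-OOK levels -> hard decision ->
# inverse quantization.  The [0, 1] latent range and single 0.5 threshold
# are simplifying assumptions.

N_BITS = 8
LEVELS = 2 ** N_BITS  # 256 quantization levels, as in Table 3

def transmit_roundtrip(z: np.ndarray) -> np.ndarray:
    q = np.clip(np.round(z * (LEVELS - 1)), 0, LEVELS - 1).astype(np.uint8)
    bits = np.unpackbits(q)                      # serialize to a bitstream
    symbols = bits.astype(float)                 # NRZ-OOK: bit 1 -> high level
    rx_bits = (symbols > 0.5).astype(np.uint8)   # hard decision at receiver
    q_hat = np.packbits(rx_bits)                 # inverse serialization
    return q_hat.astype(float) / (LEVELS - 1)    # inverse quantization

z = np.array([0.0, 0.25, 0.5, 1.0])
z_hat = transmit_roundtrip(z)
print(np.max(np.abs(z - z_hat)))  # residual quantization error only (< 1/255)
```

In the noiseless case the only distortion is the 8-bit quantization error; channel degradation enters this chain as bit flips after the hard decision, which is where the error-accumulation effect discussed above originates.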
For future work, the proposed framework will be extended by combining it with multi-carrier modulation schemes such as OFDM. In addition, applying deep learning-based signal separation techniques at the receiver is expected to enable further improvement in transmission efficiency through bandwidth sharing.
Meanwhile, in this study, fixed thresholds and static scaling are used as a configuration to verify the baseline performance of the proposed framework. However, in practical Li-Fi environments, channel conditions may vary depending on time, transmission distance, obstacles, and ambient light interference. Therefore, future work needs to extend the framework to an end-to-end learning structure that jointly optimizes the encoder–channel–decoder by incorporating such channel conditions.
In conclusion, the proposed latent representation-based transmission strategy can serve as a promising approach for next-generation high-capacity Li-Fi applications, such as smart homes, IoT networks, and real-time high-resolution media streaming.

Author Contributions

Conceptualization, S.K. and Y.-Y.W.; Methodology, S.K.; Software, S.K.; Validation, S.K.; Formal analysis, S.K.; Investigation, S.K. and J.P.; Data curation, S.K. and J.P.; Writing—original draft, S.K.; Writing—review & editing, S.K., Y.-Y.W. and J.P.; Visualization, S.K. and J.P.; Supervision, Y.-Y.W.; Project administration, Y.-Y.W.; Funding acquisition, Y.-Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Regional Innovation System Education (RISE) program jointly funded by the Ministry of Education of the Republic of Korea and Gyeonggi Province, and implemented through the Gyeonggi RISE Center (2025-RISE-09-A15).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cui, Q.; You, X.; Wei, N.; Nan, G.; Zhang, X.; Zhang, J.; Lyu, X.; Ai, M.; Tao, X.; Feng, Z.; et al. Overview of AI and communication for 6G network: Fundamentals, challenges, and future research opportunities. Sci. China Inf. Sci. 2025, 68, 171301. [Google Scholar] [CrossRef]
  2. Dimitrov, S.; Haas, H. Principles of LED Light Communications: Towards Networked Li-Fi; Cambridge University Press: Cambridge, UK, 2015. [Google Scholar]
  3. Alsabah, M.; Abdulrazzaq Naser, M.; Mahmood, B.M.; Abdulhussain, S.H.; Eissa, M.R.; Al-Ali, A.M.; Saif, S.M.; Al-Utaibi, K.A.; Alafif, M.F.; Hashim, D.A. 6G wireless communications networks: A comprehensive survey. IEEE Access 2021, 9, 148191–148243. [Google Scholar] [CrossRef]
  4. Politi, C.T.; Serpi, H.; Tselios, C.; Denazis, S. Integrated sensing, communication and lighting visible light communication for indoor 6G networks. In Proceedings of the 24th International Conference on Transparent Optical Networks (ICTON), Bari, Italy, 14–18 July 2024; pp. 1–4. [Google Scholar] [CrossRef]
  5. Rehman, S.U.; Ullah, S.; Chong, P.H.J.; Yongchareon, S.; Komosny, D. Visible light communication: A system perspective—Overview and challenges. Sensors 2019, 19, 1153. [Google Scholar] [CrossRef] [PubMed]
  6. Singh, S.; Kakamanshadi, G.; Gupta, S. Visible light communication—An emerging wireless communication technology. In Proceedings of the 2015 2nd International Conference on Recent Advances in Engineering & Computational Sciences (RAECS), Chandigarh, India, 21–22 December 2015; pp. 1–3. [Google Scholar] [CrossRef]
  7. Mana, S.M.; Jungnickel, V.; Bober, K.L.; Hellwig, P.; Hilt, J.; Schulz, D.; Paraskevopoulos, A.; Freund, R.; Hirmanova, K.; Janca, R.; et al. Distributed multiuser MIMO for LiFi: Experiments in an operating room. J. Light. Technol. 2021, 39, 5730–5743. [Google Scholar] [CrossRef]
  8. Yu, T.-C.; Huang, W.-T.; Lee, W.-B.; Chow, C.-W.; Chang, S.-W.; Kuo, H.-C. Visible light communication system technology review: Devices, architectures, and applications. Crystals 2021, 11, 1098. [Google Scholar] [CrossRef]
  9. O’Shea, T.; Hoydis, J. An introduction to deep learning for the physical layer. IEEE Trans. Cogn. Commun. Netw. 2017, 3, 563–575. [Google Scholar] [CrossRef]
  10. Song, J.; Zhang, W.; Zhou, L.; Zhou, X.; Sun, J.; Wang, C.-X. A new light source of VLC combining white LEDs and RGB LEDs. In Proceedings of the IEEE/CIC International Conference on Communications in China (ICCC), Qingdao, China, 22–24 October 2017; pp. 1–6. [Google Scholar] [CrossRef]
  11. Huang, Y.; Guo, Z.; Wang, X.; Li, H.; Xiang, D. GaN-based high-response frequency and high-optical power matrix micro-LED for visible light communication. IEEE Electron Device Lett. 2020, 41, 1536–1539. [Google Scholar] [CrossRef]
  12. Gutema, T.Z.; Haas, H.; Popoola, W.O. Bias point optimisation in LiFi for capacity enhancement. J. Light. Technol. 2021, 39, 5021–5027. [Google Scholar] [CrossRef]
  13. Le Minh, H.; O’Brien, D.; Faulkner, G.; Zeng, L.; Lee, K.; Jung, D. 100-Mb/s NRZ visible light communications using a postequalized white LED. IEEE Photonics Technol. Lett. 2009, 21, 1063–1065. [Google Scholar] [CrossRef]
  14. Karunatilaka, D.; Zafar, F.; Kalavally, V.; Parthiban, R. LED based indoor visible light communications: State of the art. IEEE Commun. Surv. Tutor. 2015, 17, 1649–1678. [Google Scholar] [CrossRef]
  15. Tandra, R.; Sahai, A. SNR walls for signal detection. IEEE J. Sel. Top. Signal Process. 2008, 2, 4–17. [Google Scholar] [CrossRef]
  16. Horé, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar] [CrossRef]
  17. ITU-T. Information Technology—Digital Compression and Coding of Continuous-Tone Still Images (JPEG); Recommendation T.81; ITU-T: Geneva, Switzerland, 1992. [Google Scholar]
Figure 1. Autoencoder Architecture Diagram.
Figure 2. Block Diagram of the Proposed Latent-Domain Autoencoder-Based Li-Fi System.
Figure 3. Photograph of the Experimental Li-Fi Transmission Setup.
Figure 4. Impact of sampling rate and latent dimension on reconstruction performance: (a) PSNR versus transmission distance, (b) BER versus sampling rate.
Figure 5. Reconstruction PSNR versus transmission distance across latent dimensions (Digit 5).
Figure 6. BER performance as a function of transmission distance for different latent dimensions under a 20 MHz sampling rate.
Figure 7. Comparison of PSNR across digit classes for latent dimensions of 16 and 128 at transmission distances of (a) 0.5 m and (b) 1.5 m.
Table 1. Comparison of Existing Studies and the Proposed Work for Transmission Capacity Enhancement in Visible Light Communication Systems.

| Comparison Item | [10] RGB-LED | [11] GaN μLED | [12] DC Bias Optimization | [13] Linear Equalizer | Proposed Work (AE-Based) |
|---|---|---|---|---|---|
| Problem-Solving Strategy | Modification of LED light source structure | High-speed enhancement of the LED device itself | Optimization of LED driving voltage | Linear compensation of received signal | Transmission capacity enhancement within the limited bandwidth of the LED |
| Hardware Modification Required | Yes | Yes | No | No | No |
| Target Physical Limitation | Bandwidth limitation due to phosphor | LED response speed limitation | LED nonlinear distortion | Channel-induced signal distortion | Bandwidth limitation due to phosphor |
| Transmission Capacity Enhancement Approach | Separation of illumination and high-speed communication paths using RGB-LED-based light source | High-frequency modulation enabled by GaN-based blue μLED array structure | DC bias optimization considering LED nonlinearity | Compensation of low-frequency LED response using post-equalization for high-speed transmission | Optimization of transmission data size and representation method using learning-based latent representation |
| Key Research Contribution | Bandwidth extension to tens of MHz compared to white LEDs | Achieved modulation bandwidth up to 401 MHz | Approximately 25% channel capacity improvement | Achieved 100 Mb/s NRZ transmission | Signal reconstruction achieved after transmitting 4% of the original data over a practical Li-Fi link |
Table 2. Comparison between conventional autoencoder-based communication systems and the proposed representation-based transmission framework.

| Category | Conventional AE-Based Communication Systems [9] | Proposed Method |
|---|---|---|
| Learning Framework | End-to-end encoder–channel–decoder learning | Representation learning using an encoder–decoder, followed by a separate communication system |
| Channel Modeling | AWGN channel included in joint optimization | Channel excluded from training (AWGN added to latent) |
| Channel Adaptivity | Channel-adaptive representation learning | No channel-adaptive learning (future work) |
| Modulation Scheme | Learned constellation representation (alternative to QPSK) | Practical NRZ-OOK modulation for VLC systems |
| Input Data | Random one-hot messages | Real image data (28 × 28 grayscale, 0–255) |
| Representation | High-dimensional continuous latent vectors forming constellations | Low-dimensional latent representation for compression |
| Problem Formulation | Classification (BER minimization) | Reconstruction (PSNR + efficiency) |
| System Objective | Channel robustness | Improving information representation efficiency in bandwidth-limited VLC systems |
| Practicality | Idealized communication model | Reflects a practical LED-based VLC environment |
Table 3. Experimental Parameters of the Proposed Li-Fi System.

| Category | Parameter | Value | Unit |
|---|---|---|---|
| Digital Signal Processing | Latent dimension z_dim | 16, 32, 64, 128 | - |
| | Quantization level L | 256 | - |
| | Quantization bits N | 8 | bits |
| | Samples per bit (sps) | 2 | samples/bit |
| | Preamble length | 32 | bits |
| | Hard-decision threshold γ1 | 0.4 | - |
| | Hard-decision threshold γ2 | 0.55 | - |
| Transmitter | Effective LED bandwidth | 1 | MHz |
| | AFG sampling rate f_s | 1, 20 | MHz |
| | Bit rate R_b | 0.5, 10 | Mb/s |
| | Apparent spectral efficiency | 0.5, 10 | bit/s/Hz |
| | LED bias voltage | 3 | V |
| | AFG output amplitude | 10 | Vpp |
| | AFG output offset | 0 | V |
| Channel/Optical Setup | Transmission distance | 0.5, 1, 1.5 | m |
| | Optical alignment | Coaxial (LOS) | - |
| | Optical filtering | None | - |
| | Lens | Convex lens | - |
| Receiver | PD supply voltage | 5 | V |
| | Oscilloscope sampling rate | 100 | MHz |
| | Captured waveform length | 20,000 | samples |
Table 4. Training and inference time across different latent dimensions.

| Latent Dimension (z_dim) | Training Time | Inference Time (s/Image) |
|---|---|---|
| 16 | 34 min | 2.5 |
| 32 | 48 min | 2.5 |
| 64 | 1.5 h | 2.5 |
| 128 | <4 h | 2.5 |
Table 5. Transmission efficiency and reference PSNR across latent dimensions (Digit 5).

| Latent Dimension (z_dim) | AE Bit Size (Bits) | JPEG Bit Size (Bits) | Gain over JPEG (×) | Effective Capacity Gain (×) | Normalized Data Size (%) | Reference PSNR (dB) |
|---|---|---|---|---|---|---|
| 16 | 128 | 3321 | 25.9 | 49 | 2.04 | 24.35 |
| 32 | 256 | 3784 | 14.8 | 24.5 | 4.08 | 28.58 |
| 64 | 512 | 4057 | 7.9 | 12.25 | 8.16 | 30.47 |
| 128 | 1024 | 4148 | 4.05 | 6.125 | 16.33 | 31.29 |
Table 6. BER and throughput at representative distances (0.5 m and 1.5 m).

| Distance (m) | Latent Dimension (z_dim) | BER | Throughput (Mb/s) |
|---|---|---|---|
| 0.5 | 16 | 1.00 × 10−4 | 9.999 |
| 0.5 | 128 | 1.00 × 10−3 | 9.99 |
| 1.5 | 16 | 0.0288 | 9.71 |
| 1.5 | 128 | 0.0624 | 9.37 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
