Research Progress on Modulation Format Recognition Technology for Visible Light Communication

Zhou, Shengbang; Du, Weichang; Li, Chuanqi; Liu, Shutian; Li, Ruiqi

doi:10.3390/photonics12050512

Open AccessReview

Research Progress on Modulation Format Recognition Technology for Visible Light Communication

by

Shengbang Zhou

^1,*,†,

Weichang Du

^1,†

,

Chuanqi Li

¹,

Shutian Liu

^2,3 and

Ruiqi Li

¹

Guangxi Key Laboratory of Functional Information Materials and Intelligent Information Processing, Nanning Normal University, Nanning 530001, China

²

Guangxi Geographical Indication Crops Research Center of Big Data Mining and Experimental Engineering Technology, Nanning Normal University, Nanning 530001, China

³

Guangxi Key Laboratory of Earth Surface Processes and Intelligent Simulation, Nanning Normal University, Nanning 530001, China

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Photonics 2025, 12(5), 512; https://doi.org/10.3390/photonics12050512

Submission received: 4 April 2025 / Revised: 9 May 2025 / Accepted: 13 May 2025 / Published: 19 May 2025 / Corrected: 29 July 2025

Download

Browse Figures

Versions Notes

Abstract

As sixth-generation mobile communication (6G) advances towards ultra-high speed and global coverage, visible light communication (VLC) has emerged as a crucial complementary technology due to its ultra-high bandwidth, low power consumption, and immunity to electromagnetic interference. Modulation format recognition (MFR) plays a vital role in the dynamic optimization and adaptive transmission of VLC systems, significantly influencing communication performance in complex channel environments. This paper systematically reviews the research progress in MFR for VLC, comparing the theoretical frameworks and limitations of traditional likelihood-based (LB) and feature-based (FB) methods. It also explores the advancements brought by deep learning (DL) technology, particularly in enhancing noise robustness, classification accuracy, and cross-scenario adaptability through automatic feature extraction and nonlinear mapping. The findings indicate that DL-based MFR substantially enhances recognition performance in intricate channels via multi-dimensional feature fusion, lightweight architectures, and meta-learning paradigms. Nonetheless, challenges remain, including high model complexity and a strong reliance on labeled data. Future research should prioritize multi-domain feature fusion, interdisciplinary collaboration, and hardware–algorithm co-optimization to develop lightweight, high-precision, and real-time MFR technologies that align with the 6G vision of space–air–ground–sea integrated networks.

Keywords:

visible light communication; modulation format identification; deep learning; feature extraction

1. Introduction

Research on sixth-generation mobile communication (6G) is rapidly advancing. The 6G network is anticipated to deliver ultra-high speed—ranging from 100 to 1000 times that of 5G—along with ultra-low latency and global coverage capabilities, effectively addressing the needs of emerging applications such as Industry 4.0, virtual reality, and smart healthcare [1]. However, traditional radio frequency (RF) communication encounters significant limitations concerning spectrum resources, transmission speed, and coverage range. In this context, visible light communication (VLC) has emerged as a vital complementary technology for the 6G heterogeneous network due to its ultra-high bandwidth of 400–800 THz, absence of electromagnetic interference, and low power consumption [2]. VLC transmits data by modulating the brightness of LED light sources, enabling high-speed communication while simultaneously fulfilling lighting functions. This technology is particularly suitable for specialized coverage scenarios, including indoor positioning, vehicular communication, and underwater link, thus contributing to the realization of the 6G vision of “integrated space-air-ground-sea” communication coverage. VLC systems utilizing multi-color integrated LEDs have achieved transmission rates of 23.43 Gb/s [3], while underwater visible light communication (UVLC) systems can also facilitate the real-time transmission of 25 Gb/s in turbid waters [4], thereby demonstrating the technical potential of VLC.

The performance of VLC systems is significantly influenced by the design and optimization of modulation techniques. Various modulation formats have been effectively applied in VLC systems, including on–off Keying (OOK) [5], pulse amplitude modulation (PAM) [6], quadrature amplitude modulation (QAM) [7], orthogonal frequency division multiplexing (OFDM) [8], etc. Each modulation format possesses distinct advantages and disadvantages regarding spectral efficiency, bandwidth capacity, noise resistance, and implementation complexity. While OOK is straightforward, it suffers from low spectral efficiency, whereas high-order QAM can enhance capacity but requires more rigorous channel equalization [9]. Consequently, it is essential to adopt adaptive modulation format-switching methods tailored to different application scenarios and complex channel environments, such as dynamically adjusting modulation orders or encoding methods to optimize transmission efficiency and enhance robustness under low signal-to-noise ratio (SNR) conditions.

Modulation format recognition (MFR) technology serves as a core prerequisite for enabling adaptive demodulation. Its fundamental principle involves analyzing and processing received signals to capture underlying patterns or characteristics, which are then used to determine the signal’s modulation format through threshold-based decision making or classifier-based judgments. Essentially, MFR is a form of pattern recognition. The basic principle of MFR technology in VLC is illustrated in Figure 1. Common signal analysis methods include likelihood functions and their maximum likelihood estimation, as well as feature construction based on time–frequency representations, constellation diagrams, and statistical measures, etc. In recent years, with the advancement of deep learning, automatic feature learning has become feasible. MFR technology provides the necessary foundation for subsequent adaptive demodulation, playing a pivotal role in optimizing the performance of VLC systems. It is noteworthy that MFR technology has been widely applied in traditional radio communication domains, such as spectrum sensing, cognitive radio [10], the Internet of Things (IoT) [11], and electronic warfare [12]. However, research on MFR in the VLC domain is still relatively limited. Therefore, advancing MFR technology research within the VLC field is crucial, as it can further enhance VLC communication capacity and spectral efficiency. A systematic review of the fundamental architecture, developmental trajectory, and unique challenges associated with modulation format recognition techniques in VLC systems will facilitate the identification of future research directions for this technology.

Modulation format identification in current VLC systems grapples with challenges arising from channel complexities, including LED nonlinearity [13], multipath scattering, and background noise. The light-scattering effect in turbid waters can lead to a significant increase in the bit error rate [14], while real-time scenarios, such as vehicle networking, demand algorithms that prioritize low latency and high computational efficiency. Traditional likelihood-based (LB) and feature-based (FB) methods often suffer from insufficient robustness and high computational complexity in dynamic channels. In recent years, deep learning (DL) has transcended the limitations of traditional models, which rely on prior mathematical assumptions, by automatically learning the intricate relationships between channel features and modulation formats, demonstrating significant advantages.

This paper systematically reviews existing research and the latest advancements in VLC modulation format identification technology within the context of 6G. It focuses on analyzing various feature-extraction methods, as well as the potential and challenges of machine learning algorithms. By comparing the performance differences between traditional methods and DL approaches, this paper highlights the significant improvements in noise robustness, classification accuracy, and cross-scenario adaptability achieved through DL’s automatic feature extraction and nonlinear mapping capabilities. Furthermore, it explores the adaptability of different algorithms in dynamic channel environments, particularly emphasizing the innovative applications of multi-domain feature fusion, lightweight architectures, and meta-learning paradigms in developing lightweight, high-precision, and real-time MFR technologies. These advancements are crucial for enhancing the performance of 6G space–air–ground–sea integrated networks. Future research directions are also envisaged in alignment with the demands of 6G networks for intelligence and heterogeneous integration, addressing challenges in hardware deployment, real-time requirements, and dataset adaptability.

2. Likelihood-Based Modulation Recognition Methods

Likelihood-based (LB) methods represent a category of modulation recognition approaches that calculate the likelihood ratio between the received signal and known modulation signals (as illustrated in Figure 2). While these methods are typically optimal in a Bayesian sense, they suffer from high computational complexity. Although LB methods were proposed earlier and have been extensively studied in traditional wireless-signal modulation recognition, their application in the domain of VLC modulation recognition remains relatively limited. Below, we provide a brief introduction to several primary LB methods mentioned in [15,16].

2.1. Signal Model

The complex baseband form of the received signal in an additive white Gaussian noise (AWGN) channel can be expressed as:

r (t) = s (t) + n (t),

(1)

where

s (t)

is the modulated signal, and

n (t) \sim C N (0, σ^{2})

represents the noise. After discretization, the received signal for the

n

-th symbol is:

r_{n} = s_{n} (u_{i}) + n_{n},

(2)

here,

u_{i}

represents the set of parameters under the modulation hypothesis

H_{i}

, including amplitude, phase, symbol timing, etc. Under the hypothesis

H_{i}

, the conditional probability density function (PDF) of the received signal is:

p (r_{n}| H_{i}, u_{i}) = \frac{1}{π σ^{2}} \exp (- \frac{{|r_{n} - s_{n} (u_{i})|}^{2}}{σ^{2}}),

(3)

where

σ^{2}

is the noise power, and

{|r_{n} - s_{n} (u_{i})|}^{2}

represents the squared Euclidean distance between the received symbol

r_{n}

and the hypothesized signal

s_{n} (u_{i})

. For

N

independent observed samples, the joint likelihood function is:

Γ (R| H_{i}) = \prod_{n = 0}^{N - 1} p (r_{n}| H_{i}, u_{i}),

(4)

where

R

denotes the set of received signal samples, i.e.,

R = {r_{0}, r_{1}, \dots, r_{N - 1}}

.

2.2. Average Likelihood Ratio Test (ALRT)

When the parameter

u_{i}

is a random variable with a known distribution, ALRT constructs the likelihood function by taking the statistical average over the unknown parameters:

Γ_{ALRT} (H_{i}) = \prod_{n = 0}^{N - 1} E_{u_{i}} [Γ (r_{n}| H_{i}, u_{i})],

(5)

here,

E_{u_{i}} [\cdot]

represents the expectation operation over the random parameter

u_{i}

. For example, when the carrier phase

θ_{c}

(uniformly distributed in

[- π, π]

) is unknown, ALRT integrates over the phase:

Γ_{ALRT} = \frac{1}{2 π} \int_{- π}^{π} \prod_{n = 0}^{N - 1} p (r_{n}| H_{i}, θ_{c}) d θ_{c} .

(6)

Assume that the symbols are independent and uniformly distributed, the ALRT expression for M-PSK signals in [16] is simplified as:

Γ (R| H_{i}) = E_{s_{n}^{k, i} |_{n = 0}^{N - 1}} [I_{0} (\sum_{n = 0}^{N - 1} \frac{2 |r_{n}^{*} s_{n}^{k, i}|}{σ^{2}}) \times \exp (- \sum_{n = 0}^{N - 1} \frac{{|s_{n}^{k, i}|}^{2}}{σ^{2}})],

(7)

where

I_{0} (\cdot)

is the zeroth-order modified Bessel function, and

s_{n}^{k, i}

is the theoretical value of the

k

-th candidate symbol under the modulation hypothesis

H_{i}

. The ALRT method addresses parameter uncertainty by statistically averaging received signals under various modulation assumptions, theoretically offering high recognition accuracy. However, its computational complexity increases sharply with the dimensionality of parameters, posing a significant challenge in practical applications. This complexity is exacerbated in VLC due to the effects of LED nonlinearity and multipath scattering, further limiting the practical applicability of the ALRT method.

2.3. Generalized Likelihood Ratio Test (GLRT)

GLRT treats unknown parameters as deterministic quantities and reduces computational complexity by performing maximum likelihood estimation on a symbol-by-symbol basis:

Γ_{GLRT} = \prod_{n = 0}^{N - 1} \max_{u_{i}} Γ (r_{n}| H_{i}, u_{i}),

(8)

where

\underset{u_{i}}{m a x}

represents the maximum likelihood estimation (MLE) of the parameter

u_{i}

. This method is more efficient in dealing with unknown parameters but may introduce bias for some nested modulation types.

2.4. Hybrid Likelihood Ratio Test (HLRT)

HLRT combines the advantages of ALRT and GLRT to address the bias issue associated with GLRT for nested modulation types. HLRT treats some parameters as deterministic and processes them using maximum likelihood estimation (MLE), while statistically averaging the remaining parameters. For example, in the context of OFDM-IM modulation classification [17], the active subcarrier index

I_{g}

is estimated first, and then the symbols are averaged. The decision metric can be simplified as:

Γ_{HLRT} = E_{S_{g}} [Γ (R| H_{i}, \hat{I_{g}}, S_{g})],

(9)

where

R

represents the multi-dimensional matrix of received signals, and

Γ (R| H_{i}, \hat{I_{g}}, S_{g})

represents the joint likelihood function given the modulation hypothesis

H_{i}

, the estimated active subcarrier index

\hat{I_{g}}

, and the symbols

S_{g}

. This is merely a highly simplified expression for a brief introduction. The detailed reasoning formulas can be found in [17]. However, the complexity of the HLRT method remains high, especially when multiple parameters need to be addressed simultaneously. Additionally, the HLRT method has a high dependence on the initial parameter estimates; inaccuracies in these estimates can adversely affect the final recognition results. Likelihood-based methods achieve high-precision classification through rigorous statistical modeling. Although they are optimal in the Bayesian sense, they face challenges in computational complexity and parameter estimation, limiting their practical application.

2.5. Summary of Section 2

This chapter systematically elaborates on likelihood-based (LB) modulation recognition methods, focusing on their theoretical framework and essential algorithm characteristics. Grounded in Bayesian statistical theory, these methods classify signals by constructing likelihood functions of the received signal under different modulation hypotheses. The theoretical model assumes signal transmission over an AWGN channel, modeling the received signal as a linear superposition of modulation symbols and noise, and deriving the joint likelihood function based on conditional probability density functions. The LB method provides an important theoretical basis for dealing with parameter uncertainty. Despite being theoretically optimal within the Bayesian framework, their practical application is limited by computational complexity and the accuracy of parameter estimation.

3. Feature-Based Modulation Recognition Methods

Feature-based (FB) methods focus on extracting representative features from the received signal, which are then used by a classifier to determine the modulation type (as illustrated in Figure 3). These methods typically exhibit lower computational complexity, making them suitable for practical applications. In modulation format recognition tasks, the emphasis is on extracting time-domain and frequency-domain features, along with higher-order cumulants, moments, and instantaneous features from the signal. These features capture the characteristics of signals across different modulation formats and serve as a foundation for subsequent classification [18].

3.1. Higher-Order Statistics (HOS) Features

Higher-order statistics (HOS) address the limitations of traditional second-order statistics in VLC systems, which are impacted by LED nonlinearity and channel noise, and exhibit insufficient discrimination for complex modulation schemes such as 16-QAM and 16-PAM.

In 2017, H. Ren et al. [19] applied fourth-order cumulants to ACO-OFDM VLC systems. Since the theoretical value of the fourth-order cumulant of Gaussian noise is zero, while the cumulants of non-Gaussian modulated signals are unique, this method can effectively suppress noise and distinguish different modulation formats. The mathematical essence of higher-order cumulants can be derived through the moment-generating function. For a zero-mean random process

y [n]

, the definition of the fourth-order cumulant is given by:

C_{4 k} = E [y^{4} [n]] - 3 {(E [y^{2} [n]])}^{2} (k = 0),

(10)

where

E [\cdot]

denotes the expectation operation. For discrete signals, the practical estimation form is:

\hat{C_{40}} = \frac{1}{N} \sum_{n = 0}^{N - 1} y^{4} [n] - 3 {(\frac{1}{N} \sum_{n = 0}^{N - 1} y^{2} [n])}^{2} .

(11)

Similarly, the definitions of the one-lag and two-lag fourth-order cumulants are:

\hat{C_{41}} = \frac{1}{N} \sum_{n = 0}^{N - 1} y^{3} [n] y^{*} [n] - 3 \hat{C_{20}} \hat{C_{21}},

(12)

\hat{C_{42}} = \frac{1}{N} \sum_{n = 0}^{N - 1} {|y [n]|}^{4} - {|\hat{C_{20}}|}^{2} - 2 \hat{C_{21}^{2}},

(13)

where

\hat{C_{20}}

and

\hat{C_{21}}

are the estimates of the second-order moments. The authors collected four modulation formats, namely 4/16-PAM and 4/16-QAM, in an ACO-OFDM VLC system. By leveraging the asymmetric clipping characteristic of the ACO-OFDM signal, there is an essential difference in the cumulants of QAM signals compared to PAM signals, which stems from the original signal’s real/imaginary parts. The normalized cumulants

\tilde{C_{4 i}}

of different-order PAM signals exhibit separable statistical boundaries. By testing classification thresholds through Monte Carlo simulations, a correct recognition rate of 88.9% can be achieved with 2000 samples when the SNR exceeds 15 dB. However, performance drops sharply when the SNR falls below 5 dB due to noise dominance (as shown in Figure 4). This study confirms that HOS possess both noise robustness and modulation sensitivity. However, the high computational complexity of HOS limits its feasibility in dynamic real-time applications.

3.2. Constellation Diagram Features

The constellation diagram is a vital tool in digital communication systems for representing the characteristics of modulated signals. It illustrates the geometric features of modulation symbols in the complex plane, with each symbol corresponding to a point defined by its in-phase (I) and quadrature (Q) components. Different modulation formats exhibit significant variations: low-order modulations such as 4-QAM feature sparse constellation points, while high-order modulations like 64-QAM display densely packed points. Channel noise causes the constellation points to spread out, with the degree of spread being negatively correlated with the SNR. Consequently, the constellation diagram is useful not only for modulation recognition but also for assessing channel quality.

In 2020, He Jing et al. [20] proposed a method for the efficient recognition of M-QAM signals (M = 4, 16, 32, 64) in OFDM-VLC systems based on the geometric and statistical properties of constellation diagrams. After normalizing the received signal, they performed clustering analysis on the symbols using ideal M-QAM constellation templates. They calculated the Euclidean distance between each symbol point and the cluster centers of the templates and classified the symbols into the nearest cluster, thereby matching the ideal constellation distribution. This process can be expressed as:

d (x_{i}, c_{j}) = \sqrt{{(x_{1} - c_{1}^{(j)})}^{2} + {(x_{2} - c_{2}^{(j)})}^{2}},

(14)

where

(x_{i} = (x_{1}, x_{2}))

represents the coordinates of the received symbol, and

(c_{j} = (c_{1}^{(j)}, c_{2}^{(j)}))

is the center of the

j

-th cluster. After clustering, they established a two-dimensional Gaussian model to extract feature parameters: for low-order modulation (4-QAM), they used the mean of the high-probability region

T_{2}

; for high-order modulations, they selected the means of the top 75% high-probability clusters

T_{4}

,

T_{5}

, and

T_{6}

. They constructed a multi-level decision tree: if

T_{2}

is significantly high, it is determined to be 4-QAM; comparing

T_{4}

and

T_{6}

distinguishes between 16-QAM and 64-QAM. This method achieved 100% recognition of 4-QAM at OSNR = 4 dB and the same precision for 64-QAM at 18 dB, reducing the requirement by 1–2 dB compared to traditional unsupervised clustering method in reference [15], thus verifying its advantage at low SNR. However, constellation diagrams have limited representation of the signal’s complex features, particularly in nonlinear channels and environments with significant background noise, where distortion in constellation point distribution can hinder recognition accuracy. Additionally, constellation diagrams cannot differentiate between all VLC modulation formats; for example, they are unable to distinguish time-domain modulation methods such as pulse width modulation or pulse position modulation (PWM/PPM).

3.3. Wavelet Transform

To address the issue of decreased modulation recognition accuracy in optical communication systems under time-varying noise conditions, Xiong et al. [21] proposed an intelligent modulation recognition algorithm based on the fusion of wavelet transform and pattern recognition to meet the real-time classification needs of three typical modulation formats: MPSK, MQAM, and MAPSK. The algorithm significantly improves recognition efficiency under low-SNR conditions through multi-domain feature extraction and the collaborative optimization of data-driven classifiers. In the signal preprocessing stage, considering the characteristics of time-varying noise, the received signal is subjected to multi-scale decomposition using the discrete wavelet transform (DWT). The discrete wavelet basis function is constructed as follows:

ψ_{j, k} (t) = a_{0}^{- j / 2} ψ (a_{0}^{- j} t - k b_{0}),

(15)

where

ψ (t)

is the mother wavelet function,

a_{0}

is the scale discretization base, j is the decomposition level controlling the dilation of the wavelet,

b_{0}

is the translation step,

k

is the translation parameter, and

a_{0}^{- j / 2}

is the normalization factor ensuring the energy consistency of the wavelet basis functions across different scales. The received signal is analyzed for time-frequency localization, combined with a threshold processing function based on noise statistics

\hat{x} = T_{h} (ω_{j, k}, δ),

(16)

to achieve the nonlinear suppression of high-frequency noise components. Here,

δ

is the selected threshold based on noise statistical characteristics, and

ω_{j, k}

is the wavelet coefficient corresponding to decomposition level

j

and translation parameter

k

. Compared to traditional Fourier transforms, this method significantly enhances the processing capability for non-stationary signals.

In terms of feature extraction, information from the time domain and frequency domain is integrated. The maximum amplitude spectral density is extracted as the pivotal feature, while the normalized peak spacing is amalgamated to augment the separability of modulation modes. For the classifier design, data-mining techniques are leveraged to train classification models through supervised learning, facilitating the rapid differentiation of high-order modulated signals. The wavelet transformation method offers significant advantages in handling time-varying noise and non-stationary signals, effectively extracting multi-domain features and enhancing recognition accuracy. However, the choice of wavelet basis functions and the determination of decomposition levels significantly impact the final performance, necessitating optimization based on specific application scenarios and classification tasks.

3.4. Integral Feature Extraction

In VLC systems, the L-PPM modulation technique is characterized by high energy efficiency. However, the dynamic recognition of its modulation order remains a challenge that has not been effectively addressed. Traditional receivers require a preset fixed L value, which struggles to meet the transmission optimization demands under dynamic channel conditions. In 2022, Tanyildizl et al. [22] proposed a machine learning classification framework based on integral feature extraction, which for the first time realized the real-time recognition of multi-order PPM signals. The core innovation of this research lies in the time-domain integral feature-extraction method: targeting the differences in time slot widths of different L-PPM signals. The time slot of 2-PPM is

T_{2 - P P M} = T_{b} / 2

, while that of 16-PPM is

T_{16 - P P M} = T_{b} / 4

. An integral window of

12 T

, which is the least common multiple of the periods, is used to calculate the energy accumulation value of the received signal. As the L value increases, the corresponding integral result shows a stepwise decay. For example, the theoretical integral values of 2-PPM and 16-PPM are 6

A T

and 0.75

A T

, respectively. This method suppresses Gaussian noise through energy accumulation while enhancing the feature distinguishability of different L values.

Experiments compared the performance of four classifiers (as shown in Figure 5). The linear model (LM), which classifies based on geometric distance thresholds, is computationally efficient but its accuracy drops to 77.25% at long distances (3 m). The decision tree (DT) constructs a tree structure using the natural boundary points of features, achieving an accuracy of 97.85% at short distances (2.20 m), but it is sensitive to feature overlap at long distances. The k-nearest neighbors (KNN) method, which employs dynamic neighborhood search, performs best at medium distances (2.32 m), maintaining an accuracy of 79.99% at 3 m. The support vector machine (SVM), which maximizes the classification margin through the hyperplane

y = w^{T} x + b

, achieves comparable performance to LM at medium distances (2.25–2.86 m), but the complexity of kernel function tuning affects real-time performance. Moreover, this method is highly dependent on the time-slot structure of the signal. The inaccurate estimation of time-slot width during actual hardware deployment may negatively affect recognition performance.

3.5. Chaotic Mapping and Autocorrelation Estimation

Mohamed et al. [23] proposed a lightweight MFR scheme for optical wireless communication (OWC) systems based on the concept of cancellable biometrics, which integrates chaotic Baker mapping (CBM), wavelet image fusion, and autocorrelation estimation to efficiently classify eight modulation formats (including 2/4/8/16-PSK and 8/16/32/64-QAM). The method first uses CBM to nonlinearly shuffle the constellation diagram, generating an encrypted template by randomizing the spatial distribution of constellation points, which effectively reduces the impact of background noise.

Algorithm 1 outlines the implementation process of the CBM:

Algorithm 1: Chaotic Baker Map Permutation for Constellation Diagram Pixels

Input:

C D

: Constellation diagram (

N \times N

matrix)

N

: Image size (integer)

n_{i}

: Partition sequence (array of integers where

N = \sum n_{i}

)
Output:

C D_{p e r m u t e d}

: Permuted constellation diagram (

N \times N

matrix)
Procedure:
//Initialize

C D_{p e r m u t e d}

:

1:: Create an empty $N \times N$ matrix $C D_{p e r m u t e d}$ to store the permuted pixel values.

// Partition and permutation:

2:: $I = length (n_{i})$ // Total number of partitions
3:: $N_{i} = 0$ // Cumulative partition start index
4:: for each partition $i$ in $1$ to $I$ do:
5:: $n_{i} = n_{i} [i - 1]$ // Current partition width

// Iterate over partition columns

6:: for $r$ from $N_{i}$ to $N_{i} + n_{i} - 1$ do:
7:: for $s$ from $0$ to $N - 1$ do:

// Compute CBM permuted pixel coordinates

8:: $r^{'} = \frac{N}{n_{i}} (r - N_{i}) + s m o d (\frac{N}{n_{i}})$
9:: $s^{'} = \frac{n_{i}}{N} (s - s m o d (\frac{N}{n_{i}})) + N_{i}$
10:: $C D_{p e r m u t e d} [r^{'}] [s^{'}] = C D [r] [s]$ // Assign permuted pixel
11:: end for
12:: end for
13:: $N_{i} = N_{i} + n_{i}$ // Update cumulative partition index
14:: end for
15:
16:: Return $C D_{p e r m u t e d}$ as the output. // Return permuted constellation diagram

The CBM is a nonlinear pixel permutation algorithm based on chaotic dynamics. Its core principle involves dynamically rearranging the pixel positions of an original constellation diagram through iterative chaotic sequences, thereby constructing an encrypted constellation space with high complexity. During this process, the sensitivity of the chaotic system to initial keys ensures that even slight parameter variations can generate entirely distinct permutation patterns, thus increasing the projection distances between adjacent constellation points in the encrypted space. This nonlinear property introduces noise-like pixel distributions, causing encrypted constellation diagrams of different modulation formats to exhibit significantly differentiated texture patterns in the image domain. This effectively addresses the challenge of overlapping modulation format features under low-SNR conditions.

Subsequently, wavelet-domain image fusion technology is employed to merge the shuffled constellation diagrams of five adjacent SNR levels into a single reference template to reduce storage overhead. After this processing step, the constellation diagrams demonstrate sufficient feature separability. The pixel values in these diagrams are not completely random; rather, they exhibit certain statistical characteristics. Consequently, during the classification stage, the normalized correlation coefficient between the shuffled diagram of the test signal and the reference template is calculated as follows:

C_{p q} = E [\frac{(p_{i} - μ_{p}) (q_{i} - μ_{q})}{σ_{p} σ_{q}}],

(17)

where

p_{i}

and

q_{i}

are the values of the

i

-th pixel in images

p

and

q

, respectively.

μ_{p}

and

μ_{q}

are the mean values of the pixels in images

p

and

q

.

σ_{p}

and

σ_{q}

are the standard deviations of the pixel values in images

p

and

q

, measuring the degree of dispersion of the pixel values. Experiments were conducted to evaluate the classification performance of this method across a SNR range of 5–30 dB, encompassing both low and high SNR scenarios. The results demonstrate that this method achieves an area under the receiver operating characteristic curve (AROC) of 1 at an SNR of 5 dB. This indicates that the model can perfectly distinguish between positive and negative cases; specifically, for any positive case, its score is higher than that of any negative case. The equal error rate (EER) is below 0.01, and the method exhibits robustness against phase noise. By enhancing inter-class differences through nonlinear projection, this scheme provides a practical low-power modulation recognition solution for OWC systems with lower complexity than traditional deep learning models. However, the randomness of chaotic mapping may lead to fluctuations in results. In dynamic time-varying channels, distortions such as varying degrees of rotation and phase shifts in the original constellation diagram can also be amplified by chaotic mapping, resulting in misclassification. Additionally, when handling new modulation formats, this method may require the retraining or adjustment of reference templates, potentially increasing maintenance costs.

3.6. Frequency-Domain Histogram

In underwater optical wireless communication (UOWC) systems, dynamic channel conditions pose challenges for the demodulation of orthogonal frequency division multiplexing index modulation (OFDM-IM). The difficulty lies in the need for the receiver to identify the number of activated subcarriers

l

within a sub-block in real-time to complete the joint demodulation of index bits and constellation symbols. Zhang et al. [24] proposed an intelligent index recognition scheme based on the histogram of the frequency-domain signal. By analyzing the amplitude distribution characteristics of OFDM-IM signals in the frequency domain under different

l

values, they found that the difference in the number of activated subcarriers significantly changes the morphological features of the signal histogram: when

l = 1

, the amplitude is concentrated in the low-order range, and when

l = 4

(i.e., the traditional OFDM case), it shows a uniform distribution. The researchers normalized the amplitude of the received signal after fast Fourier transform (FFT) and frequency-domain equalization (FDE) as follows:

\hat{X_{i}^{z}} = \frac{|X_{i}^{z}| - {|X^{z}|}_{m i n}}{{|X^{z}|}_{m a x} - {|X^{z}|}_{m i n}}, i = 1,2, \dots, N,

(18)

where

X_{i}^{z}

represents the

i

-th frequency-domain signal sample after FFT and FDE processing.

z

is the index of the OFDM symbol, and

i

is the sequence number of the subcarrier. They then generated an

H

-dimensional histogram feature vector by dividing the amplitude range into

H

intervals and used machine learning and deep learning algorithms to classify the

l

values.

Experiments showed that the four schemes could effectively distinguish the activation patterns of

l \in 1,2, 3,4

in

n = 4

sub-blocks in a 2 m underwater channel environment. The recognition performance was jointly affected by the number of histogram intervals

H

and the cumulative number of symbols

K

. As shown in Figure 6, the k-nearest neighbors (k-NN) algorithm demonstrates significant strengths. When applied to BPSK, 4QAM, and 8QAM modulations, it achieves recognition accuracy close to 100% at SNRs of 8.6 dB, 9.1 dB, and 13 dB, respectively. Its majority voting mechanism exhibits strong robustness against noise distribution variations. In contrast, the decision tree (DT) algorithm achieves the lowest accuracy due to its insufficient sensitivity to features with continuously varying amplitudes. The accuracy curves of support vector machine (SVM) and convolutional neural network (CNN) closely overlap, suggesting that deep feature extraction does not confer additional performance gains given the limited dimensionality of histogram features. This observation implies that traditional machine learning algorithms may be more practically applicable for such low-dimensional feature classification tasks. This method is sensitive to the number of histogram intervals and symbols. Insufficient partitioning of amplitude ranges may result in the loss of classification feature details, while excessive partitioning increases sensitivity to noise and computational complexity. Furthermore, the normalization process in this method assumes stable channel gain; however, fluctuations in light intensity in practical UOWC can lead to distortion in the normalized amplitude distribution.

3.7. Summary of Section 3

Feature-based (FB) modulation recognition methods construct classification criteria by extracting multi-dimensional features from signals, effectively balancing the reduction in computational complexity with noise robustness. These methods are widely utilized in VLC and UOWC scenarios. The core concept involves mining the distinguishing features of modulated signals from various domains—such as the time domain, frequency domain, statistical properties, and geometric distribution—and integrating them with classifiers to achieve efficient recognition. This chapter focuses on six types of feature-extraction techniques (as presented in Table 1), each characterized by distinct application scenarios and performance profiles, necessitating trade-offs based on channel conditions and system requirements. However, these methods also have limitations; for instance, their feature distinguishability often relies on a high SNR and is sensitive to channel conditions. Moreover, manual feature extraction depends heavily on prior knowledge, making it challenging to adapt to new modulation formats or non-stationary channels. This increases potential maintenance costs for future large-scale hardware deployments.

Future work should continue to explore multi-domain feature fusion and adaptive extraction, investigate algorithms such as self-attention mechanisms to dynamically allocate feature weights and reduce reliance on manual design, retain the interpretability of traditional features while leveraging collaborative optimization between deep learning and feature engineering, and focus on feature enhancement under complex channel conditions, integrating channel estimation to jointly optimize classification performance.

4. Deep Learning-Based Modulation Recognition Methods

In the task of modulation format recognition, extracting unique features from received signals across different modulation formats is essential for achieving high-precision classification. Traditional feature-based methods focus on manually designing interpretable features. They enhance classification performance through optimized feature representation. Simple classifiers such as decision trees are commonly used. In contrast, deep learning-based methods, as shown in Figure 7, focus on automatic feature learning driven by the model. Automatic recognition is achieved through innovations in neural network architectures. Preprocessing steps, such as constellation diagram generation and processing, serve mainly as auxiliary means for data adaptation.

4.1. Convolutional Neural Networks (CNN)

In the field of MFR, phase and frequency offsets pose significant challenges that can degrade classification performance. To address this, Jin et al. [25] proposed a CNN-based blind MFR method. Their approach achieved the high-precision recognition of five modulation formats (BPSK, QPSK, 8PSK, 16QAM, and 64QAM) through two key innovations. First, they constructed multi-dimensional signal feature inputs. Second, they implemented an offset-resistant training strategy. This combination enables the method to maintain strong performance even under severe phase- and frequency-offset conditions. To overcome the limitations of traditional manual feature extraction, the researchers decomposed the received signal into four channels: real part, imaginary part, amplitude, and phase. This decomposition generates a feature tensor of dimension

N \times 4

. The input can be represented as:

X = [R (r (n)), I (r (n)), |r (n)|, ∠ r (n)],

(19)

where the signal length

N

is 128. The explicitly introduced phase component

∠ r (n)

enables the network to directly perceive the continuous phase-shift features caused by frequency offsets, while the amplitude component

|r (n)|

retains the signal envelope characteristics. For network architecture, a seven-layer convolutional network was designed with 3 × F_I × F_O convolutional kernels and configurable channel dimensions. Skip connections were incorporated to reuse features, forming a residual learning structure. This structure mitigates the vanishing gradient problem through cross-layer identity mapping, enabling the network to fuse shallow details with deep semantic features during training. Simulation experiments show that the proposed method outperforms traditional techniques (e.g., order statistics and HOC) and existing shallow CNN models [26], especially under phase/frequency offsets. This advantage stems from three factors: the deeper network architecture, the inclusion of amplitude/phase input features, and the use of a dataset with diverse channel perturbations. The multidimensional fusion strategy employed by this method serves as a significant reference for optical-domain signal processing.

In discrete multitone modulation visible light communication (DMT-VLC) systems, the dynamic allocation of subcarrier modulation formats is essential due to time-varying channels and bandwidth limitations. To tackle the challenges of real-time operation and data scarcity in modulation recognition under short-time channel conditions, reference [27] proposed a lightweight deep learning classification method based on pseudo-constellation diagram generation. By generating pseudo-constellation images, this method achieves high-precision classification using only tens of received symbols. This approach effectively solves the challenges of real-time processing and data scarcity in modulation recognition. Assuming the received symbol sequence is

{s_{1}, s_{2}, \dots, s_{N}}

, pseudo-symbols are generated by calculating the differential vectors of all symbol pairs:

s_{i, j}^{pseudo} = s_{i} - s_{j} (i \neq j; i, j = 1,2, \dots, N) .

(20)

This operation expands the number of original symbols to

N (N - 1)

, forming dense pseudo-constellation points to enhance the features of sparse data. After normalization, these pseudo-constellation points are mapped to grayscale images, which are subsequently transformed into RGB three-channel images using multi-scale Gaussian filtering. These images serve as inputs to the GoogLeNet V3 network, allowing for the extraction of spatial features at different scales. Experiments demonstrate that the proposed scheme achieves a classification accuracy of 98% for six modulation formats, ranging from 2-QAM (BPSK) to 64-QAM. Under the bit error rate (BER) threshold, the modulation format of each DMT subcarrier can be classified with 100% accuracy using only 75 received symbols. Furthermore, the accuracy of the proposed scheme is not only higher but also more stable compared to the cumulant-based method, while requiring fewer received symbols.

Bidaa Mortada et al. [28] addressed the challenge of blind recognition of high-order modulations in optical wireless communication by introducing fan-beam projection (FBP) from computed tomography (CT) imaging technology as a feature enhancement method. This technique transforms the two-dimensional constellation diagram into high-discrimination projection features, effectively overcoming the classification failures of traditional methods under low optical signal-to-noise ratio (OSNR < 10 dB) conditions due to histogram similarity. The FBP performs multi-angle line integrals using diverging ray clusters to compute projection data of constellation diagram image matrices, extracting more distinguishable features.

The geometric principle of FBP is illustrated in Figure 8. The rotation angle of the light source is characterized by

β

, which determines its offset relative to the initial position. The direction of each projection ray is precisely defined by the coordinate pair

(σ, β)

, where

σ

represents the angle between the ray and the central axis. The fan-beam (FB) projection transforms the ray coordinates

(σ, β)

into polar coordinates

(s, θ)

using the following formulas:

s = D s i n σ,

(21)

θ = σ + β,

(22)

where

D

is the distance from the projection source to the image center. This transformation normalizes the ray coordinates to polar form. By rotating the angle

β

from

0

to

2 π

, multi-angle projection data are obtained, forming a feature vector. This process transforms the constellation image, significantly enhancing inter-class separability.

The researchers created a simulation dataset with eight modulation formats (2/4/8/16-PSK and 8/16/32/64-QAM) across an OSNR range of 5–30 dB. During preprocessing, constellation diagrams were converted to grayscale, and FBP was used to generate feature maps. These maps were then fed into networks like AlexNet and VGG for training. Experimental results demonstrated that FBP enabled AlexNet to achieve a full-format recognition rate of 100% at an OSNR of ≥15 dB, thereby reducing the required OSNR for classification by 5–8 dB compared to the original constellation diagram, highlighting the strong noise robustness of the FB features. This work, through interdisciplinary technological integration, established a mapping mechanism between FBP and constellation diagram feature enhancement, providing a low-complexity, noise-insensitive solution for adaptive modulation systems.

In 2022, Gu et al. [29] tackled the challenge of modulation recognition in the dynamic channel environment of satellite-to-ground free-space optical communication (FSO) by proposing a blind recognition framework based on CNN. This study moved away from the traditional reliance on prior channel information, transforming the received complex signal into two-dimensional I/Q data as input to simulate an image-processing mechanism for end-to-end classification. To enhance the model’s generalization capability, the Gamma–Gamma channel model was employed to accurately simulate the effects of weak, moderate, and strong turbulence. Training data were generated by randomly combining channel parameters within an SNR range of 10–30 dB, compelling the network to learn common features across various scenarios. The network architecture consisted of a six-layer convolutional block structure, which balanced feature abstraction and detail retention by utilizing max pooling and average pooling. In strong turbulence scenarios, robustness was enhanced by integrating average pooling and batch normalization techniques, with a dropout rate of 0.6. Experiments demonstrated that this approach achieved a recognition rate of 99.98% under strong turbulence channel conditions.

Gao et al. [30] addressed the limitations of traditional coordinate feature representation and the decline in recognition performance under low-SNR conditions by introducing a novel method for constructing a four-dimensional composite feature matrix that combines I/Q components and polar coordinates (

A / θ

) within the VLC scenario. This approach aims to overcome the recognition bottleneck encountered in low-SNR environments. Experimental verification shows that this multi-domain feature fusion enables CNNs to simultaneously capture the Cartesian spatial distribution patterns of QAM-type modulations, which rely on I/Q components, as well as the phase-ring characteristics of PSK/APSK-type modulations, which depend on the

θ

component. The coordinate fusion training method proposed in this paper effectively identifies 16th-order signals. Under high-SNR conditions, the MFR accuracy can reach 98.3%. Even under extremely poor SNR conditions, this method can still distinguish high-order signals, achieving an accuracy improvement of over 10% compared to traditional IQ signal training models.

Wang et al. [31] proposed a modulation format recognition algorithm based on the improved YOLOv5s to address the decreased recognition accuracy caused by background noise and multipath fading interference in complex channel environments. The research team integrated Mixup and Mosaic data augmentation methods at the input stage, thereby enhancing the model’s generalization ability against noise interference and signal distortion through linear mixing of time-frequency diagrams and multi-image stitching techniques. Additionally, an adaptive spatial feature fusion (ASFF) mechanism was introduced in the feature fusion layer, optimizing the integration of multi-scale features through dynamic weight allocation and mitigating the issue of feature redundancy commonly encountered in traditional feature pyramid networks. The enhanced algorithm achieved a mean average precision (mAP@0.5) of 0.903 under mixed-SNR conditions ranging from 0 to 20 dB, reflecting a 0.7% improvement over the original YOLOv5s. At 20 dB high SNR, it reached an mAP of 0.993, all while maintaining a model parameter size of 24.17 MB, which represents a 79.3% reduction compared to YOLOv3. This study effectively enhanced the algorithm’s robustness under mixed-SNR conditions through the use of bimodal data augmentation and adaptive feature fusion strategies. However, while the Mixup and Mosaic data augmentation methods in this reference enhance the model’s robustness on the current dataset, they may not completely account for all noise and interference patterns present in real-world channel conditions.

In vehicular visible light communication (V-VLC) scenarios, dynamic traffic environments, and complex channel conditions present significant challenges for modulation format recognition. Arafa et al. [32] proposed a fusion scheme that integrates the Hough transform with deep learning techniques. Focusing on the infrastructure-to-vehicle (I2V) communication scenario, the study constructed a three-dimensional road model using non-sequential ray tracing and developed a channel model that incorporates weather attenuation effects. Innovatively, the Hough transform was applied to constellation diagram feature extraction through polar coordinate transformation, represented by the equation:

ρ = x \cos θ + y \sin θ .

(23)

This transformation projects the modulated signal constellation into the (

ρ, θ

) parameter space, enhancing the amplitude and phase features of modulation formats such as QPSK and QAM. Experimental results demonstrated that by utilizing the Hough transform in conjunction with a pre-trained AlexNet classifier, the classification accuracy for eight modulation formats (including 4/8/16/32/64-QAM and Q/8/16-PSK) reached 100% under clear weather conditions with an SNR of ≥11 dB. Notably, the required SNR for achieving 100% recognition of high-order modulations (e.g., 64-QAM) was significantly lower than the 18 dB needed for clustering analysis in reference [20]. In foggy conditions, only a 12 dB SNR threshold was necessary for 100% recognition, and when the communication distance was extended to 100 m, AlexNet maintained a classification accuracy of approximately 93%. This approach, through efficient feature extraction combined with deep learning, provides a highly robust modulation recognition framework for dynamic vehicular VLC systems.

4.2. Recurrent Neural Networks (RNN)

Recurrent neural networks (RNNs) are specifically designed for processing sequential data, allowing them to remember information from sequences through recurrent units (hidden units) and pass it to subsequent time steps, effectively capturing temporal dependencies. However, RNNs encounter challenges related to vanishing and exploding gradients during training, which limit their capacity to learn long-range dependencies. To overcome these limitations, researchers have introduced improved architectures such as long short-term memory (LSTM) and gated recurrent unit (GRU), which incorporate gating mechanisms to better control the flow of information and enhance the modeling of long sequences.

In 2024, Gao et al. [33] addressed the inherent difficulty of distinguishing between QAM and APSK modulated signals in the Cartesian coordinate system within UVLC systems. They proposed a joint recognition algorithm based on bidirectional gated recurrent units (BiGRU) and polar coordinate transformation. The study focused on ten modulation formats, including MQAM (M = 2, 4, 8, 16, 32, 64) and MAPSK (M = 8, 16, 32, 64). The original I and Q components of the received signal were combined with the polar coordinate-transformed

ρ

,

θ

to create a four-dimensional feature vector, following similar processing methods outlined in references [25,30]. The core of the algorithm lies in the BiGRU network’s in-depth mining of temporal features. BiGRU captures the forward and backward dependencies in the signal sequence through its bidirectional gating mechanism, with the gating unit calculations as follows:

The reset gate controls the influence of historical information:

R_{t} = σ (W_{r} [x_{t}, h_{t - 1}] + b_{r}) .

(24)

The update gate determines the proportion of information to be retained:

Z_{t} = σ (W_{z} [x_{t}, h_{t - 1}] + b_{z}) .

(25)

The candidate state integrates the current input with the historical information filtered by the reset gate:

\tilde{h_{t}} = \tanh (W [x_{t}, R_{t} ⊙ h_{t - 1}] + b_{h}) .

(26)

The final hidden state is dynamically updated by the update gate:

h_{t} = Z_{t} ⊙ h_{t - 1} + (1 - Z_{t}) ⊙ \tilde{h_{t}} .

(27)

Here,

W_{r}

,

W_{z}

,

W

are the weight coefficients within the GRU module;

b_{r}

,

b_{z}

,

b_{h}

are the corresponding bias coefficients;

h_{t}

is the output of the GRU module at the current time step; the symbol

⊙

denotes the element-wise multiplication, also known as the Hadamard product. The temporal features

h_{t 1}, h_{t 2}

output by the bidirectional GRU layer are then passed through a fully connected layer and Softmax classification to recognize the ten modulation formats. Experiments showed that in the 0.2–0.4 V linear working area (BER < 3.8 × 10⁻³), the algorithm achieved an average recognition accuracy of over 96%. Additionally, the training speed was improved by 100% compared to scenarios that did not incorporate polar coordinate features.

4.3. Other Innovative Deep Learning Models

Zhang et al. [34] proposed a few-shot modulation recognition framework based on progressive growth meta-learning (PGML) to tackle the signal degradation caused by underwater turbulence and LED nonlinearity. This framework enables rapid adaptation with minimal samples through a staged training approach and a complexity-incremental mechanism. In the early stages of training, tasks from fewer batches are utilized to learn the common features associated with channel fading. Subsequently, more batches including more diverse task combinations are gradually introduced, creating a progressive knowledge path. The framework employs a double-layer optimization structure inherent to meta-learning, facilitating dynamic knowledge transfer. The inner layer quickly adjusts parameters within a single task, while the outer layer aggregates and optimizes across multiple tasks. Coupled with data compensation strategies such as rotation augmentation, the framework effectively captures essential features, including phase rotation invariance. Simulation results indicate that the framework achieves a validation accuracy of 95.63% under a low-SNR scenario of 6 dB, demonstrating good robustness against Poisson noise and nonlinear distortion. Compared to the traditional model-agnostic meta-learning (MAML) [35] method, it exhibits significant improvements and reduces the average training time per episode by 23.27%.

Zhao et al. [36] proposed an enhanced solution that introduces active learning (AL) and transfer learning (TL) to address the challenge of acquiring labeled data in VLC systems. They employed the AlexNet-AL/TL model to classify six types of MQAM (M = 2, 4, 8, 16, 32, 64) modulated signals. To tackle the difficulty of feature extraction from small sample sizes, they utilized neighborhood statistics and tricolor encoding on the constellation points of the received signal, constructing a contour stellar image that enhanced density features. Additionally, they applied random geometric augmentation with rotations ranging from −90° to 90° to increase data diversity. Using AlexNet as the base CNN model, the AL approach predicted unlabeled samples in each training round, selecting the most uncertain samples for labeling and subsequent training based on the margin sampling (MS) strategy. The TL approach involved fine-tuning a model pre-trained on ImageNet specifically for the VLC-AMC task. The results indicated that in an extremely small sample scenario with only 60 labeled samples, the introduction of AL and TL achieved accuracy improvements of 6.82% and 14.6%, respectively, compared to the traditional AlexNet model. Following the application of data augmentation techniques, the classification accuracy of the AlexNet-AL model increased from 77.45% to 88.78%. In this reference, there appears to be a significant domain shift between the source domain (ImageNet classification) and the target domain (VLC MFR), which may limit the effectiveness of transfer learning. Future research could benefit from using the RadioML series datasets. These datasets are widely utilized in RF signal modulation format recognition. They could serve as a valuable resource for pre-training.

Li et al. [37] proposed a lightweight solution based on reservoir computing (RC) to address the challenge of real-time modulation recognition in UVLC. To tackle the issue of non-distinctive features of PSK signals in the Cartesian coordinate system, they introduced a dual-domain coordinate transformation method. For the original complex signal

y (i)

, they extracted both the IQ constellation map in the Cartesian coordinate system and the joint feature map of amplitude and phase in the polar coordinate system. They then designed a folding function based on the signal-phase symmetry to iteratively compress the feature space. The algorithm retained local salient features through 3–4 foldings, reducing redundant information and highlighting local features, effectively enhancing the feature distinguishability of complex modulated signals such as 16APSK.

In terms of the model, they employed a RC structure with fixed random weights, featuring a leaky integration mechanism in the state update:

u (t) = (1 - α) u (t - 1) + α \tanh (W_{i n} x (t) + W_{r e s} u (t - 1)),

(28)

here,

u (t) \in R^{N}

represents the reservoir state vector at time step

t

, with

N

being the number of reservoir nodes;

α \in (0,1]

is the leakage rate parameter, balancing the fusion ratio of historical states and current inputs;

W_{i n}

and

W_{r e s}

are the input weights and the internal sparse connection weight matrix, respectively. The output weights

W_{o u t}

were optimized through ridge regression:

\hat{W_{o u t}} = {(U U^{T} + ϵ I)}^{- 1} (Y U^{T}),

(29)

where

U = [u (1), u (2), \dots, u (T)] \in R^{N \times T}

is the state matrix composed of all reservoir states concatenated column-wise, with

T

being the input sequence length;

Y \in R^{L \times T}

is the target output matrix, with each column corresponding to the label of the input sequence at time step

t

, and L being the total number of classes;

ϵ > 0

is the ridge regression regularization coefficient. This structure reduced the number of trainable parameters to 0.3% of that of traditional CNNs, with training time shortened to the second level. Experiments demonstrated that within the LED bias range of 0.2–1.3 V, the system achieved over 90% recognition accuracy for six modulation types (OOK, 4QAM, 8QAM-DIA, 8QAM-CIR, 16APSK, 16QAM), with the highest accuracy reaching 100%. Compared to the benchmark dropout-CNN [38] model, the proposed method shows a maximum improvement of up to 30% in recognition accuracy, along with a significant enhancement in inference speed.

In UVLC systems, the nonlinearity of LEDs poses a significant challenge for modulation recognition. Reference [39] proposed a super lightweight recognition framework based on communication-inspired knowledge distillation (CIKD), achieving breakthroughs in performance through the deep integration of physical priors and knowledge transfer. Central to this solution is the collaborative design of a dynamic feature purification mechanism and a hierarchical knowledge transfer architecture. Recognizing the strong correlation between LED nonlinearity and driving voltage, the authors constructed a learnable dynamic masking module. This module adaptively adjusts the truncation of high-amplitude distortion regions in the constellation diagram, transforming the original dense constellation into a sparse representation while preserving essential discriminative features. This physics-driven preprocessing significantly enhances the system’s resilience to nonlinear distortion. The study further developed a teacher–student collaborative distillation architecture. In this setup, the teacher model is a multi-layer CNN that extracts geometric features from the constellation diagram, while the student model consists of only a single-layer classifier to process the purified sparse features. To bridge the capacity gap between the two models, a hierarchical distillation mechanism was proposed. On one hand, intermediate-layer feature similarity preservation (ILFSP) was employed

L_{S P} = \frac{1}{b s^{2}} | G_{T} - G_{S} |_{F}^{2},

(30)

to constrain the similarity between the student model’s purified features

G_{S}

and the teacher’s intermediate features

G_{T}

, enabling the student model to replicate the teacher’s high-order feature distribution in a lower-dimensional space. On the other hand, representation vector decoupling (RVD) was introduced to decompose the teacher model’s output probabilities into target and non-target class knowledge, enhancing dark knowledge transfer through dynamically weighted KL divergence:

L_{Decouple} = L_{K L} (b^{T}, b^{S}) + (1 - p_{t}^{T}) L_{K L} (\hat{p^{T}}, \hat{p^{S}}),

(31)

here,

\hat{p^{T}}

and

\hat{p^{S}}

represent the non-target class distributions after removing the target class probability

p_{t}^{T}

, with

(1 - p_{t}^{T})

as a weight to suppress interference from high-confidence target classes, and

\hat{p^{T}}

and

\hat{p^{S}}

being the renormalized probabilities. This enhances the fine-grained discrimination ability for similar modulation formats. Ultimately, by jointly optimizing the feature alignment loss

\hat{L_{1}}

, decoupled distillation loss

\hat{L_{2}}

, and classification loss

L_{3}

:

L o s s_{C I K D} = α \hat{L_{1}} + β \hat{L_{2}} + (1 - α - β) L_{3}

(32)

The deep coupling between the communication prior module and the knowledge distillation path was effectively established. Experimental validation demonstrated that the solution exhibited strong robustness across 12 driving voltage points ranging from 100 to 1200 mV. For eight modulation formats (PAM4, QPSK, 8QAM-CIR, 8QAM-DIA, 16QAM, 16APSK, 32QAM, and 32APSK), at the optimal operating point of 800 mV, the student model achieved a recognition accuracy of 100%, representing an 87% improvement over the non-distilled baseline model. Even in the region of strong nonlinearity at 1200 mV, the accuracy remained at 70%, thereby confirming the solution’s robustness against complex interference. Moreover, the student model had only 18% of the parameters of the teacher model, with reduced inference latency, highlighting the advantage of CIKD in terms of hardware deployment efficiency.

Zheng et al. [40] proposed a two-stage processing framework called VLCMnet to address the performance degradation caused by constellation diagram distortion in modulation format recognition for indoor VLC systems. The framework initially employs a hybrid model that combines temporal convolutional networks (TCN) and LSTM networks for channel equalization. This model extracts local signal features through five layers of residual modules that include dilated and causal convolutions, effectively mitigating the gradient vanishing problem through cross-layer connections. It then retains essential temporal information using a two-layer LSTM, significantly improving the quality of constellation diagrams and enabling accurate modulation format recognition even under low SNR conditions. In the classification stage, a multi-mixed attention network (MMAnet) is designed, which serially connects single-attention (SA) modules on the channel–spatial dual path and mixed-attention (MA) modules integrated with the squeeze-and-excitation (SE) mechanism. This design enhances sensitivity to constellation point distribution and improves the model’s ability to discern edge features of higher-order modulations. Experiments utilized a dataset generated from DCO-OFDM that comprised five types of MQAM (M = 4, 8, 16, 32, 64) with SNRs ranging from −10 to 30 dB. The results indicated that VLCMnet achieved a recognition accuracy of 99.2% at a SNR of 4 dB and maintained an overall accuracy of 93.3% at a lower SNR of 0 dB, significantly surpassing the baseline CNN (78.8%), VGG16 (72.8%), and ResNet (68.4%).

4.4. Introduction to the VLC MFR Datasets

Currently, there is no comprehensive open-source dataset for MFR in VLC. Most researchers collect data using simulation software such as MATLAB (see https://www.mathworks.com accessed on 12 May 2025) and Optisystem (see https://optiwave.com accessed on 12 May 2025), or through laboratory VLC systems. However, these datasets are not publicly available, and different teams consider varying communication scenarios and channel conditions when constructing their datasets. Next, we briefly introduce some commonly used open-source datasets in the field of RF modulation recognition as references and discuss the current status of dataset construction in VLC-MFR research.

In the field of RF-MFR, several open-source datasets are widely used, including the RML2016 and RML2018 series from RadioML and HisarMod2019 (introduced in Table 2). The RML2016.10a dataset [26] simulates time-varying random channel effects (e.g., carrier frequency offset) using GNU Radio, generating 220,000 samples with 11 modulation types. The SNR ranges from −20 dB to 18 dB with a step size of 2 dB. This dataset has been widely utilized. The RML2018.01a dataset [41] is generated in a real laboratory environment with better conditions and more complex channel scenarios. It contains 24 modulation types, with a sample dimension of 2 × 1024 and a total of 2,555,904 samples. The SNR ranges from −20 dB to 30 dB. While this dataset is large-scale, it also demands more resources. The HisarMod2019.1 dataset [42] provides multiple channel conditions, including ideal, static Rayleigh, Rician, and Nakagami-m channels. It contains 26 modulation types and 780,000 samples. This dataset is suitable for studying modulation recognition performance under different channel conditions and offers a rich and diverse data resource for AMR research.

Next, we introduce some typical self-built datasets in current VLC modulation recognition research (the summary is presented in Table 3). In paper [28], Mortada B. et al. constructed an OWC system using Optisystem. They simulated eight modulation formats: 2/4/8/16-PSK and 8/16/32/64-QAM. The signals were generated at a bit rate of 10 Gbps and transmitted through a 4-km free-space optical link with an attenuation of 0.43 dB/km and an OSNR range of 5–30 dB. At the receiver, a constellation diagram analyzer estimated the constellation diagrams and converted them into grayscale images, each containing 4000 sampling points. Feature vectors were extracted using FB projection technology to form the dataset. The size of the dataset was not explicitly stated.

In paper [29], Gu Y. et al. built a simulation system using MATLAB to emulate the satellite-to-ground FSO channel conditions. They considered four modulation formats: OOK, BPSK, QPSK, and 16-QAM. For each format, 20,000 signal frames were generated, with each frame containing 1024 sampling points. The signals were transmitted through a Gamma–Gamma atmospheric channel with AWGN added at SNR values ranging from 10 to 30 dB. The dataset consisted of 80,000 frames, with 80% allocated for training and 20% for validation. An additional 16,000 test frames were generated using the same method as the training data but with a different random seed.

In paper [31], Wang Y. et al. generated a dataset using MATLAB. They considered interference factors such as AWGN, Doppler shift, Rician multipath fading, and clock offset. The study focused on four modulation formats: BPSK, QPSK, 8PSK, and 16QAM. Within the SNR range of 0–20 dB, 100 time-frequency diagrams were generated for each SNR condition at intervals of 2 dB. This resulted in a total of 4400 frames, with 3520 used for training and 880 for testing.

In paper [32], Arafa N. et al. generated signals for eight VLC modulation formats using MATLAB. These formats included QPSK, 8-PSK, 16-PSK, 4-QAM, 8-QAM, 16-QAM, 32-QAM, and 64-QAM. A 3D CAD model was constructed using OpticStudio (see https://www.ansys.com/products/optics/ansys-zemax-opticstudio accessed on 12 May 2025), incorporating a path loss model based on non-sequential ray tracing to simulate a real V-VLC scenario. Three weather conditions were considered: clear, rainy, and foggy. AWGN intensity was adjusted to simulate SNR conditions ranging from 5 to 25 dB. The received signal constellation diagrams were saved in JPEG format at 656 × 656 pixels. Features were extracted through grayscale conversion, Canny edge detection, and Hough transformation. The final dataset contained over 32,000 Hough-transformed image samples for training and testing three deep neural network models that had been pre-trained on ImageNet.

In paper [33], Gao W. et al. built a real UVLC system to collect the dataset. This dataset covered 10 modulation formats, including 2/4/8/16/32/64-QAM and 8/16/32/64-APSK. The signals were processed through upsampling, pulse-shaping filtering, and amplification by an arbitrary waveform generator and an electrical amplifier. They were then emitted by a laser diode with a peak-to-peak voltage ranging from 0.1 V to 0.55 V. The voltage range of 0.2 V to 0.4 V corresponded to the high-SNR region, 0.1 V to the low-SNR region, and above 0.5 V to the nonlinear region. The signals were transmitted through a 1.2-m water tank. The received signals were processed and converted into complex data, which were then coordinate-transformed and combined into 4D feature data to form the dataset. The length of each frame and the total size of the dataset were not specified.

In paper [34], Zhang L. et al. simulated a UOWC system using MATLAB. They constructed a channel model incorporating scattering, absorption, and turbulence effects to generate constellation diagram samples. Fifteen modulation formats were selected, covering M-ary QAM, M-ary ASK, M-ary PSK, and BPSK. Within the SNR range of 6–15 dB, with a step size of 3 dB, 1200 original constellation diagram samples were generated. The samples were converted to grayscale and resized to 28 × 28 pixels. A total of 300 samples were selected to form the dataset. The training set and validation set had different modulation formats. During the training phase, the dataset was further divided into a support set and a query set. The support set contained samples and labels of known classes for model tuning. The query set contained samples of the same classes for aggregating and optimizing parameters to achieve knowledge transfer and generalization.

Currently, research teams primarily generate datasets based on their own simulated or experimental VLC systems, which vary in architecture and channel conditions. This results in objective differences in model performance across different datasets, affecting the universality of research findings. The number and types of modulation formats considered by each team also differ, but they generally cover common formats such as PSK and QAM. Constructing a comprehensive dataset that encompasses current mainstream VLC systems and their modulation formats is a monumental task. As VLC technology advances, unique modulation formats continue to emerge, and the expansion of dataset categories will rely on the spirit of open-source collaboration among research teams.

Moving forward, we advocate for the widespread open-sourcing of VLC modulation recognition datasets. We encourage teams with comprehensive equipment and experimental systems to collect VLC-MFR signal sample datasets and hope to see these datasets made open-source. This will promote improvements in VLC-MFR technology under real-world channel conditions. Before VLC achieves a balance between performance and efficiency and is widely applied on a large scale, different system architectures will coexist objectively. Therefore, the open-sourcing of simulation datasets is also welcomed by researchers. Simulation datasets offer high flexibility in construction and can be iteratively updated quickly once open-sourced, which is conducive to forming a more complete VLC model and dataset and driving the vigorous development of VLC MFR technology.

4.5. Prospects for the Practical Hardware Deployment of VLC MFR

During the evolution of VLC technology towards practical applications, the hardware implementation of MFR technology is still in the exploratory stage. We believe this situation is primarily due to two issues: First, VLC technology has not yet formed a large-scale commercial application. Research on its hardware implementation has mostly focused on innovations in modulation schemes and the optimization of basic links, while the hardware deployment of advanced algorithms like modulation recognition has not yet attracted widespread attention. Second, the unique optoelectronic hybrid architecture and dynamic channel characteristics of VLC systems pose special requirements for hardware implementation: it needs to meet the demands of real-time signal processing while also achieving low-power design on resource-constrained mobile terminal platforms. Existing VLC hardware systems have shown a trend towards modular development. The transmitter typically adopts an integrated design of LED arrays and driving circuits, while the receiver uses a photodetector in conjunction with an analog front-end (AFE) to achieve optoelectronic conversion. Digital baseband processing is mostly accomplished using field-programmable gate arrays (FPGA) or ARM architecture processors. Such architectures, while supporting traditional modulation formats like OOK [43,44] and PPM [45,46], are gradually expanding to higher-order schemes such as OFDM [47], PAM4 [48], and QPSK [49].

The hardware implementation experience of MFR technology in the RF communication field provides important references for VLC systems. In RF scenarios, deep learning-based modulation recognition has been realized through the deployment of quantized convolutional neural networks (QMCNet) and residual networks (RUNet) on FPGAs. For example, the RUNet (6,6,6)p model with 6 bit quantized weights and iterative pruning technology achieves a classification accuracy of 94.46% and a throughput of 527 K classifications per second on the Xilinx ZCU111 FPGA platform [50]. The ternary weight quantization technique in [51], which restricts neural network parameters to {−1, 0, +1} at inference, combined with the MobileNetV3 architecture, achieves an average recognition accuracy of 90.1% on an ASIC. The authors reduced the number of multiply and accumulate (MAC) operations using a common subexpression elimination algorithm. However, the essential differences between VLC and RF systems lead to some distinctions in hardware implementation paths: the light-intensity fluctuation characteristics of VLC channels require the hardware front-end to have a stronger dynamic range adaptation capability, while the nonlinear response characteristics of LEDs necessitate the design of more complex digital predistortion techniques and feedback compensation mechanisms.

Deploying VLC-MFR technology faces multiple challenges. The primary difficulty lies in the contradiction between hardware resource constraints and algorithm complexity. FPGAs have a limited number of logic units (LUTs) and digital signal processing (DSP) modules, while deep neural networks required for high-order MFR may consume more hardware resources. Although the RUNet (6,6,6)p model [50] reduces storage requirements through weight pruning, additional hardware resources are still needed in VLC scenarios to handle channel estimation and equalization operations. The requirement for real-time performance constitutes the second challenge. Existing FPGA-based MFR implementations can achieve a classification latency of 7.5 μs under a 30 dB SNR [50], but they have not fully considered the need for resynchronization caused by sudden light changes in VLC applications. Anti-interference capability is the third major bottleneck. The superimposed interference and time-varying characteristics of natural and artificial light sources may cause a sharp drop in the SNR of the received signal, requiring hardware accelerators to dynamically adjust quantization precision and decision thresholds.

The adaptability of datasets poses the fourth challenge. Before loading the MFR model onto a hardware platform, the classification model is typically pre-trained on a computer based on a given dataset. However, during the actual deployment of VLC hardware terminals, different channel scenarios are often encountered, such as indoor and outdoor scenarios, stationary and mobile scenarios, and scenarios with little or complex occlusion. Models trained offline based on a fixed dataset may face unstable performance in these changing situations.

Power consumption optimization is particularly prominent in battery-powered portable devices. Compared with RF hardware systems, the unique dimming control function of VLC systems may add extra energy consumption costs. The lack of standardization further increases deployment difficulties. Differences in optoelectronic interface timing and modulation parameter configuration among VLC module solutions from different vendors lead to the need for recognition algorithms to be compatible with multiple physical layer specifications.

Potential solutions to the aforementioned challenges include developing FPGA architectures that support dynamic partial reconfiguration (DPR) to enable algorithm hot swapping; designing optoelectronic collaborative hybrid-precision quantization strategies that use low-precision inference for accelerated processing during stable lighting conditions and switch to high-precision mode when the channel degrades; building dedicated processing units based on RISC-V instruction set extensions to optimize convolution operations and activation function calculations through custom instructions; and promoting the formation of industry alliances to draft VLC standards that include modulation recognition interfaces, clarifying forward compatibility requirements. Additionally, promoting the application of transfer learning, active learning, and unsupervised learning techniques in the hardware implementation of VLC-MFR, developing mobile VLC-MFR model hardware implementations that can autonomously train and optimize based on real-world environments, enhancing the environmental adaptability of VLC device deployment, and reducing maintenance costs. These explorations need to form an iterative closed loop between laboratory prototyping and field testing, ultimately driving the transition of MFR technology from algorithm simulation to product deployment.

To address these challenges, several potential solutions are proposed. First, developing FPGA architectures that support dynamic partial reconfiguration (DPR) can enable algorithm hot swapping. This allows the hardware to dynamically adapt to different processing needs without requiring full reconfiguration.

Second, designing optoelectronic collaborative hybrid-precision quantization strategies is essential. These strategies would use low-precision inference for accelerated processing during stable lighting conditions and switch to high-precision mode when the channel degrades.

Moreover, promoting the formation of industry alliances to draft VLC standards that include modulation recognition interfaces is crucial. Clarifying forward-compatibility requirements will ensure that future developments remain aligned with current standards.

Additionally, promoting the application of transfer learning, active learning, and unsupervised learning in the hardware implementation of VLC-MFR can lead to the development of mobile VLC-MFR model hardware implementations. These implementations can autonomously train and optimize based on real-world environments, enhancing the environmental adaptability of VLC device deployment and reducing maintenance costs.

These explorations need to form an iterative closed loop between laboratory prototyping and field testing. This iterative process will ultimately drive the transition of MFR technology from algorithm simulation to product deployment.

4.6. Summary of Section 4

Deep learning-based modulation recognition techniques have significantly improved classification performance in complex channel environments by leveraging automatic feature extraction and nonlinear mapping capabilities. Current research primarily concentrates on three major directions: innovation in neural network architectures, the optimization of feature representation, and cross-scenario generalization. Together, these areas form a technical system centered on deep neural networks, while also integrating new paradigms such as meta-learning and knowledge distillation. The innovations in relevant studies primarily focus on aspects of feature engineering, classification models, and training strategies, as illustrated in Figure 9. The multidimensional feature decomposition method using I, Q coordinates and polar coordinates has been consistently employed in recent years. This enables deep learning models to learn unique signal representations from four dimensions of the signal sequence. Constellation diagrams and their variants are also frequently used, thanks to the IQ space features displayed in the constellation diagrams. Processing signals into image formats is also conducive to directly inputting them into the rapidly developing image classification models for modulation recognition tasks. Novel data-processing methods, such as folding algorithms, have continuously emerged. The use of neural network-based pre-equalizers has also effectively alleviated the classification difficulties under low-SNR conditions.

A summary and comparison of deep learning-based modulation recognition methods are presented in Table 4. CNNs exhibit significant advantages in the spatial feature extraction of signals, thanks to their local perception and weight-sharing mechanisms. Multi-dimensional feature fusion strategies effectively mitigate the limitations of traditional methods that rely on single-signal representations. Additionally, network designs that incorporate residual structures and attention mechanisms further enhance robustness against frequency offsets, phase noise, and other types of interference. However, the complexity of these models and their computational overhead still hinder their application in scenarios requiring high real-time performance. On the other hand, RNNs and their variants are designed to focus on modeling temporal dependencies, effectively capturing the dynamic characteristics of signals through gating mechanisms. They perform well under non-stationary channel conditions but inherently suffer from issues such as gradient vanishing and limited long-range modeling efficiency. Innovative models have sought to achieve a balance between accuracy and efficiency in specific scenarios by introducing physical priors, applying lightweight architectures, and implementing few-shot learning mechanisms. Nevertheless, further breakthroughs are still necessary in areas such as cross-domain generalization and complex interference suppression.

5. Conclusions and Outlook

As the 6G communication network evolves towards ultra-high speed and global coverage, VLC has become a key technology for building an integrated space–air–ground–sea communication system due to its ultra-high bandwidth, low power consumption, and resistance to electromagnetic interference. MFR as a core component of VLC adaptive transmission directly affects the dynamic optimization capability and scenario adaptability of the communication system. This paper systematically reviewed the research status of MFI technology, revealing the process of technological evolution and the challenges faced, from traditional likelihood ratio methods and feature-driven models to deep learning frameworks (as shown in Table 5).

5.1. Current Challenges

In promoting the practical implementation and large-scale deployment of VLC in 6G integrated scenarios, MFR technology still faces challenges in multiple aspects. These challenges cover algorithm limitations, data dependency, hardware constraints, and environmental complexity, as described below.

(1): Algorithm limitations in dynamic and non-stationary channels

Traditional likelihood-based (LB) methods, although theoretically optimal, have excessively high computational complexity when applied to VLC systems. For instance, hybrid likelihood ratio tests (HLRT) require iterative parameter estimation and statistical averaging, which become intractable under LED nonlinearity and multipath scattering effects. Feature-based (FB) methods, while less computationally intensive, show vulnerability in distinguishing modulation formats with overlapping feature distributions under low-SNR conditions (e.g., high-order QAM and APSK). Deep learning (DL) models mitigate these issues through automatic feature learning but introduce new bottlenecks: mainstream CNN and RNN models struggle to balance model depth and real-time inference latency, especially in resource-constrained edge and miniaturized devices.

(2): Data scarcity and domain-shift issues

The lack of standardized open-source datasets for VLC-MFR forces researchers to rely on simulation or laboratory-generated data under limited conditions, restricting model generalizability. Current research mostly employs narrow-range datasets simulating specific communication scenarios, which fail to capture the full spectrum of real-world variations (e.g., dynamic mobility, heterogeneous interference). When models encounter unforeseeable unfamiliar scenarios and interferences, significant performance degradation occurs. Moreover, DL models require large amounts of labeled data, which is impractical for field deployments with high labeling costs.

(3): Bottlenecks in hardware–algorithm co-design

VLC mobile terminal hardware systems prioritize low-power operation, but mainstream DL architectures impose heavy computational loads. Lightweight models, pruning operations, and quantization methods can effectively reduce the burden on hardware devices but still sacrifice some feature richness. Meanwhile, hardware imperfections (such as LED nonlinearity and photodetector response delay) unpredictably distort signal features. Additionally, synchronization errors in practical systems (e.g., symbol timing offset) have not yet been considered in simulations. Future work should combine with actual hardware for the co-design of software and hardware.

(4): Cross-scenario robustness and scalability

Most MFR solutions have been optimized for specific isolated environments (indoor, underwater, vehicular), lacking general adaptability. Meta-learning frameworks like PGML [34] have shown promise in few-shot adaptation and have considered the model’s recognition performance when facing new samples. Future research needs to continue the diversified development in this area. The lack of a unified evaluation metric system complicates cross-study comparisons, as accuracy claims vary with the SNR range.

(5): Real-time processing and energy efficiency trade-offs

Real-time MFR in dynamic channels, such as vehicular VLC, requires sub-millisecond latency, which conflicts with the computational overhead of complex feature fusion and attention mechanisms. In terms of energy efficiency optimization, existing research has not yet broken through the strong correlation between algorithm complexity and hardware power consumption. Emerging architectures such as neuromorphic computing, although theoretically possessing event-driven and ultra-low power consumption characteristics, are still rarely applied in the VLC-MFR field.

(6): Standardization and cross-layer design gap

The fragmented development of VLC-MFR technology has to some extent hindered its engineering process. Compared with the established standardized datasets in the RF communication field, such as RadioML and HisarMod, VLC-MFR relies on fragmented, non-reproducible, and non-public local scenario datasets. This data heterogeneity makes performance comparisons between different studies more complex.

The absence of cross-layer co-design mechanisms further constrains system-level performance optimization. There is a deep coupling relationship between physical-layer modulation recognition and upper-layer adaptive modulation and coding (AMC), hybrid automatic repeat request (HARQ), but existing research mostly adopts isolated optimization strategies. Establishing a joint optimization framework across the physical and link layers has become a key technical bottleneck for achieving 6G global coverage.

These challenges highlight the necessity for a paradigm shift in VLC-MFR research, elevating robustness, efficiency, and cross-domain collaboration to the same level of importance as incremental accuracy improvements. Future work should primarily focus on bridging the gap between controlled experiments and real-world deployments to accelerate technological maturity.

5.2. Future Outlook

In response to the challenges currently faced by VLC modulation format recognition technology, as well as the requirements of 6G for global coverage and intelligence, the future development of this technology will focus on two main objectives: enhancing algorithm robustness to improve adaptability to complex channel environments and reducing hardware resource consumption to enable real-time and accurate recognition on mobile devices. The outlook for future solutions can be summarized as follows:

(1): Multi-Domain Feature Fusion and Adaptive Feature Extraction

Integrate features from multiple domains such as time, frequency, and statistical properties to enhance the ability to distinguish between different modulation formats and reduce dependence on single signal representations. Introduce algorithms like self-attention mechanisms to dynamically allocate feature weights, reducing reliance on manual design and further enhancing the intelligence level of the algorithms.

(2): Reducing Computational Complexity and Model Size to Meet Real-Time Requirements

Explore lightweight neural network architectures, optimize model training strategies, and adopt efficient feature-dimensionality reduction techniques to enable modulation recognition algorithms to run efficiently on resource-constrained embedded devices. Focus on practical deployment with co-design of hardware and software, and optimize the dataflow scheduling and memory access patterns of convolution operations based on the hardware characteristics of mobile NPUs/FPGAs. This will promote the implementation of these algorithms in large-scale application scenarios such as the Internet of Things and vehicular networks.

(3): Enhancing Interdisciplinary Technology Integration and Innovation

For instance, draw on image analysis and signal-processing techniques from the medical field to provide new ideas and methods for the development of VLC modulation format recognition technology. Break through the performance bottlenecks of traditional algorithms to achieve high-precision recognition of complex modulated signals.

(4): Feature Enhancement Under Complex Channels

Design adaptive feature-enhancement strategies based on channel characteristics to automatically adjust the methods of feature extraction and enhancement according to different channel conditions. Maximize the distinguishability of features to further improve modulation recognition performance.

(5): Joint Optimization of Channel Estimation and Classifier

Focus on designing an integrated framework that allows channel estimation and classifiers to be jointly trained and optimized under a unified objective function. Enable channel estimation results to directly guide the learning process of the classifier and use feedback from the classifier to help channel estimation more accurately capture channel characteristics, reducing performance losses due to channel estimation errors and classifier inadaptability.

(6): Promoting Model Generalization from Simulation to Real-World Scenarios

The issues of data scarcity and domain shift can be addressed through simulation-real data collaborative enhancement techniques. Utilize generative adversarial networks (GANs) to synthesize data that conform to actual channel characteristics, and combine this with a small amount of labeled real-world data to construct a semi-supervised training set. Develop unsupervised domain adaptation frameworks that introduce adversarial training mechanisms to eliminate feature distribution differences between the training and deployment domains. Design lightweight feature alignment modules to fine-tune models under unlabeled target domain conditions. Additionally, it is necessary to establish a multi-dimensional evaluation metric system. Beyond traditional recognition accuracy, metrics such as computational latency, energy efficiency ratio, and model interpretability should be given increased importance to drive the technology towards practical application.

(7): Cross-Scenario Adaptability: Building a Meta-Learning-Driven Adaptive Recognition Framework

To enhance the scenario scalability of MFR systems, it is essential to develop dynamic adaptation mechanisms based on meta-learning. Meta-learning and few-shot learning techniques can construct a meta-feature library for modulation formats, enabling rapid adaptation to newly emerging modulation types through gradient update strategies. This addresses the challenge of zero-shot recognition across scenarios. For example, design a hierarchical meta-network (Meta-Net) that learns general modulation features at the base layer and quickly adjusts decision boundaries in the adaptation layer using a few samples from new scenarios. Further, combine this with knowledge distillation techniques to compress multi-scenario expert models into a lightweight unified model, effectively balancing model complexity and cross-scenario performance.

Future VLC-MFR research needs to shift away from the traditional focus on optimizing a single performance metric and instead pursue a holistic optimal solution that integrates algorithms, data, hardware, and scenarios. Through interdisciplinary innovation and the construction of a standardized ecosystem, it is possible to bridge the gap between laboratory settings and real-world applications, providing intelligent modulation recognition solutions that are highly reliable and low-latency for 6G integrated optical and wireless communications.

6. Summary

In summary, modulation format recognition technology for 6G mobile VLC applications should focus on the core goals of “lightweight, high-precision, and low-power consumption”. By deeply integrating algorithm innovation, architectural optimization, and hardware acceleration, existing technological bottlenecks can be overcome. With the advancement of edge computing power and new learning paradigms, MFI technology is expected to achieve substantial breakthroughs in complex channel scenarios and real-time dynamic recognition, providing reliable technical support for adaptive resource allocation in dynamic wireless optical networks.

Author Contributions

Conceptualization, S.Z., W.D. and C.L.; methodology, S.Z. and W.D.; formal analysis, S.Z. and W.D.; investigation, S.Z. and W.D.; resources, S.Z., C.L. and S.L.; data curation, S.L. and R.L.; writing—original draft preparation, S.Z. and W.D.; writing—review and editing, S.Z., W.D. and C.L.; visualization, S.L. and R.L.; supervision, S.Z. and C.L.; project administration, S.Z. and W.D.; funding acquisition, S.Z., C.L. and S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Innovation Project of Guangxi Graduate Education (No. YCSW2025507), Guangxi Natural Science Foundation (No. 2025GXNSFBA069254), Guangxi Key Technologies R&D Program (No. AB241484046).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Yang, P.; Xiao, Y.; Xiao, M.; Li, S. 6G Wireless Communications: Vision and Potential Techniques. IEEE Netw. 2019, 33, 70–75. [Google Scholar] [CrossRef]
Chi, N.; Zhou, Y.; Wei, Y.; Hu, F. Visible Light Communication in 6G: Advances, Challenges, and Prospects. IEEE Veh. Technol. Mag. 2020, 15, 93–102. [Google Scholar] [CrossRef]
Tang, L.; Wu, Y.; Cheng, Z.; Teng, D.; Liu, L. Over 23.43 Gbps Visible Light Communication System Based on 9 V Integrated RGBP LED Modules. Opt. Commun. 2023, 534, 129317. [Google Scholar] [CrossRef]
Li, C.Y.; Lu, H.H.; Tsai, W.S.; Wang, Z.H.; Hung, C.W.; Su, C.W.; Lu, Y.F. A 5 m/25 Gbps Underwater Wireless Optical Communication System. IEEE Photonics J. 2018, 10, 1–9. [Google Scholar] [CrossRef]
Matthews, W.; Ahmed, Z.; Ali, W.; Collins, S. A 3.45 Gigabits/s SiPM-Based OOK VLC Receiver. IEEE Photonics Technol. Lett. 2021, 33, 487–490. [Google Scholar] [CrossRef]
Xu, B.; Min, T.; Yue, C.P. Design of PAM-8 VLC Transceiver System Employing Neural Network-Based FFE and Post-Equalization. Electronics 2022, 11, 3908. [Google Scholar] [CrossRef]
Zou, P.; Hu, F.; Zhao, Y.; Chi, N. On the Achievable Information Rate of Probabilistic Shaping QAM Order and Source Entropy in Visible Light Communication Systems. Appl. Sci. 2020, 10, 4299. [Google Scholar] [CrossRef]
Zenhom, Y.A.; Hamad, E.K.I.; Alghassab, M.; Elnabawy, M.M. Optical-OFDM VLC System: Peak-to-Average Power Ratio Enhancement and Performance Evaluation. Sensors 2024, 24, 2965. [Google Scholar] [CrossRef]
Zhou, Y.; Wei, Y.; Hu, F.; Hu, J.; Zhao, Y.; Zhang, J.; Jiang, F.; Chi, N. Comparison of Nonlinear Equalizers for High-Speed Visible Light Communication Utilizing Silicon Substrate Phosphorescent White LED. Opt. Express 2020, 28, 2302–2316. [Google Scholar] [CrossRef]
Yakkati, R.R.; Tripathy, R.K.; Cenkeramaddi, L.R. Radio Frequency Spectrum Sensing by Automatic Modulation Classification in Cognitive Radio System Using Multiscale Deep CNN. IEEE Sens. J. 2022, 22, 926–938. [Google Scholar] [CrossRef]
Huang, S.; Lin, C.; Xu, W.; Gao, Y.; Feng, Z.; Zhu, F. Identification of Active Attacks in Internet of Things: Joint Model- and Data-Driven Automatic Modulation Classification Approach. IEEE Internet Things J. 2021, 8, 2051–2065. [Google Scholar] [CrossRef]
Reddy, R.; Sinha, S. State-of-the-Art Review: Electronic Warfare against Radar Systems. IEEE Access 2025, 13, 57530–57567. [Google Scholar] [CrossRef]
Sliti, M.; Mrabet, M.; Garai, M.; Ammar, L.B. A Survey on Machine Learning Algorithm Applications in Visible Light Communication Systems. Opt. Quantum Electron. 2024, 56, 1351. [Google Scholar] [CrossRef]
Zedini, E.; Oubei, H.M.; Kammoun, A.; Hamdi, M.; Ooi, B.S.; Alouini, M.-S. Unified Statistical Channel Model for Turbulence-Induced Fading in Underwater Wireless Optical Communication Systems. IEEE Trans. Commun. 2019, 67, 2893–2907. [Google Scholar] [CrossRef]
Jajoo, G.; Kumar, Y.; Yadav, S.K.; Adhikari, B.; Kumar, A. Blind Signal Modulation Recognition Through Clustering Analysis of Constellation Signature. Expert Syst. Appl. 2017, 90, 13–22. [Google Scholar] [CrossRef]
Xu, J.L.; Su, W.; Zhou, M. Likelihood-Ratio Approaches to Automatic Modulation Classification. IEEE Trans. Syst. Man Cybern. Part C 2010, 41, 455–469. [Google Scholar] [CrossRef]
Zheng, J.; Lv, Y. Likelihood-Based Automatic Modulation Classification in OFDM with Index Modulation. IEEE Trans. Veh. Technol. 2018, 67, 8192–8204. [Google Scholar] [CrossRef]
Ali, A.; Yangyu, F. Unsupervised Feature Learning and Automatic Modulation Classification Using Deep Learning Model. Phys. Commun. 2017, 25, 75–84. [Google Scholar] [CrossRef]
Ren, H.; Yu, J.; Wang, Z.; Chen, J.; Yu, C. Modulation Format Recognition in Visible Light Communications Based on Higher Order Statistics. In Proceedings of the 2017 Conference on Lasers and Electro-Optics Pacific Rim (CLEO-PR), Singapore, 31 July–4 August 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–2. [Google Scholar]
He, J.; Zhou, Y.; Shi, J.; Tang, Q. Modulation Classification Method Based on Clustering and Gaussian Model Analysis for VLC System. IEEE Photonics Technol. Lett. 2020, 32, 651–654. [Google Scholar] [CrossRef]
Xiong, S. Intelligent Modulation Recognition Algorithm for Optical Communication. J. Intell. Fuzzy Syst. 2021, 40, 5845–5852. [Google Scholar] [CrossRef]
Ağır, T.T.; Sönmez, M. The Modulation Classification Methods in PPM–VLC Systems. Opt. Quantum Electron. 2023, 55, 223. [Google Scholar] [CrossRef]
Mohamed, S.E.D.N.; Mortada, B.; El-Shafai, W.; Khalaf, A.A.M.; Zahran, O.; Dessouky, M.I.; El-Rabaie, E.-S.M.; Abd El-Samie, F.E. Automatic Modulation Classification in Optical Wireless Communication Systems Based on Cancellable Biometric Concepts. Opt. Quantum Electron. 2023, 55, 389. [Google Scholar] [CrossRef]
Zhang, X.; Zeng, Z.; Du, P.; Lin, B.; Chen, C. Intelligent Index Recognition for OFDM with Index Modulation in Underwater OWC Systems. IEEE Photonics Technol. Lett. 2024, 36, 1249–1252. [Google Scholar] [CrossRef]
Jin, H.; Jeong, S.; Park, D.C.; Kim, S.C. Convolutional Neural Network Based Blind Automatic Modulation Classification Robust to Phase and Frequency Offsets. IET Commun. 2020, 14, 3578–3584. [Google Scholar] [CrossRef]
O’Shea, T.J.; Corgan, J.; Clancy, T.C. Convolutional Radio Modulation Recognition Networks. In Proceedings of the 17th International Conference on Engineering Applications of Neural Networks (EANN 2016), Aberdeen, UK, 2–5 September 2016; Springer: Cham, Switzerland, 2016; pp. 213–226. [Google Scholar]
Liu, W.; Li, X.; Yang, C.; Luo, M. Modulation Classification Based on Deep Learning for DMT Subcarriers in VLC System. In Proceedings of the 2020 Optical Fiber Communications Conference and Exhibition (OFC), San Diego, CA, USA, 8–12 March 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–3. [Google Scholar]
Mortada, B.; El-Shafai, W.; Mohamed, S.E.D.N.; Zahran, O.; El-Rabaie, E.-S.M.; Abd El-Samie, F.E. Fan-Beam Projection for Modulation Classification in Optical Wireless Communication Systems. Appl. Opt. 2022, 61, 1041–1048. [Google Scholar] [CrossRef]
Gu, Y.; Wu, Z.; Li, X.; Tian, R.; Ma, S.; Jia, T. Modulation Format Identification in a Satellite to Ground Optical Wireless Communication Systems Using a Convolution Neural Network. Appl. Sci. 2022, 12, 3331. [Google Scholar] [CrossRef]
Gao, W.; Xu, C.; Xu, Z.; Jin, R.Z.; Chi, N. Modulation Format Recognition Based on Coordinate Transformation and Combination in VLC System. In Proceedings of the 2022 Asia Communications and Photonics Conference (ACP), Shenzhen, China, 5–8 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1151–1155. [Google Scholar]
Wang, Y.; Wu, Z.; Zhao, Y.; Yan, Z.; Mao, R.; Zhu, H. Improved YOLOv5s Algorithm for Modulation Format Recognition of Visible Light Communication Signal. Opt. Commun. Technol. 2024, 48, 18–22. (In Chinese) [Google Scholar]
Arafa, N.A.; Lizos, K.A.; Alfarraj, O.; Shawki, F.; Abd El-atty, S.M. Deep Learning Approach for Automatic Modulation Format Identification in Vehicular Visible Light Communications. Opt. Quantum Electron. 2024, 56, 1083. [Google Scholar] [CrossRef]
Gao, W.; Wang, Y.; Xu, C.; Xu, Z.; Chi, N. Research on Modulation Format Recognition for Underwater Visible Light Communication Based on BiGRU. Study Opt. Commun. 2025, 2, 240023. Available online: http://kns.cnki.net/kcms/detail/42.1266.TN.20240223.1857.004.html (accessed on 20 March 2025). (In Chinese).
Zhang, L.; Zhou, X.; Du, J.; Tian, P. Fast Self-Learning Modulation Recognition Method for Smart Underwater Optical Communication Systems. Opt. Express 2020, 28, 38223–38240. [Google Scholar] [CrossRef]
Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Sydney, Australia, 6–11 August 2017; PMLR: Cambridge, MA, USA, 2017; pp. 1126–1135. [Google Scholar]
Zhao, Z.; Khan, F.N.; Li, Y.; Wang, Z.; Zhang, Y.; Fu, H.Y. Application and Comparison of Active and Transfer Learning Approaches for Modulation Format Classification in Visible Light Communication Systems. Opt. Express 2022, 30, 16351–16361. [Google Scholar] [CrossRef] [PubMed]
Li, F.; Lin, X.; Shi, J.; Li, Z.; Chi, N. Modulation Format Recognition in a UVLC System Based on Reservoir Computing with Coordinate Transformation and Folding Algorithm. Opt. Express 2023, 31, 17331–17344. [Google Scholar] [CrossRef] [PubMed]
Wang, Y.; Liu, M.; Yang, J.; Gui, G. Data-Driven Deep Learning for Automatic Modulation Recognition in Cognitive Radios. IEEE Trans. Veh. Technol. 2019, 68, 4074–4077. [Google Scholar] [CrossRef]
Yao, L.; Li, F.; Zhang, H.; Zhou, Y.; Wei, Y.; Li, Z.; Shi, J.; Zhang, J.; Shen, C.; Chi, N. Modulation Format Recognition in a UVLC System Based on an Ultra-Lightweight Model with Communication-Informed Knowledge Distillation. Opt. Express 2024, 32, 13095–13110. [Google Scholar] [CrossRef]
Zheng, X.; He, Y.; Zhang, C.; Miao, P. VLCMnet-Based Modulation Format Recognition for Indoor Visible Light Communication Systems. Photonics 2024, 11, 403. [Google Scholar] [CrossRef]
O’Shea, T.J.; Roy, T.; Clancy, T.C. Over-the-air deep learning based radio signal classification. IEEE J. Sel. Top. Signal Process. 2018, 12, 168–179. [Google Scholar] [CrossRef]
Tekbıyık, K.; Ekti, A.R.; Görçin, A.; Kurt, G.K.; Keçeci, C. Robust and fast automatic modulation classification with CNN under multipath fading channels. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 25–28 May 2020; pp. 1–6. [Google Scholar]
Xu, W.; Zhang, M.; Han, D.; Ghassemlooy, Z.; Luo, P.; Zhang, Y. Real-time 262-Mb/s visible light communication with digital predistortion waveform shaping. IEEE Photonics J. 2018, 10, 1–10. [Google Scholar] [CrossRef]
Wang, Q.; Giustiniano, D.; Zuniga, M. In light and in darkness, in motion and in stillness: A reliable and adaptive receiver for the internet of lights. IEEE J. Sel. Areas Commun. 2017, 36, 149–161. [Google Scholar] [CrossRef]
Mao, L.; Li, C.; Li, H.; Chen, X.; Mao, X.; Chen, H. A mixed-interval multi-pulse position modulation scheme for real-time visible light communication system. Opt. Commun. 2017, 402, 330–335. [Google Scholar] [CrossRef]
Wu, H.; Wang, Q.; Xiong, J.; Zuniga, M. SmartVLC: Co-designing smart lighting and communication for visible light networks. IEEE Trans. Mob. Comput. 2019, 19, 1956–1970. [Google Scholar] [CrossRef]
Randy, L.D.; Sebastián, B.P.J.; San Millán, H.E. Time synchronization technique hardware implementation for OFDM systems with Hermitian Symmetry for VLC applications. IEEE Access 2023, 11, 42222–42233. [Google Scholar] [CrossRef]
Liang, J.; Lin, S.; Ke, X. Design of Indoor Visible Light Communication PAM4 System. Appl. Sci. 2024, 14, 1663. [Google Scholar] [CrossRef]
Perlaza, J.S.B.; Domínguez, R.L.; Heredia, E.S.M. Phase Characterization and Correction in a Hardware Implementation of an OFDM-Based System for VLC Applications. IEEE Photonics J. 2023, 15, 1–7. [Google Scholar] [CrossRef]
Kumar, S.; Mahapatra, R.; Singh, A. Automatic modulation recognition: An FPGA implementation. IEEE Commun. Lett. 2022, 26, 2062–2066. [Google Scholar] [CrossRef]
Woo, J.; Jung, K.; Mukhopadhyay, S. Efficient Hardware Design of DNN for RF Signal Modulation Recognition Employing Ternary Weights. IEEE Access 2024, 12, 80165–80175. [Google Scholar] [CrossRef]

Figure 1. Modulation format recognition in VLC systems and its fundamental principles.

Figure 2. Basic structure of likelihood-based modulation recognition methods.

Figure 3. Basic structure of feature-based modulation recognition methods.

Figure 4. The Pc (accuracy) on classification among 4/16-PAM and 4/16-QAM ACO-OFDM in reference [19].

Figure 5. The accuracy performance of DT, KNN, SVM, and LM model for all distances between receiver and transmitter in reference [22].

Figure 6. Accuracy of different recognition algorithms for OFDM-IM using (a) BPSK, (b) 4QAM, and (c) 8QAM in reference [24].

Figure 7. Basic structure of modulation recognition based on deep learning models.

Figure 8. Geometry of FBP in reference [28].

Figure 9. Timeline and keywords of recent research on deep learning-based modulation format recognition in visible light communication.

Table 1. Comparison of feature-based modulation recognition schemes.

Reference	Year	Main Design Scheme	Research Objective	Performance (%)	SNR (dB)
[19]	2017	Utilizing fourth-order cumulants for noise suppression and feature extraction, establishing classification thresholds through Monte Carlo simulations.	To distinguish high-order complex modulation formats and enhance noise robustness and modulation sensitivity.	88.9 (Acc)	15
[20]	2020	Extracting constellation diagram features through clustering analysis and two-dimensional Gaussian models, combined with decision trees for M-QAM signal recognition.	To improve the recognition accuracy of M-QAM under low-SNR conditions.	100 (Acc)	18
[21]	2021	Discrete Wavelet Transform for multi-scale noise suppression, integrating multi-dimensional features with supervised learning classification models.	To address the issue of decreased modulation recognition accuracy in optical communication systems under time-varying noise conditions.	96 (Acc)	15
[22]	2023	Designing time-domain integral windows for energy accumulation feature extraction for L-PPM signals, using multi-classifier comparison.	To achieve real-time recognition of multi-order PPM signals under complex channels.	97.78 (Acc)	25
[23]	2023	Combining chaotic mapping and wavelet fusion to process constellations, generating encrypted templates and employing autocorrelation estimation for classification.	To efficiently classify eight modulation formats (PSK/QAM) under dynamic channels with low complexity.	100 (AROC)	5
[24]	2024	Extracting frequency-domain histogram feature vectors and applying machine learning methods for classification.	To identify the number of activated subcarriers in OFDM-IM sub-blocks in dynamic underwater optical communication channels in real-time.	100 (Acc)	13

Table 2. Main MFR open datasets for RF systems.

Dataset	Channel Conditions	Modulation Types	File Format	Data Shape	Dataset Size	SNR Range (dB)
RML 2016.10a	Including time-varying channels such as carrier frequency offset, sampling rate offset, AWGN, multipath, and fading	11 classes (8PSK, BPSK, CPFSK, GFSK, PAM4, 16QAM, AM-DSB, AM-SSB, 64QAM, QPSK, WBFM)	.pkl	2 × 128	220,000	−20:2:18
RML 2018.01a	Tested and generated in a real laboratory environment, considering more complex channel conditions	24 classes (OOK, 4ASK, 8ASK, BPSK, QPSK, 8PSK, 16PSK, 32PSK, 16APSK, 32APSK, 64APSK, 128APSK, 16QAM, 32QAM, 64QAM, 128QAM, 256QAM, AM-SSB-WC, AM-SSB-SC, AM-DSB-WC, AM-DSB-SC, FM, GMSK, OQPSK)	.h5	2 × 1024	2,555,904	−20:2:30
HisarMod 2019.1	Simulated channel conditions include ideal, static, Rayleigh, Rician (k = 3), and Nakagami-m (m = 2) channels with varying numbers of channel taps.	26 classes (AM-DSB, AM-SC, AM-USB, AM-LSB, FM, PM, 2FSK, 4FSK, 8FSK, 16FSK, 4PAM, 8PAM, 16PAM, BPSK, QPSK, 8PSK, 16PSK, 32PSK, 64PSK, 4QAM, 8QAM, 16QAM, 32QAM, 64QAM, 128QAM, 256QAM)	.mat	2 × 1024	780,000	−20:2:18

Table 3. Typical self-built datasets for MFR in VLC systems.

Reference	System Platform	Channel Conditions	Modulation Types	Sample Format	Dataset Size	Parameter Range
Mortada B. et al. [28]	Optisystem	A 4-km FSO link with an attenuation of 0.43 dB/km	8 classes (2/4/8/16PSK, 8/16/32/64QAM)	Fan-beam constellation diagram	Unclear	5:5:30 dB (OSNR)
Gu Y. et al. [29]	MATLAB	Gamma–gamma atmospheric channel, AWGN	4 classes (OOK, BPSK, QPSK, 16QAM)	2 × 1024 I/Q sequence	960,000	10:30 dB (SNR)
Wang Y. et al. [31]	MATLAB	AWGN, Doppler shift, Rician multipath fading, clock offset	4 classes (BPSK, QPSK, 8PSK, 16QAM)	Time-frequency diagram.	4400	0:2:20 dB (SNR)
Arafa N. et al. [32]	MATLAB R2020b	Simulation of a realistic traffic multi-vehicle VLC path loss model	8 classes (QPSK, 8/16PSK, 4/8/16/32/64QAM)	Hough constellation diagram	32,000	5:25 dB (SNR)
Gao W. et al. [33]	Real UVLC experimental system	Real UVLC channel with adjustable LED driving voltage	10 classes (2/4/8/16/32/64QAM, 8/16/32/64APSK)	4 × N sequence	Unclear	0.1~0.55 V (Voltage)
Zhang L. et al. [34]	MATLAB R2018b	UOWC channel incorporating scattering, absorption, and turbulence effects	15 classes (4/8/16/32/64/128/256 QAM, 2/4/8ASK, 2/4/8/16/32 PSK)	Constellation diagram	300	6:3:15 dB (SNR)

Table 4. Summary and comparison of deep learning-based modulation recognition methods.

Reference	Year	Input	Model	Modulation Types	Typical Accuracy (%)	Typical Conditions
Liu W. et al. [27]	2020	Pseudo constellation diagram	GoogLeNet V3	BPSK, 4/8/16/32/64QAM	98	SNR = 15 dB
Mortada B. et al. [28]	2022	Fan-beam constellation diagram	AlexNet	2/4/8/16PSK, 8/16/32/64QAM	100	OSNR = 15 dB
Gu Y. et al. [29]	2022	2 × 1024 IQ sequence	CNN	OOK, BPSK, QPSK, 16QAM	99.98	SNR = 10~30 dB
Gao W. et al. [30]	2022	2 × 2N matrix	DrCNN	4/8/16QAM,8PSK, OOK, 16APSK	98.3	SNR = 20 dB
Wang Y. et al. [31]	2024	Time-frequency diagram	YOLOv5s	BPSK, QPSK, 8PSK, 16QAM	0.993 (mAP)	SNR = 20 dB
Arafa N. et al. [32]	2024	Hough transform constellation diagram	Pre-Trained AlexNet	QPSK, 8/16PSK, 4/8/16/32/64QAM	100	SNR = 12 dB
Gao W. et al. [33]	2024	4 × N sequence	BiGRU	2/4/8/16/32/64QAM, 8/16/32/64APSK	>96	Linear working area
Zhang L. et al. [34]	2020	Constellation diagram	PGML-CNN	4/8/16/32/64/128/256 QAM, 2/4/8ASK, 2/4/8/16/32 PSK	95.63	SNR = 6 dB
Zhao Z. et al. [36]	2022	Contour stellar image	AlexNet-AL	2/4/8/16/32/64QAM	88.78	SNR = 0~15 dB
Li F. et al. [37]	2023	IQ samples processed by coordinate transformation and folding algorithm.	Reservoir Computing	OOK, 4QAM, 8QAM-DIA, 8QAM-CIR, 16APSK, 16QAM	>90	Linear working area
Yao L. et al. [39]	2024	Constellation diagram	CIKD-CNN	PAM4, QPSK, 8QAM-CIR, 8QAM-DIA, 16/32QAM, 16/32APSK	100	Ideal working area
Zheng X. et al. [40]	2024	Constellation diagram	TCN-LSTM + MMAnet	4/8/16/32/64QAM	99.2	SNR = 4 dB

Table 5. Comparison of three types of modulation recognition schemes.

Methods	Characteristics	Advantages	Disadvantages
Likelihood-Based Methods	Construct likelihood functions based on Bayesian statistical theory; require assumptions about channel models and parameter distributions	Optimal in Bayesian theory; rigorous modeling of noise statistical properties	Extremely high computational complexity; dependence on precise prior parameter distributions; difficulty in distinguishing nested modulation types
Feature-Based Methods	Manually designed time/frequency/statistical features; combined with traditional classifiers; strong feature interpretability	Lower computational complexity; effective under small sample conditions; simple hardware implementation	Dependence on high SNR; expert knowledge required for manual feature design; poor adaptability to time-varying channels
Deep Learning-Based Methods	Data-driven automatic feature extraction; capable of end-to-end classification	Ability to capture complex features; better performance under low SNR; adaptive to new modulation formats	Requires large amounts of labeled data; time-consuming model training; poor interpretability; high hardware resource demands

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhou, S.; Du, W.; Li, C.; Liu, S.; Li, R. Research Progress on Modulation Format Recognition Technology for Visible Light Communication. Photonics 2025, 12, 512. https://doi.org/10.3390/photonics12050512

AMA Style

Zhou S, Du W, Li C, Liu S, Li R. Research Progress on Modulation Format Recognition Technology for Visible Light Communication. Photonics. 2025; 12(5):512. https://doi.org/10.3390/photonics12050512

Chicago/Turabian Style

Zhou, Shengbang, Weichang Du, Chuanqi Li, Shutian Liu, and Ruiqi Li. 2025. "Research Progress on Modulation Format Recognition Technology for Visible Light Communication" Photonics 12, no. 5: 512. https://doi.org/10.3390/photonics12050512

APA Style

Zhou, S., Du, W., Li, C., Liu, S., & Li, R. (2025). Research Progress on Modulation Format Recognition Technology for Visible Light Communication. Photonics, 12(5), 512. https://doi.org/10.3390/photonics12050512

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Research Progress on Modulation Format Recognition Technology for Visible Light Communication

Abstract

1. Introduction

2. Likelihood-Based Modulation Recognition Methods

2.1. Signal Model

2.2. Average Likelihood Ratio Test (ALRT)

2.3. Generalized Likelihood Ratio Test (GLRT)

2.4. Hybrid Likelihood Ratio Test (HLRT)

2.5. Summary of Section 2

3. Feature-Based Modulation Recognition Methods

3.1. Higher-Order Statistics (HOS) Features

3.2. Constellation Diagram Features

3.3. Wavelet Transform

3.4. Integral Feature Extraction

3.5. Chaotic Mapping and Autocorrelation Estimation

3.6. Frequency-Domain Histogram

3.7. Summary of Section 3

4. Deep Learning-Based Modulation Recognition Methods

4.1. Convolutional Neural Networks (CNN)

4.2. Recurrent Neural Networks (RNN)

4.3. Other Innovative Deep Learning Models

4.4. Introduction to the VLC MFR Datasets

4.5. Prospects for the Practical Hardware Deployment of VLC MFR

4.6. Summary of Section 4

5. Conclusions and Outlook

5.1. Current Challenges

5.2. Future Outlook

6. Summary

Author Contributions

Funding

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI