Research on Orbital Angular Momentum Recognition Technology Based on a Convolutional Neural Network

In underwater wireless optical communication (UWOC), a vortex beam carrying orbital angular momentum (OAM) has a spiral spatial phase distribution. This gives UWOC an additional spatial degree of freedom and, as a new dimension for information modulation, can greatly improve channel capacity and spectral efficiency. When a Laguerre–Gaussian (LG) beam carrying OAM is disturbed by ocean turbulence, its phase is distorted, which hampers OAM mode recognition. Because the phase map contains both a spiral wavefront and a phase singularity, a convolutional neural network (CNN) model can effectively extract information from the distorted OAM phase map and recognize both dual-mode and single-mode OAM. Phase maps of LG beams that had passed through ocean turbulence were used as the dataset to simulate and analyze OAM recognition under turbulence with different balances ω between temperature and salinity fluctuations. The results showed that, under strong turbulence with $C_n^2 = 1.0 \times 10^{-13}\ \mathrm{K^2\,m^{-2/3}}$ and ω = −1.75, the recognition rates of dual-mode OAM (ℓ = ±1~±5, ±1~±6, ±1~±7, ±1~±8, ±1~±9, ±1~±10) were 100%, 100%, 100%, 100%, 98.89%, and 98.67%, and those of single-mode OAM (ℓ = 1~5, 1~6, 1~7, 1~8, 1~9, 1~10) were 93.33%, 92.77%, 92.33%, 90%, 87.78%, and 84%, respectively. As ω increases, the recognition accuracy of the CNN model gradually decreases, and under the same conditions, dual-mode OAM has stronger anti-interference ability than single-mode OAM. These results may provide a reference for high-capacity OAM optical communication technologies.


Introduction
The first section mainly introduces the background and research significance of orbital angular momentum (OAM) optical communication and investigates the research status of convolutional neural network (CNN) recognition of OAM.
With the rapid development of underwater optical communication technology, the vortex beam carrying OAM has attracted attention as a new type of beam. Its topological charge can theoretically take any integer value, so OAM multiplexing can greatly improve the information transmission rate and channel capacity of a system without increasing the spectral bandwidth [1][2][3]. This can effectively address the low information transmission rate and insufficient bandwidth common in underwater communication, and it has great potential and wide application prospects. The most representative OAM beam is the Laguerre–Gaussian (LG) beam, and different OAM modes are orthogonal to each other. This means that different OAM modes do not interfere with each other during transmission, so OAM light can be applied to encoding and decoding and to multiplexed transmission in wireless optical communication [4][5][6][7] to meet the growing demand for information transmission capacity. In practice, however, the information carried by OAM is limited, which is related to the OAM beam's own anti-interference ability and to interference during transmission.

Materials and Methods
Section 2 introduces the expression for the LG beam and the definitions of single-mode and dual-mode LG beams. Based on the principle of the random phase screen for ocean turbulence, an ocean turbulence channel model is constructed, and the phase distribution characteristics of single-mode and dual-mode LG beams are analyzed.

LG Beam
In cylindrical coordinates, the light field of an LG beam propagating along the z-axis can be expressed as [21]:

$$ LG_{p\ell}(r,\theta,z) = \sqrt{\frac{2p!}{\pi(p+|\ell|)!}}\,\frac{1}{w(z)}\left[\frac{r\sqrt{2}}{w(z)}\right]^{|\ell|} L_p^{|\ell|}\!\left[\frac{2r^2}{w^2(z)}\right] \exp\!\left[-\frac{r^2}{w^2(z)}\right] \exp\!\left[\frac{ikr^2 z}{2(z^2+z_R^2)}\right] \exp\!\left[-i(2p+|\ell|+1)\arctan\frac{z}{z_R}\right] \exp(i\ell\theta) \tag{1} $$

where $w(z) = w_0\sqrt{1+(z/z_R)^2}$ is the beam radius after a transmission distance z; $z_R = kw_0^2/2$ is the Rayleigh length; $w_0$ is the beam waist radius, that is, the beam radius at z = 0; $k = 2\pi/\lambda$ is the wave number, with λ the wavelength; ℓ is the topological charge of the beam, which represents the phase change of the beam along the azimuthal direction; p is the radial index, which represents the phase change of the beam along the radius; $[r\sqrt{2}/w(z)]^{|\ell|}$ is the vortex core function associated with the phase singularity; $L_p^{|\ell|}$ denotes the generalized Laguerre polynomial; $\exp(i\ell\theta)$ is the spiral phase factor, which indicates that the beam carries orbital angular momentum; i is the imaginary unit; and θ is the azimuthal angle. The dual-mode LG beam can then be expressed as:

$$ LG_{p,\pm\ell} = LG_{p,+\ell} + LG_{p,-\ell} \tag{2} $$

where $LG_{-\ell}$ represents a single-mode LG beam with a reverse spiral, $LG_{+\ell}$ represents a single-mode LG beam with a forward spiral, and $LG_{\pm\ell}$ represents a dual-mode LG beam. For example, $LG_{-4}$ is an LG beam with ℓ = −4, $LG_{+4}$ is an LG beam with ℓ = 4, and $LG_{\pm 4}$ is a dual-mode LG beam with ℓ = ±4.
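As a rough illustration of the single-mode and dual-mode definitions, the following NumPy sketch evaluates the LG field in the transmitter plane (z = 0, where w(z) = w_0 and the z-dependent phase factors equal 1) and returns its phase map. The function names, the waist radius of 3 mm, and the 128-point grid are assumptions made for this example and are not taken from the paper.

```python
import numpy as np
from math import factorial

def gen_laguerre(p, a, x):
    """Generalized Laguerre polynomial L_p^a(x) via the three-term recurrence."""
    prev = np.ones_like(x)
    if p == 0:
        return prev
    curr = 1.0 + a - x
    for n in range(1, p):
        prev, curr = curr, ((2 * n + 1 + a - x) * curr - (n + a) * prev) / (n + 1)
    return curr

def lg_field(ell, p=0, w0=3e-3, side=0.04, n=128):
    """Single-mode LG field at z = 0 on an n x n grid of width `side` (metres).

    w0 and n are illustrative values (assumptions), not the paper's settings.
    """
    x = (np.arange(n) - (n - 1) / 2) * (side / n)   # grid symmetric about 0
    xx, yy = np.meshgrid(x, x)
    r, theta = np.hypot(xx, yy), np.arctan2(yy, xx)
    a = abs(ell)
    norm = np.sqrt(2 * factorial(p) / (np.pi * factorial(p + a))) / w0
    radial = (np.sqrt(2) * r / w0) ** a * gen_laguerre(p, a, 2 * r**2 / w0**2)
    return norm * radial * np.exp(-r**2 / w0**2) * np.exp(1j * ell * theta)

def dual_mode_field(ell, **kwargs):
    """Dual-mode beam: superposition of the +ell and -ell single modes."""
    return lg_field(ell, **kwargs) + lg_field(-ell, **kwargs)

def phase_map(field):
    """Wrapped phase in (-pi, pi], the quantity used as the CNN input image."""
    return np.angle(field)

single_phase = phase_map(lg_field(4))        # spiral phase, 4 windings
dual_phase = phase_map(dual_mode_field(4))   # petal-like 0/pi pattern
```

In this sketch the dual-mode field at z = 0 is real valued, so its phase map takes only the values 0 and ±π arranged in 2|ℓ| angular sectors, whereas the single-mode phase map winds ℓ times around the singularity.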

Ocean Turbulence Random Phase Screen Model
The influence of ocean turbulence on beam transmission is simulated by passing the beam through a series of equally spaced random phase screens, and the random phase screen model of ocean turbulence is constructed by power spectrum inversion.
The commonly used refractive index fluctuation spectrum of seawater was proposed by Nikishov et al. [28]:

$$ \Phi(\kappa) = 0.388\,C_n^2\,\kappa^{-11/3}\left[1 + 2.35(\kappa\eta)^{2/3}\right]\frac{1}{\omega^2}\left(\omega^2 e^{-A_T\delta} + e^{-A_S\delta} - 2\omega e^{-A_{TS}\delta}\right) \tag{3} $$

where $\kappa = \sqrt{k_x^2 + k_y^2}$ is the spatial frequency; $C_n^2 = 10^{-8}\chi_T\varepsilon^{-1/3}$ is the equivalent temperature structure parameter; ε is the kinetic energy dissipation rate per unit mass of seawater, with a value range of [10^{-10} m^2/s^3, 10^{-1} m^2/s^3]; $\chi_T$ is the mean square seawater temperature dissipation rate, with a value range of [10^{-10} K^2/s, 10^{-4} K^2/s]; ω characterizes the relative contributions of the temperature gradient and the salinity gradient to the turbulence, with a value range of [−5, 0]; and η is the Kolmogorov microscale, with a value range of [6 × 10^{-3} m, 0.01 m]. In deep seawater, η is close to 0.01 m. The other parameters of the equation are set to $A_T = 1.863 \times 10^{-2}$, $A_S = 1.9 \times 10^{-4}$, $A_{TS} = 9.41 \times 10^{-3}$, and $\delta = 8.284(\kappa\eta)^{4/3} + 12.978(\kappa\eta)^2$.
The phase screen is generated by the power spectrum inversion method. First, a Hermitian complex Gaussian random matrix $H(k_x, k_y)$ with zero mean and unit variance is generated in the frequency domain. This matrix is then filtered by the seawater phase spectrum $F_\Phi(k_x, k_y)$, which conforms to the Kolmogorov spectrum of ocean turbulence, and an inverse Fourier transform yields the random phase screen of ocean turbulence φ(x, y), which can be expressed as:

$$ \varphi(x,y) = \mathcal{F}^{-1}\left[H(k_x,k_y)\sqrt{F_\Phi(k_x,k_y)}\right] \tag{4} $$

In practice, an N × N matrix of Gaussian random numbers with a mean of 0 and a variance of 1 is generated by randn(), and a Fourier transform is then applied to it to obtain $H(k_x, k_y)$.
The seawater phase spectrum $F_\Phi(k_x, k_y)$ on a slice perpendicular to the propagation direction of the beam can be expressed as:

$$ F_\Phi(k_x,k_y) = 2\pi k^2 \Delta z\,\Phi(k_x,k_y) \tag{5} $$

where Δz is the propagation distance of the beam, k is the wave number, and $\Phi(k_x, k_y)$ is the refractive index fluctuation spectrum of seawater.
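The power spectrum inversion procedure described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the closing terms of δ follow the standard Nikishov form, and the discrete normalization (frequency step Δκ = 2π/L and the factor N) is an assumption chosen so that the screen variance approximates the integral of the phase spectrum.

```python
import numpy as np

# Spectrum constants quoted in the text.
A_T, A_S, A_TS = 1.863e-2, 1.9e-4, 9.41e-3

def nikishov_spectrum(kappa, cn2, omega, eta):
    """Refractive-index spectrum of seawater; the kappa = 0 bin is set to 0."""
    kappa = np.asarray(kappa, dtype=float)
    safe = np.where(kappa > 0, kappa, 1.0)        # avoid 0 ** (-11/3)
    ke = safe * eta
    delta = 8.284 * ke ** (4 / 3) + 12.978 * ke ** 2
    mix = (omega ** 2 * np.exp(-A_T * delta) + np.exp(-A_S * delta)
           - 2 * omega * np.exp(-A_TS * delta)) / omega ** 2
    phi = 0.388 * cn2 * safe ** (-11 / 3) * (1 + 2.35 * ke ** (2 / 3)) * mix
    return np.where(kappa > 0, phi, 0.0)

def ocean_phase_screen(n, side, dz, wavelength, cn2, omega, eta, rng):
    """One n x n random phase screen (radians) for a slab of thickness dz."""
    k = 2 * np.pi / wavelength                    # optical wave number
    dk = 2 * np.pi / side                         # spatial-frequency step (assumed)
    kx = 2 * np.pi * np.fft.fftfreq(n, d=side / n)
    kxx, kyy = np.meshgrid(kx, kx)
    kappa = np.hypot(kxx, kyy)
    f_phi = 2 * np.pi * k ** 2 * dz * nikishov_spectrum(kappa, cn2, omega, eta)
    # FFT of real Gaussian noise is Hermitian with variance n^2 per bin,
    # so dividing by n gives unit variance; overall factor is n * dk.
    h = np.fft.fft2(rng.standard_normal((n, n)))
    return np.real(np.fft.ifft2(h * np.sqrt(f_phi))) * n * dk

rng = np.random.default_rng(0)
screen = ocean_phase_screen(n=64, side=0.04, dz=10.0, wavelength=532e-9,
                            cn2=1e-13, omega=-1.75, eta=0.01, rng=rng)
```

The 64-point grid keeps the example fast; the simulation in the paper uses N = 1024 sampling points per side.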
The random phase screen model of ocean turbulence is shown in Figure 1. The LG beam is generated at the transmitting end and passes through the equally spaced random phase screens, and the receiving end receives the distorted LG beam phase map.
Suppose the plane where the phase screen is located is the XY plane and the beam is transmitted along the Z axis. In the spatial domain, the light field of the initial beam is $U_0(x, y)$, a complex quantity whose modulus indicates the intensity of the light field and whose argument represents the spatial phase of the light field. Assuming that the beam is transmitted in a free-space channel whose transfer function in the spatial frequency domain is $U_{prop}(k_x, k_y)$, the beam propagates only in free space until the first phase screen is reached. The light field when it reaches the first phase screen can be expressed as:

$$ U_1(x,y) = \mathcal{F}^{-1}\left\{\mathcal{F}\left[U_0(x,y)\right] U_{prop}(k_x,k_y)\right\} \tag{6} $$

where $k_x$ and $k_y$ are the frequency components along the X axis and Y axis in the spatial frequency domain, $\mathcal{F}$ represents the Fourier transform, and $\mathcal{F}^{-1}$ represents the inverse Fourier transform. $U_{prop}(k_x, k_y)$ is the free-space transfer function, whose expression is as follows:

$$ U_{prop}(k_x,k_y) = \exp\left[i\Delta z\sqrt{k^2 - k_x^2 - k_y^2}\right] \tag{7} $$

Sensors 2023, 23, 971

After the beam passes through the phase screen, the spatial phase of its light field is modified by the screen, and the light field $U_1(x, y)$ arriving at the screen becomes:

$$ U_1'(x,y) = U_1(x,y)\exp\left[i\varphi(x,y)\right] \tag{8} $$

where φ(x, y) is the distribution function of the random phase screen. The phase distribution of an LG beam with turbulent disturbances is shown in Figure 2. As can be seen from the figure, compared with the turbulence-free case, the phase distribution is destroyed after the beam passes through turbulence, and as the intensity of turbulence increases, the phase distortion becomes more pronounced, which severely limits the effective recognition of OAM modes.
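The alternation of free-space propagation and phase screen multiplication described above can be sketched as a split-step loop. The code below is an illustrative sketch under stated assumptions: it uses the angular spectrum transfer function exp[iΔz·sqrt(k² − k_x² − k_y²)] for each free-space step, and the function names and the Gaussian test beam are hypothetical.

```python
import numpy as np

def free_space_step(u, wavelength, side, dz):
    """Propagate field u over a distance dz using the angular spectrum method."""
    n = u.shape[0]
    k = 2 * np.pi / wavelength
    kx = 2 * np.pi * np.fft.fftfreq(n, d=side / n)
    kxx, kyy = np.meshgrid(kx, kx)
    # Complex sqrt so that evanescent components (if any) decay instead of NaN.
    kz = np.sqrt((k ** 2 - kxx ** 2 - kyy ** 2) + 0j)
    return np.fft.ifft2(np.fft.fft2(u) * np.exp(1j * dz * kz))

def propagate_through_screens(u0, screens, wavelength, side, dz):
    """Free-space step of length dz, then one phase screen, repeated."""
    u = u0
    for phi in screens:
        u = free_space_step(u, wavelength, side, dz)
        u = u * np.exp(1j * phi)                  # phase-only distortion
    return u

# Example: a Gaussian test beam through three (here random, unscaled) screens.
n, side = 64, 0.04
x = (np.arange(n) - (n - 1) / 2) * (side / n)
xx, yy = np.meshgrid(x, x)
u0 = np.exp(-(xx ** 2 + yy ** 2) / (3e-3) ** 2).astype(complex)
rng = np.random.default_rng(0)
screens = [rng.standard_normal((n, n)) for _ in range(3)]
u_out = propagate_through_screens(u0, screens, 532e-9, side, 10.0)
received_phase = np.angle(u_out)                  # distorted phase map
```

Because every step multiplies the field or its spectrum by a unit-modulus factor, the total power of the beam is preserved in this sketch; only the phase and the transverse intensity distribution change.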

Convolutional Neural Networks Recognize OAM
Section 3 mainly introduces the composition of the experimental CNN model, the feature extraction of the phase map, and the experimental analysis of OAM recognition.

Construction of Convolutional Neural Networks
A CNN is a type of artificial neural network similar to a multilayer perceptron, and its architecture includes an input layer, convolutional layers, pooling layers, and fully connected layers. The input layer preprocesses the raw image data by mean subtraction and normalization. The convolutional layer extracts features and enhances the original signal features: its convolution kernels act as feature extractors on the phase map and produce multiple feature maps. The pooling layer further condenses the feature maps output by the convolutional layer, reducing the number of weight parameters required for network training. Pooling operations include max pooling and average pooling, which take the maximum and the average of the sampled points, respectively. The fully connected layer applies linear and nonlinear transformations to the features obtained by the convolutional and pooling layers, and it serves as the classification layer or regression layer. In CNN model construction, activation functions are used to introduce nonlinearity into the model, enabling it to deal with complex problems. Depending on whether the fully connected layer is used for classification or regression, the activation function can be the Softmax or the ReLU function.
The mathematical expression for the ReLU function is [21] as follows:

$$ f(x) = \max(0, x) \tag{9} $$

In Formula (9), when x lies in (−∞, 0), the output of the ReLU function is 0, and when x is greater than 0, the output of the ReLU function is equal to the input value.
The mathematical expression of the Softmax function is as follows:

$$ \sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \tag{10} $$

Formula (10) gives the probability value of the j-th output, where $z_j$ is the j-th input to the Softmax layer and j = 1, 2, ..., K, indicating that there are a total of K categories.
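The two activation functions can be written in a few lines of NumPy. This is a minimal sketch; subtracting the maximum before exponentiating is a common numerical-stability step that leaves the Softmax output unchanged and is not something stated in the paper.

```python
import numpy as np

def relu(x):
    """ReLU: 0 for negative inputs, identity for positive inputs."""
    return np.maximum(0.0, x)

def softmax(z):
    """Softmax over the last axis; outputs are non-negative and sum to 1."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)   # stability shift (assumption)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([2.0, -1.0, 0.5])               # hypothetical class scores
probabilities = softmax(relu(scores))
predicted_class = int(np.argmax(probabilities))
```

For a classifier with K OAM modes, the predicted mode is the index of the largest Softmax probability.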
During training, the loss function is the criterion for assessing how well the model fits, and optimizing the CNN training results means minimizing the value of the loss function. Here, the cross-entropy function is used as the loss function to optimize the classification performance of the CNN model, and it can be expressed as:

$$ Loss = -\sum_{j=1}^{K} y_j \log \hat{y}_j \tag{11} $$

where $y_j$ is the true label of category j (1 for the correct category and 0 otherwise) and $\hat{y}_j$ is the Softmax probability predicted for category j.
In the structural design of a CNN, if the model is too deep, the computational complexity will be large and serious overfitting may occur, and if the model is too shallow, it will not be able to effectively extract the features of the image, resulting in poor recognition accuracy. Therefore, the final network model, shown in Figure 3, consists of four convolutional layers, three max pooling layers, and one fully connected layer. To reduce the computational complexity of the network, the input layer normalizes the size of the input image to 128 × 128. Batch normalization is applied after each convolutional layer, with ReLU as the activation function, to ensure that the values of the feature maps stay within a reasonable range. The output of the last convolutional layer is fed to the fully connected layer as the features of the input image, and the Softmax classifier converts these features into the desired output to obtain the OAM mode information of the image.
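To make the loss and the layer arrangement more concrete, the sketch below computes the cross-entropy for integer class labels and traces feature-map sizes through four convolutional layers and three max pooling layers for a 128 × 128 input. Several details are assumptions made only for illustration: size-preserving ("same") convolutions, 2 × 2 pooling placed after the first three convolutional layers, and 16 and 64 channels for layers 2 and 4 (the text only states 8 kernels for layer 1 and 32 feature maps for layer 3).

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy for predicted probabilities and integer labels."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    picked = probs[np.arange(len(labels)), labels]   # probability of true class
    return float(-np.mean(np.log(np.clip(picked, 1e-12, 1.0))))

def trace_shapes(size=128, channels=(8, 16, 32, 64), pool_after=(0, 1, 2)):
    """List (layer, channels, height, width) under the stated assumptions."""
    shapes = []
    for i, c in enumerate(channels):
        shapes.append((f"conv{i + 1}", c, size, size))   # 'same' conv keeps size
        if i in pool_after:
            size //= 2                                    # 2 x 2 max pooling
            shapes.append((f"pool{i + 1}", c, size, size))
    return shapes

for layer in trace_shapes():
    print(layer)
# With these assumptions the last feature map is 64 x 16 x 16, so the fully
# connected layer would receive 64 * 16 * 16 = 16384 inputs.
```

A perfectly confident correct prediction gives a loss of 0, while a uniform guess over K classes gives log K.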

Phase Map Feature Extraction
Feature extraction in the convolutional neural network is mainly carried out by the convolutional layers, which can extract different features from the image. As the number of layers increases, low-level features are progressively fused into high-level features. For example, the edge features extracted at the beginning can be fused to form high-level shape features, and a sufficiently deep network can gather enough feature information to make a judgment and output reliable results. Taking the first and third convolutional layers as examples, the first convolutional layer (Figure 4) has eight convolution kernels, so its output feature map has eight channels, each of which can be regarded as a grayscale image, and the third convolutional layer (Figure 5) has 32 feature maps.

There are eight feature maps in Figure 4, and each feature map contains low-level feature information extracted by the convolutional layer from the original image. Figures 4 and 5 visualize the feature maps of convolutional layers 1 and 3. In Figure 4, it can be seen that the features extracted by convolutional layer 1 are more concrete and more in line with human vision. In Figure 5, the feature maps of the third convolutional layer are highly abstract, but the singularity region of the OAM beam phase map is retained, which is part of what makes deep neural network classification effective. Part of the feature map of convolutional layer 1 is interpreted below.
In Figure 6, the activation values on the fourth channel are extracted and resized to the dimensions of the original image. Where the brightness transition in the original image is pronounced, there is a high-contrast arc at the corresponding position on the fourth channel. This shows that channel 4 is "looking for" contrast features.
In Figure 7, the activation values on channel 1 are extracted, and the regions corresponding to the black areas of the original image also appear black, which demonstrates that the first channel is "looking for" black features.
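The idea of extracting one channel's activation and resizing it to the input size can be sketched as follows. This is only an analogy: a fixed Sobel-type kernel stands in for a learned contrast-sensitive channel, the nearest-neighbour resize is a simple substitute for whatever interpolation was used for the figures, and all function names are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Cross-correlation without padding, as used in CNN convolutional layers."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def resize_nearest(activation, out_shape):
    """Nearest-neighbour resize of a 2D activation map to out_shape."""
    rows = np.arange(out_shape[0]) * activation.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * activation.shape[1] // out_shape[1]
    return activation[np.ix_(rows, cols)]

def channel_response(image, kernel):
    """ReLU activation of one channel, scaled to [0, 1], at the input size."""
    act = np.maximum(conv2d_valid(image, kernel), 0.0)
    if act.max() > 0:
        act = act / act.max()
    return resize_nearest(act, image.shape)

# Stand-in for a contrast-seeking channel (assumption, not a learned kernel).
edge_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])

image = np.zeros((16, 16))
image[:, 8:] = 1.0                       # dark-to-bright transition at column 8
response = channel_response(image, edge_kernel)
```

In this toy example the response is non-zero only in a narrow band around the brightness transition, which mirrors the observation that the contrast-sensitive channel lights up where the phase map jumps.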

OAM Recognition Simulation Results and Analysis
In this study, dual-mode and single-mode OAM recognition methods based on a CNN were studied, together with how the OAM recognition accuracy changes with ω. The parameters were set as follows: the wavelength was 532 nm, the transmission distance z was 100 m, the spacing of the phase screens was 10 m, the phase screen size L was 0.04 m, the number of sampling points N along each side of the phase screen was 1024, and the input image size was 128 × 128. The ratio of training set to test set in each group was 8:2, and the OAM mode recognition rate was obtained for different ω values at a turbulence intensity of $C_n^2 = 1.0 \times 10^{-13}\ \mathrm{K^2\,m^{-2/3}}$. The experimental results are shown in Figures 8-10.
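The parameter set above implies a few derived quantities, and the 8:2 partition can be reproduced with a shuffled index split. The sketch below collects the stated values in a dictionary; the dictionary keys, the function names, the random seed, and the sample count of 100 are hypothetical choices for illustration.

```python
import numpy as np

# Values stated in the text; the key names are illustrative.
PARAMS = {
    "wavelength_m": 532e-9,
    "distance_m": 100.0,
    "screen_spacing_m": 10.0,
    "screen_size_m": 0.04,
    "grid_points": 1024,
    "image_size": 128,
    "cn2": 1.0e-13,
}

def derived_quantities(params):
    """Number of phase screens along the path and grid spacing of each screen."""
    n_screens = int(round(params["distance_m"] / params["screen_spacing_m"]))
    grid_spacing = params["screen_size_m"] / params["grid_points"]
    return n_screens, grid_spacing

def train_test_split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them train:test = 8:2 by default."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]

n_screens, grid_spacing = derived_quantities(PARAMS)   # 10 screens, ~39 um
train_idx, test_idx = train_test_split_indices(100)    # 80 / 20 samples
```

With a 100 m path and 10 m spacing the beam crosses 10 phase screens, and each screen is sampled at 0.04 m / 1024, roughly 39 µm per pixel.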
Tables 1 and 2 characterize the single-mode OAM recognition rate at z = 100 m in the ocean turbulence channel. Combined with Figure 10 and the data in Table 1, the experimental results show that, in the ocean turbulence channel, as ω increases, the phase distortion of the LG beam becomes more severe, the helical phase feature is damaged, and the OAM recognition rate decreases; in addition, the larger the number of modes, the lower the recognition rate of the OAM modes. The results show that in ocean turbulence channels dominated by temperature fluctuations, ocean turbulence has less influence on CNN-based OAM recognition. Conversely, in ocean turbulence channels dominated by salinity fluctuations, ocean turbulence has a greater influence on CNN-based OAM recognition.

Conclusions
Section 4 summarizes the significance of this work and then considers areas of study related to those in this paper.
This paper examines the recognition of OAM using a CNN model. It reviews the research status of CNNs in OAM recognition, constructs a random phase screen model of ocean turbulence, takes the phase maps of LG beams that have passed through ocean turbulence as a dataset, extracts the spiral wavefront and phase singularity features of the phase maps with the CNN model, and simulates and analyzes the OAM recognition performance under different ω conditions. The results show that, under strong turbulence with $C_n^2 = 1.0 \times 10^{-13}\ \mathrm{K^2\,m^{-2/3}}$, a good recognition effect can still be obtained: the dual-mode OAM (ℓ = ±1~±10) recognition rate can reach 98.67%, even under a disturbance of temperature and salinity ω = −1.0, and the single-mode OAM (ℓ = 1~10) recognition rate can reach 83.33%. Dual-mode OAM has higher recognition accuracy than single-mode OAM and better resistance to turbulence interference. The results can provide a reference for the study of high-capacity OAM optical communication technology. In future studies, the influence of absorption, scattering, attenuation, and other factors can be further considered. It is believed that, with continued exploration and experimentation, the underwater information transmission rate will be greatly improved in the near future.