Dual-Output Mode Analysis of Multimode Laguerre-Gaussian Beams via Deep Learning

Abstract: The Laguerre-Gaussian (LG) beam demonstrates great potential for optical communication due to the orthogonality between its different eigenstates, and has gained increasing research interest in recent years. Here, we propose a dual-output mode analysis method based on deep learning that can accurately obtain both the mode weights and the phase information of multimode LG beams. We reconstruct the LG beams based on the results predicted by the convolutional neural network. The correlation coefficient values after reconstruction are above 0.9999, and the mean absolute errors (MAE) of the mode weights and phases are about 1.4 × 10^−3 and 2.9 × 10^−3, respectively. The model also maintains relatively accurate predictions for related unknown datasets and noise-disturbed samples. In addition, the computation time of the model for a single test sample is only 0.975 ms on average. These results show that our method generalizes well, is robust, and allows for nearly real-time modal analysis.

Optics 2021, 2, 88

... of OAM beams. Even in propagation environments such as atmospheric turbulence and underwater, CNNs have shown good accuracy [26-29]. However, most of these studies focus on identifying the mode of a single OAM beam or a combination of modes of multiple OAM beams. The phase information, which cannot be read directly from optical intensity profile images, has been less studied.
In this paper, we propose a dual-output convolutional neural network (Y-Net) based modal analysis method for multimode Laguerre-Gaussian (LG) beams [30], a common class of OAM-carrying beams. Our method not only outputs the weight of each mode based on the optical intensity profile of the input beams, but also obtains the phase information simultaneously. Moreover, we evaluated the method by optical field reconstruction and by the prediction errors at different mode numbers and propagation distances, and obtained superior results, which further demonstrate the advantages of the proposed scheme. Our approach offers a path toward accurate, robust and fast real-time modal analysis of OAM beams.

Materials and Methods
In the cylindrical coordinate system, a single-mode LG beam with zero radial index can be represented as Equation (1) [31], where l is the topological charge, r is the radial distance, ϕ is the azimuth, z is the propagation distance, A_{|l|} = [2/(π|l|!)]^{1/2}, w(z) = w(0)[(z² + z_R²)/z_R²]^{1/2} is the beam radius, L^{|l|} is the Laguerre polynomial, z_R = k w_0²/2 is the Rayleigh length, and k is the wave vector. The superimposed optical field of LG beams with different l-quantum numbers, which are orthogonal to each other, can be expressed as

U(r, ϕ, z) = Σ_{n=1}^{N} a_n e^{iθ_n} u_{l_n}(r, ϕ, z),  (2)

where N is the number of modes, u_{l_n}(r, ϕ, z) is the nth LG beam eigenmode, and a_n and θ_n are the amplitude and phase of each eigenmode, respectively. a_n² is the proportion of the nth eigenmode in the superimposed optical field and satisfies Σ_{n=1}^{N} a_n² = 1; we call it the mode weight. The optical intensity profiles of the multimode LG beams are shown in Figure 1.
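To make Equations (1) and (2) concrete, here is a minimal numerical sketch of a zero-radial-index LG mode and an N-mode superposition. The parameter values (1064 nm wavelength, 15 mm waist) follow the paper, but the helper names and the explicit Gouy/curvature phase terms are our own reading of the standard LG expression, not code from the authors.

```python
# Sketch of Equations (1)-(2): a zero-radial-index LG mode and an N-mode
# superposition, evaluated at a single point.
import math, cmath

WAVELENGTH = 1064e-9          # m, as in the paper
W0 = 15e-3                    # beam waist radius, m
K = 2 * math.pi / WAVELENGTH  # wave number
ZR = K * W0**2 / 2            # Rayleigh length z_R = k w_0^2 / 2

def lg_mode(l, r, phi, z):
    """Single-mode LG field u_l(r, phi, z) with radial index p = 0 (Eq. (1))."""
    w = W0 * math.sqrt((z**2 + ZR**2) / ZR**2)          # beam radius w(z)
    amp = math.sqrt(2 / (math.pi * math.factorial(abs(l))))
    radial = (math.sqrt(2) * r / w) ** abs(l) * math.exp(-r**2 / w**2) / w
    gouy = (abs(l) + 1) * math.atan2(z, ZR)             # Gouy phase
    curvature = K * r**2 * z / (2 * (z**2 + ZR**2))     # wavefront curvature
    return amp * radial * cmath.exp(1j * (l * phi + curvature - gouy))

def superpose(weights, phases, ls, r, phi, z):
    """Superimposed field U(r, phi, z) of Eq. (2); weights are the a_n^2."""
    return sum(math.sqrt(a2) * cmath.exp(1j * th) * lg_mode(l, r, phi, z)
               for a2, th, l in zip(weights, phases, ls))
```

At z = 0 the Gouy and curvature phases vanish, so the phase of a single mode reduces to lϕ, which is why the intensity alone loses the relative phase information the network must recover.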
Furthermore, the weights of the different modes can be expressed as [a_1², a_2², ..., a_N²], while the phase is expressed a little differently. Since the phases of the modes are relative, we define the first mode as the fundamental mode with a phase of zero; the remaining modes are expressed in relative phase as [θ_1, θ_2, ..., θ_{N−1}]. Note that the relative-phase vector has one element fewer than the number of modes, and we linearly scale its elements from [0, 2π] to [0, 1]. In addition, we choose the optical field intensity profile of the multimode LG beams as the input, defined as I(x, y) = |U(r, ϕ, z)|².
CNN is a typical deep learning method. In the 2012 ImageNet competition, Krizhevsky et al. proposed AlexNet, a CNN with ReLU as the activation function, which achieved far better performance than the other algorithms and gained wide attention from researchers [32]. A general CNN framework is composed of several layers with different functions connected in a certain order; the output of each layer is used as the input features of the next layer, up to the final output layer of the model. In addition, each layer has a number of adjustable parameters available for training. The core component of a CNN is the convolutional layer, which extracts features of the input image through convolution operations and characterizes the obtained features in higher-dimensional feature spaces.
Other layers also play important roles: the batch normalization (BN) layer normalizes the feature vectors output by the convolutional layer; the max-pooling layer slides a window over the input feature map and outputs the maximum value in each window for each channel; and the fully connected (FC) layer maps the learned features to the sample label space.
In order to obtain the amplitude and phase information of the different modes in the LG beams simultaneously, we design a dual-output convolutional neural network, as shown in Figure 2. The convolutional part of the model consists of 4 blocks; together with 2 fully connected layers and the output layers it forms a Y-shaped structure. Each branch of the dual-output structure consists of blocks 2-4, a fully connected layer, and an output layer. Each block contains 3 convolutional layers, 3 batch normalization layers, and 1 max-pooling layer. The convolution kernels in each block are of size 3 × 3 with stride 1 × 1.
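As a quick structural sanity check (our own illustration; the paper does not state the padding, so we assume "same"-padded 3 × 3 convolutions and 2 × 2 max-pooling), the spatial size of the feature maps through the four blocks would evolve like this:

```python
# How a 128 x 128 input shrinks through the four blocks: 3 x 3 convolutions
# with stride 1 and assumed "same" padding keep the spatial size, and the
# 2 x 2 max-pool at the end of each block halves it.
def feature_map_sizes(input_size=128, n_blocks=4):
    sizes = [input_size]
    for _ in range(n_blocks):
        size = sizes[-1]
        # three conv layers, stride 1, same padding: size unchanged
        size = size // 2  # max-pooling halves each spatial dimension
        sizes.append(size)
    return sizes
```

Under these assumptions the 128 × 128 intensity image is reduced to an 8 × 8 feature map before the fully connected layers.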
The advantages of this design are twofold. On the one hand, with limited hardware computing power, using one model to output both amplitude and phase takes less time than using two models separately. On the other hand, the two branches of the dual-output structure are joined through block 1 in Figure 2 and share its output feature map; we believe this retains the link between amplitude and phase in the optical intensity profile.
The optical field intensity profile is limited to a 128 × 128 image size. The mode proportion of each mode is drawn randomly and uniformly from (0, 1) and normalized, and the relative phase values are drawn randomly and uniformly from [0, 2π] and linearly transformed. Other parameters are set as follows: the wavelength of the LG beam is 1064 nm and the beam waist radius is 15 mm. We generate a total of 100,000 samples (one input image, two label vectors) and divide them into a training set, a validation set, and a test set in the ratio 6:2:2. The model is trained on the training set and validated on the validation set; we can pause during training to adjust the model parameters, and finally we test the model on the test set.
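The 6:2:2 split of the 100,000 samples might be implemented like this (an illustrative sketch; index-based shuffling is our own choice, not described in the paper):

```python
# Sketch of the 6:2:2 train/validation/test split of 100,000 sample indices.
import random

def split_dataset(n_samples=100_000, ratios=(0.6, 0.2, 0.2), seed=0):
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for the sketch
    n_train = int(n_samples * ratios[0])
    n_val = int(n_samples * ratios[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test
```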
The performance of the model is closely related to the setting of the hyper-parameters. We use a mini-batch size of 64 to speed up the computation and use the Adam optimizer with an initial learning rate of 0.01. Moreover, our model uses a decaying learning-rate schedule, in which the learning rate is halved every 4 epochs for the first 20 training epochs and every epoch thereafter. As for the activation functions in the output layer, the Softmax and Sigmoid [33] functions are used for predicting the mode weights and the relative phases, respectively. The loss function of the model is the mean absolute error (MAE), with the mode-weight and relative-phase vectors weighted 1:1, i.e., Loss = Loss_A + Loss_P, so the final loss function can be written as

Loss = (1/N_A) Σ_{n=1}^{N_A} |ŷ_n^A − y_n^A| + (1/N_P) Σ_{n=1}^{N_P} |ŷ_n^P − y_n^P|,  (3)

where N is the number of elements in each vector, y_n is an element of the true label vector, ŷ_n is the corresponding element of the predicted label vector, and the subscripts A and P denote the mode-weight vector and the relative-phase vector, respectively. All training and testing in this work are performed on a GPU server with an RTX 2080 Ti graphics card; the loss function of the model converged after 30 training epochs, with the whole process taking only about 25 min. The model is tested on a test set of 20,000 samples in 19.5 s, an average computation time of 0.975 ms per sample, which demonstrates that our model allows for fast, real-time modal analysis of multimode LG beams. It should be noted that the complexity of the model could be reduced by adjusting its hyper-parameters and structure to speed up computation. Using parallel computing to provide more computational resources is another way to reduce computation time.
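One way to read the combined MAE loss and the learning-rate schedule described above is the following sketch (function names are ours; the interpretation of "halved every 4 epochs up to epoch 20, then every epoch" is our assumption):

```python
# Sketch of the combined MAE loss (Eq. (3)) and the decaying learning-rate
# schedule: halve every 4 epochs for the first 20 epochs, then every epoch.
def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

def total_loss(pred_weights, true_weights, pred_phases, true_phases):
    # 1:1 weighting of the mode-weight and relative-phase branches
    return mae(pred_weights, true_weights) + mae(pred_phases, true_phases)

def learning_rate(epoch, initial=0.01):
    if epoch < 20:
        halvings = epoch // 4          # one halving every 4 epochs
    else:
        halvings = 20 // 4 + (epoch - 20)  # then one halving per epoch
    return initial * 0.5 ** halvings
```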

Results
When the weights of each mode in the multimode LG beams are known, as well as the relative phases, the input optical field intensity profile can be easily reconstructed, and the accuracy of the model prediction can be visualized through the reconstructed image. Here we use the correlation coefficient to characterize the quality of the reconstruction [34], expressed as

C = ⟨ΔI_m(r) ΔI_r(r)⟩ / [⟨ΔI_m(r)²⟩ ⟨ΔI_r(r)²⟩]^{1/2},

where ΔI_j(r) = I_j(r) − ⟨I_j(r)⟩ (j = m, r), and ⟨I_j(r)⟩ is the mean value of the input optical intensity I_m or the reconstructed optical intensity I_r. The value of C shows the similarity between the reconstructed image and the original image and is given in Figure 3. Ideally, when the reconstructed image is identical to the original image, the correlation coefficient C reaches its maximum value of 1. The residual image profile [35] can be expressed as ΔI(x, y) = |I_m − I_r|, the absolute value of the difference between the reconstructed image and the original image at each pixel.
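A minimal implementation of this correlation coefficient and the residual image, operating on flattened pixel lists (our own sketch, not the authors' code):

```python
# Correlation coefficient C between a measured and a reconstructed intensity
# image, and the pixel-wise residual |I_m - I_r|. Images are flattened lists.
import math

def correlation(im, ir):
    mean_m = sum(im) / len(im)
    mean_r = sum(ir) / len(ir)
    dm = [x - mean_m for x in im]          # Delta I_m
    dr = [x - mean_r for x in ir]          # Delta I_r
    num = sum(a * b for a, b in zip(dm, dr))
    den = math.sqrt(sum(a * a for a in dm) * sum(b * b for b in dr))
    return num / den

def residual(im, ir):
    """Residual image |I_m - I_r| at each pixel."""
    return [abs(a - b) for a, b in zip(im, ir)]
```

For identical images the numerator equals the denominator, giving C = 1, the ideal case described above.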
We use the dual-output CNN to predict the weights and relative phases of multimode LG beams with 9 superimposed modes (l = 0, 1, ..., 8), and place the image reconstructed according to Equation (2) alongside the original input image and the residual image for visual comparison; the results are shown in Figure 3. The correlation coefficient between the reconstructed image and the input image is above 0.999, while the intensity of the residual image is almost 0, which indicates that our scheme is feasible. It should be noted in Figure 3b that the residual phase images have several red points at which the phase values converge to 2π, indicating that the predicted phase at these points may be opposite to the true value. Since the phase of an optical wave is periodic, a phase difference converging to 2π can be considered as converging to 0, which is unavoidable when only one optical field profile is involved in the modal analysis [36].
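The 2π-wrapping issue noted above suggests measuring phase error modulo 2π; a small helper (our own, not the paper's metric) makes this explicit:

```python
# Wrapped phase error: since optical phase is periodic, a prediction that
# differs from the truth by ~2*pi is effectively correct.
import math

def wrapped_phase_error(pred, true):
    d = abs(pred - true) % (2 * math.pi)
    return min(d, 2 * math.pi - d)
```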
We investigate the effect of the number of modes of the multimode LG beams on the mode analysis performance of the CNN, evaluated with the MAE function. The weight error and phase error are defined as Δa² = (1/N) Σ_{n=1}^{N} |a_{n,p}² − a_{n,t}²| and Δθ = (1/(N−1)) Σ_{n=1}^{N−1} |θ_{n,p} − θ_{n,t}|, where the subscripts p and t denote the predicted and true values, respectively. As shown in Figure 4, the mode weight error and phase error gradually increase as the number of modes increases, and the phase error is always higher than the weight error; the difference between them also grows with the number of modes. A possible reason is that the optical intensity profile of the multimode LG beams becomes progressively more complex as the number of modes increases, which increases the difficulty of feature extraction and characterization for the CNN and leads to a gradual increase in the weight and phase errors. However, this situation can be improved by training with more samples [18] or higher-resolution samples [35]. We can also use other methods common in deep learning to reduce the prediction error of the model, such as pre-training and hyper-parameter tuning.
Good generalizability is an important dimension when evaluating the performance of a CNN. In practical optical communication scenarios, LG beams with non-adjacent l-quantum numbers are used for multiplexing to avoid crosstalk between adjacent OAM modes during propagation. Our model is trained on samples with adjacent l-quantum numbers; samples with non-adjacent l are not considered. To verify whether the model performs well on unknown samples, we generate two datasets with mode compositions of l = 1, 3, 5, 7, 9 and l = 1, 5, 9, respectively, and test the CNN trained on the dataset with mode composition l = 1, 2, 3, 4, 5, 6, 7, 8, 9 on these two new datasets. The test results are shown in Figure 5: the predicted mode weights basically agree with the actual values, indicating that the CNN generalizes well to related unknown datasets, which can shift part of the burden from the device level to data processing. This demonstrates the application value of the dual-output CNN-based approach for modal analysis.
Our model is trained with images at zero propagation distance, but it is also suitable for modal analysis of samples at non-zero propagation distances. We used the model to test samples with different propagation distances and mode combinations; the results are shown in Figure 6. The weight error increases with propagation distance, but even for the most complex nine-mode multiplexed beams the weight error is only 5.6 × 10^−3 after propagating 120 m, indicating that our model can support modal analysis of multimode LG beams within a certain distance. It is worth mentioning that the prediction accuracy of the CNN can be improved if samples propagated over a certain distance are added to the dataset.

Figure 6. The relation between mode weight error and distance.
The performance of neural networks can also be affected by noise. We test on a dataset containing random noise to investigate the robustness of the model. Each pixel value in the optical intensity profile image is multiplied by a factor f = 1 + N(0, 1)·σ to generate the noisy image dataset, where N(0, 1) is the standard normal distribution and σ is the noise intensity [18]. As shown in Figure 7a, the optical intensity profiles of the 9-mode superimposed LG beams propagated for 100 m become gradually blurred with increasing σ, and we have also selected a local region of the image to show this change in more detail. As shown in Figure 7b, both the prediction error and the slope of the curve increase with the noise intensity, yet even at a noise intensity of 0.12 the weight error is still less than 1.4 × 10^−2. It should be noted that this level of noise intensity is difficult to reach in real situations [18], and the results in Figure 7b confirm that our model has strong noise immunity.
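The multiplicative noise model f = 1 + N(0, 1)·σ can be sketched as follows (flattened pixel list; the helper name is ours):

```python
# Pixel-wise multiplicative noise: each pixel is scaled by 1 + N(0,1)*sigma.
import random

def add_noise(image, sigma, rng=None):
    rng = rng or random.Random()
    return [px * (1.0 + rng.gauss(0.0, 1.0) * sigma) for px in image]
```

With σ = 0 the image is unchanged; larger σ progressively blurs the intensity structure, as in Figure 7a.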

Conclusions
In summary, we propose a dual-output CNN mode analysis method that can quickly and accurately predict the mode weights and phase information of multimode LG beams simultaneously. The trained CNN can process a single input intensity image in less than 1 ms, and maintains relatively accurate predictions even for related unknown datasets and noise-disturbed samples. The performance of the model demonstrates that our method is accurate, robust and fast, and can shift part of the burden from the device level to data processing. In addition, our method may be applicable to the mode analysis of other OAM beams, such as Bessel beams, which indicates that it might be of general value for the practical application of OAM beams in optical communications.