Global Receptive Field Designed Complex-Valued Convolutional Neural Network Equalizer for Optical Fiber Communications

Abstract: In this paper, an improved complex-valued convolutional neural network (CvCNN) structure, placed at the receiver side, is proposed for nonlinearity compensation in a coherent optical system. This complex-valued global convolutional kernel-assisted convolutional neural network equalizer (CvGNN) is verified in terms of Q-factor performance and complexity against seven other related nonlinear equalizers on both a 64 QAM experimental platform and a QPSK numerical platform. The global convolution operation of the proposed CvGNN is better matched to the calculation process of the perturbation coefficients, and the global receptive field is also more effective at extracting useful information from perturbation feature maps. The introduction of the CvCNN allows the equalizer to operate directly on the complex-valued perturbation feature maps without processing the real and imaginary parts separately, which is more in line with the waveform-dependent physical characteristics of optical signals. On the experimental platform, compared with the real-valued neural network with small convolutional kernels (RvCNNC), the proposed CvGNNC improves the Q-factor by ∼2.95 dB at the optimal transmission power while reducing the time complexity by ∼44.7%.


Introduction
Driven by various emerging Internet services, the quantity of global data has exploded, and there is an increasing demand to transmit and process massive amounts of data. For efficient use of spectrum resources, information is generally modulated in an M-ary quadrature-amplitude modulation (M-QAM) format, which is extremely susceptible to serious nonlinear impairments [1]. Moreover, long-distance transmission causes serious nonlinearity accumulation [2]. Digital backpropagation (DBP) based on the split-step Fourier transform (SSFT) is an effective method for nonlinearity compensation (NLC), but it requires considerable computational resources [3,4].
Neural network (NN) algorithms have achieved excellent performance in many areas of science and technology. Various structures have been proposed and implemented in optical fiber communication systems, and experiments have demonstrated that NNs can effectively map end-to-end relationships because of their excellent ability to fit the linear and nonlinear transforms between input and output. Ref. [5] proposed an NN equalizer based on perturbation theory that selected triplets as input features and perturbation terms as the NN output. Ref. [6] compared several NN equalizers and showed that symbol sequences can be treated as time series, so nonlinear equalization can be treated as a classification or regression task for time series in a recurrent neural network (RNN). In our previous work [7], we regarded the triplets as feature units (FUs) and constructed a dual-channel feature map. We also proposed a real-valued convolutional network classifier (RvCNNC) to process these feature maps and complete the task of nonlinearity equalization. However, it is well known that digital signals are usually represented in complex form for optical signal processing in optical fiber communication systems. Most existing NNs are real-valued frameworks that ignore the correlation between the real and imaginary components of complex signals [8]. A comprehensive design of a complex-valued neural network (CvNN) incorporating phase information was proposed, in which the network parameters and the backpropagation algorithm were extended into the complex domain [9,10]. Applying a CvNN simplifies waveform signal processing, and the CvNN is a compatible model for wave-related information processing, as it can simultaneously deal with phase and amplitude components [11]. In [12], a CvNN was used for four-level pulse amplitude modulation (PAM-4) coherent detection, achieving a better bit error rate (BER) performance than a real-valued neural network (RvNN) equalizer.
Recently, CNNs have faced challenges from vision transformers (ViTs) in many tasks [13,14]. To break these bottlenecks, scholars have proposed increasing the size of the effective receptive field (ERF) to improve system performance. A global convolutional neural network (GNN) using a large convolutional kernel was proposed in [15] to eliminate the influence of shape bias and to establish a closer connection between the input feature map and the output classification result. The global convolution kernel was introduced in [16], where the authors proved that GNN performance can be greatly improved by expanding the size of the convolutional kernel and the ERF.
Our novel contributions are summarized as follows:
• Complex-valued input feature map: For the input data, we reconstruct a complex-valued single-channel feature map from the received symbols on the basis of perturbation theory.
• Equalizer design: In our previously proposed CNN equalizer [7], all FUs and the FU position-related information are interrelated and essential. Thus, the equalizer in this paper was designed with two aims: 1. to design a convolution kernel with a global receptive field; 2. to apply the global kernel in the complex-valued convolutional neural network (CvCNN). For the output of the classifiers, we set 64-class classification labels for the received symbols, and for the output of the regressors, we set the difference values between the received and transmitted symbols. Based on the different output data types and loss functions, we can build a nonlinear equalizer consisting of a classifier and a regressor.
• Experimental validation: We built a 120 Gb/s polarization division multiplexing (PDM) 64QAM experimental platform with a 375 km transmission distance. We evaluate our algorithm in two respects: the Q-factor performance and the complexity. In coherent optical fiber communication systems, we estimate the time complexity as the number of floating-point multiplications (FLOPs) required to equalize one symbol and ignore other operations with lower impact. The space complexity is expressed as the number of parameters required to implement the NN model.
The rest of our paper is organized as follows. In Section 2, we discuss the basic principle of perturbation theory, the structure of the FUs, and the structure of the CvGNN. In Section 3, the configuration of the coherent optical fiber communication system is presented. In Section 4, the experimental results are given, along with an analysis of the performance. In Section 5, the complexity comparison between different equalizers is discussed. Finally, we conclude the paper in Section 6.

Theoretical Analysis
In this section, we review the feature map construction process by applying perturbation theory. Based on the inherent symmetry of the perturbation terms, we introduce a spatial folding method to reduce the dimensionality of the feature maps. Then, a CNN structure based on the dimensionally reduced feature maps is proposed for nonlinear equalization.

Feature Map Construction
In PDM optical fiber communication systems, the continuous signals can be denoted as $\vec{Q}(z,t) = [Q_x(z,t), Q_y(z,t)]^{T}$, where z and t represent the transmission distance and time, respectively. In the optical fiber transmission link, $\vec{Q}(z,t)$ follows the Manakov equation [17]:

$$\frac{\partial \vec{Q}(z,t)}{\partial z} = -\frac{\alpha}{2}\vec{Q} - i\frac{\beta_2}{2}\frac{\partial^2 \vec{Q}}{\partial t^2} + i\frac{8}{9}\gamma \left|\vec{Q}\right|^2 \vec{Q},$$

where α, β₂, and γ refer to the linear loss, group velocity dispersion, and nonlinear Kerr coefficient of the fiber optic link, respectively. The optical signal at the transmitting side can be expressed as follows:

$$Q_{x/y}(0,t) = \sum_{k} A_{k}^{x/y}\, g(t - k T_s),$$

where $A_k^{x/y}$ denotes the amplitude of the k-th symbol, $T_s$ refers to the symbol duration, and g denotes the waveform of the carrier pulse. Based on first-order perturbation theory and the assumption of large dispersion in the optical fiber, the nonlinear impairments can be treated as perturbation terms. If the received symbol sequence is expressed as $A_k^{r} = A_k + \Delta A_k$, the first-order perturbation term $\Delta A_k$ can be obtained as follows:

$$\Delta A_k = \sum_{m,n} A_{k+m}\, A_{k+n}\, A^{*}_{k+m+n}\, C_{m,n},$$

where m, n, and m + n are symbol indexes relative to the k-th symbol. The perturbation term is the vector dot product of $\overrightarrow{FU}_{m,n}$ and the perturbation coefficient $C_{m,n}$, and $C_{m,n}$ can be calculated from the system link parameters using Equations (4a)-(4c) of [18], where τ is the pulse width and $E_1(x)$ represents the exponential integral function. Figure 1a shows the normalized amplitude of the perturbation coefficient at different m and n, where S is the maximum value of m and n. As shown in Figure 1b,c, we organize many different FUs into a feature map with two channels, and the spatial position relationships between the different FUs are preserved. Moreover, in order to better preserve the inherent connection between the real and imaginary parts, we combine the two channels to form a complex-valued single-channel feature map, as shown in Figure 1d. In addition, we propose using a CvCNN to classify the complex-valued single-channel feature map and complete the nonlinear equalization of the corresponding symbols. Later in the article, we use C to indicate a value that belongs to the complex domain. Figure 2 shows the
structure of the CvCNN. The convolution operation of the CvCNN is the same as that of an RvCNN; the difference is that all parameters lie in the complex-valued domain and the operations satisfy complex arithmetic. The network input of a complex neuron is X = X_R + iX_I, where X_R and X_I represent the real and imaginary components of X; the weight of the convolutional kernel is ω = ω_R + iω_I; and the bias is b = b_R + ib_I. The corresponding complex neuron output is

$$Y = \omega X + b = (\omega_R X_R - \omega_I X_I + b_R) + i\,(\omega_R X_I + \omega_I X_R + b_I).$$
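The complex neuron output above can be realized with two real-valued convolutions, one carrying ω_R and one carrying ω_I, combined by the complex product rule. The PyTorch sketch below is illustrative (layer sizes are placeholders, not the paper's exact CvGNN configuration):

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex-valued convolution built from two real convolutions:
    (w_R + i*w_I) * (X_R + i*X_I) + (b_R + i*b_I)
      = (w_R*X_R - w_I*X_I + b_R) + i*(w_R*X_I + w_I*X_R + b_I)."""

    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, bias=False)  # w_R
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, bias=False)  # w_I
        self.bias = nn.Parameter(torch.zeros(out_ch, dtype=torch.cfloat))  # b_R + i*b_I

    def forward(self, x):                       # x: complex-valued tensor
        real = self.conv_r(x.real) - self.conv_i(x.imag)
        imag = self.conv_r(x.imag) + self.conv_i(x.real)
        return torch.complex(real, imag) + self.bias.view(1, -1, 1, 1)
```

A 9 × 9 instance of this layer maps a 9 × 9 complex feature map to a 1 × 1 complex output in a single step, which is the global-kernel design discussed later.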

Figure 2. Structure of the CvCNN: complex-valued convolutional channels (Channel-1 to Channel-l), complex-valued fully connected layers, complex-valued output, complex-valued loss function, and complex-valued weight update via fully complex-valued backpropagation.
The nonlinear activation function introduces nonlinearity into the affine transformations of the neural network. In this paper, we adopt the complex-valued Split-type A activation function [19], which means we activate the real and imaginary components separately. This can be denoted as

$$Z = f(X_R) + i\, f(X_I),$$

where Z is the final output and f is a real-valued activation function. The most commonly used nonlinear activation functions in RvNNs are Relu, Leaky Relu, Elu, and Tanh. Thus, in this paper, we adopt C Relu, C Leaky Relu, C Elu, and C Tanh as activation functions to verify the effect of the CvNN nonlinear equalizer. In this paper, classification tasks are discussed, and the CSoftMax function is used to map the probability of each category, where y_j is the output value of the j-th (j = 1, 2, 3, . . ., C) neuron and C is the number of classes in the output layer. When training a CvNN, the error is backpropagated from the output layer to the input layer using fully complex-valued gradient descent. Given the complex-valued output in polar form $z_j = r_j e^{i\theta_j} \in \mathbb{C}$, the complex-valued cross-entropy loss function can be expressed as in [20], where $y_{jk}$ denotes the k-th element of the one-hot encoded label. The process of learning with complex-domain backpropagation is similar to the learning process in the real domain and involves finding the optimal weights ω that minimize the loss. The loss calculated after the forward pass is backpropagated to each neuron in the network, and the weights are adjusted in the backward pass. In the CvNN, ω can be updated by the following equation [21]:

$$\omega_{l+1} = \omega_l - \eta_l \nabla_{\omega^*} E,$$

where $\eta_l$ is the learning rate at the l-th iteration and $\nabla_{\omega^*} E$ defines the direction of the maximum rate of change of E with respect to $\omega^*$.
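As a concrete illustration, the Split-type A activation takes only a few lines of PyTorch; C Leaky Relu is shown here, and the other C variants follow the same pattern:

```python
import torch
import torch.nn.functional as F

def c_leaky_relu(z, negative_slope=0.01):
    """Split-type A activation: apply Leaky ReLU to the real and imaginary
    components separately and recombine them into a complex output."""
    return torch.complex(
        F.leaky_relu(z.real, negative_slope),
        F.leaky_relu(z.imag, negative_slope),
    )
```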

Global Convolutional Kernel in CNN
In the CNN, the convolutional layer is generally used to automatically extract the visual features of the feature map through filtering operations. The size of the convolution kernel defines the extent of the convolution, representing the size of the receptive field, i.e., the region of the input feature map processed simultaneously by the network. Therefore, a global convolutional kernel can enlarge the ERF and capture more feature information simultaneously. Figure 3 shows the ERF ranges of different convolutional kernels, as well as the different structures of CNNs using normal convolutional kernels and global convolutional kernels.
In Figure 3a, by adopting a normal kernel design, we obtain a 1 × 1 output map (OM) after multiple layers of convolution, which can be denoted as

$$OM_{1\times1} = \left(IM_{(m,n)} \circledast \text{kernel-1}\right) \circledast \text{kernel-2},$$

where $IM_{(m,n)}$ represents the input feature map, m and n represent the pixel indexes, kernel-1 is the convolution kernel of convolution layer-1, and kernel-2 is the convolution kernel of convolution layer-2. In Figure 3c, it can be seen that, compared with the normal kernel, only one convolution layer is needed in the GNN, which can be denoted as

$$OM_{1\times1} = IM_{(m,n)} \circledast \text{kernel-global}.$$

The global kernel has a global receptive field (GRF), which can cover the entire map and better capture the correlation between pixels in the feature map as well as the boundary information. Furthermore, the GNN is more similar to the calculation process of the perturbation term, so it is more meaningful to use the GNN for nonlinear compensation. The structure of the CNN is shown in Figure 3b, whose IM is 11 × 11. A double convolution layer with a 6 × 6 convolution kernel and batch normalization (BN) is adopted in the CNN. In this paper, the global convolution kernel in Figure 3d, whose size is equal to the size of the IM, is adopted because the GNN can more efficiently extract useful information without degrading the system's performance. Therefore, under the premise of balancing performance and complexity, the IM and global convolutional kernel sizes in the GNN are set to the optimized value of 9 × 9.
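The collapse from a multi-layer normal-kernel stack to a single global-kernel layer can be checked numerically; the channel counts below are illustrative:

```python
import torch
import torch.nn as nn

# Global-kernel design (Figure 3c/d): kernel size equals the 9x9 input map,
# so a single convolution layer already yields a 1x1 output map.
im = torch.randn(1, 1, 9, 9)
global_conv = nn.Conv2d(1, 160, kernel_size=9)
out_global = global_conv(im)
print(out_global.shape)        # torch.Size([1, 160, 1, 1])

# Normal-kernel design (Figure 3a/b): an 11x11 map needs two 6x6 layers
# (11 -> 6 -> 1) before the receptive field covers the whole input.
im2 = torch.randn(1, 1, 11, 11)
stack = nn.Sequential(nn.Conv2d(1, 8, kernel_size=6),
                      nn.Conv2d(8, 8, kernel_size=6))
out_stack = stack(im2)
print(out_stack.shape)         # torch.Size([1, 8, 1, 1])
```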

Experimental Setup
Figure 4 depicts the experimental setup of the 120 Gb/s PDM 64 QAM coherent optical communication system. At the transmitter, two pseudo-random bit sequences (PRBSs) are generated by MATLAB, and the sequences are combined to construct a strongly random sequence that will not be learned by the NN or other advanced algorithms. Additionally, the data patterns used in the training and testing datasets have a maximum normalized cross-correlation of 0.5% to ensure the independence of the data [22]. Then, the data are mapped to 64QAM symbols and loaded into an arbitrary waveform generator (AWG) with a sampling rate of 25 GSa/s. The I-channel and Q-channel signals are amplified by electric amplifiers (EAs) and then sent to the I/Q modulator with an external cavity laser (ECL). The PDM module consists of a polarization-maintaining optical coupler (PM-OC), an optical delay line (DL), a polarization controller (PC), and a polarizing beam combiner (PBC) to achieve polarization multiplexing. A variable optical attenuator (VOA) is used to adjust the power of the optical signal. A 375 km (5 × 75 km) standard single-mode fiber (SSMF) link with 5 spans is adopted, and at the end of each span, an erbium-doped fiber amplifier (EDFA) is used to compensate for the linear loss. At the receiver, coherent detection is applied, and an ECL with a 100 kHz linewidth is used as the local oscillator (LO). Two PBSs are used to separate the polarizations of the optical signal and the LO. The X-polarization of the optical signal and the LO are mixed by the 90° optical hybrid and detected by a balanced photonic detector (BPD). After that, two electric signals are obtained, namely the X-polarization I component (X-I) and Q component (X-Q). Similarly, the Y-I and Y-Q components are obtained for the Y-polarization direction. Offline digital signal processing (DSP) is applied to improve the signal quality. To better improve the overall quality of the signal, linear equalization is first performed to repair the signal, and then a nonlinear equalization algorithm is adopted to learn and compensate for the nonlinear damage more cleanly. Linear compensation mainly includes a low-pass filter (LPF), I/Q imbalance compensation, chromatic dispersion (CD) compensation, clock recovery, polarization demultiplexing, polarization mode dispersion (PMD) compensation, frequency offset estimation (FOE), and carrier phase recovery (CPR). The CvGNN equalizer is then applied to achieve nonlinearity compensation.
The CvGNN equalizer is built, trained, and evaluated in PyTorch (Python 3.8.1). The personal computer platform has an AMD Ryzen 7 CPU @ 2.90 GHz and 16 GB of random access memory (RAM). In our model, the Kaiming initialization method is applied to initialize the weights [23], and the complex-valued Adam optimizer is employed to optimize the CvGNNC. When the output data type is a label and the loss function is the complex-valued cross-entropy, the equalizers act as classifiers. When the output data type is the difference values between the received and transmitted symbols and the loss function is the complex-valued mean square error, the equalizers act as regressors. The datasets for each LOP contain approximately 2^20 symbols, which we divided into 70% for training and 30% for testing. The maximum number of training epochs is set to 1000, the initial learning rate is set to 0.003, and every 30 epochs the learning rate drops to 90% of its previous value to prevent the learning from falling into an overfitting state.
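The learning-rate schedule described above (initial rate 0.003, multiplied by 0.9 every 30 epochs) corresponds to a standard step decay. A minimal PyTorch sketch, with a placeholder parameter standing in for the CvGNNC weights, is:

```python
import torch

# Placeholder parameter; the schedule values (lr = 0.003, x0.9 every
# 30 epochs) are taken from the text.
param = torch.nn.Parameter(torch.randn(10))
optimizer = torch.optim.Adam([param], lr=0.003)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.9)

for epoch in range(90):
    # ... forward pass, complex-valued loss, backward pass would go here ...
    optimizer.step()
    scheduler.step()

# After 90 epochs the rate has decayed three times: 0.003 * 0.9**3
print(scheduler.get_last_lr()[0])
```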

Results and Analysis
As mentioned above, the activation functions are essential for our NN equalizers. Taking the 1 dBm LOP as an example, we compared the system Q-factor with four different activation functions using the CvGNN equalizer, as shown in Figure 5. The abscissa represents the epochs: an epoch occurs when all the training data are sent to the NN for training once. The ordinate is the Q-factor, which better distinguishes system performance when the BER is low and can be calculated from the BER using the following equation:

$$Q = 20 \log_{10}\left(\sqrt{2}\, \mathrm{erfc}^{-1}(2\,\mathrm{BER})\right).$$

From Figure 5, we can determine that although C Tanh has the fastest convergence speed, it performs poorly, whereas C Leaky Relu performs well.
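The Q-factor conversion can be computed with the Python standard library alone, since √2·erfc⁻¹(2·BER) equals the inverse Gaussian tail −Φ⁻¹(BER):

```python
import math
from statistics import NormalDist

def q_factor_db(ber):
    """Q-factor in dB from BER: Q = 20*log10(sqrt(2)*erfcinv(2*BER)).
    sqrt(2)*erfcinv(2*BER) equals -Phi^{-1}(BER), so the standard
    library's normal quantile function suffices."""
    return 20 * math.log10(-NormalDist().inv_cdf(ber))
```

For example, a BER of 1e-3 corresponds to a Q-factor of about 9.8 dB.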
Therefore, we choose C Leaky Relu as the nonlinear activation function in this paper. Based on the optimal parameters, we compare multiple NNs in two respects: the system Q-factor performance under the same time complexity, and the complexity required when similar Q-factor performance is achieved. The specific structures of the eight equalizers are displayed in Figure 6, and the Q-factor performance is shown in Figure 7. In Figure 6a-f, the NN input is a feature map based on perturbation theory. In Figure 6g,h, the NN input is the symbol sequence, with which the nonlinearity equalization problem appears as a time-sequence problem. Figure 6a,e are the CvGNN classifier (CvGNNC) and regressor (CvGNNR) based on the global kernel design, with 160 channels and 20 hidden layer neurons. Figure 6b,f are the RvGNN classifier (RvGNNC) and regressor (RvGNNR) based on the global kernel design, with 256 channels and 90 hidden layer neurons. Figure 7 presents the performance of the eight nonlinear equalizers based on the different NN structures in Figure 6, expressed as the Q-factor performance, with the LOP in the range of −4 dBm to 5 dBm. In Figure 7a, the performance of CvGNNC is compared with that achieved after chromatic dispersion compensation (CDC), proving that using a nonlinear equalizer at the receiver end can significantly improve the Q-factor performance of the system. As shown in Figure 7b, when the LOP is 1 dBm, the best Q-factor of CvGNNC is 8.98 dB, which is 1.15 dB higher than that of RvGNNC, and CvCNNC also performs better than RvCNNC. This proves that, at the same time complexity, the CvNN system's performance is better than that of the RvNN because the CvNN has greater advantages for the complex-valued operations on the complex-valued perturbation characteristics. Furthermore, by using CvGNNC, we achieve 2.31 dB and 2.95 dB Q-factor improvements compared with CvCNNC and RvCNNC, respectively. It is proven that adopting a
global kernel with a GRF can extract the FUs and their relationship information more efficiently. Therefore, a perturbation theory-aided CvGNNC can achieve a better compensation effect. Figure 7c proves that an NN using the feature map constructed by perturbation theory fits the relationship between input and output more easily than an NN using front- and back-linked symbols as input features, both of which are CvNNs. When the time complexity of CvFNNC-2 is 3.5 times that of CvGNNC, its system performance is consistent with that of CvGNNC. When the complexity of the two NNs is the same, the Q-factor of CvGNNC is 0.60 dB higher than that of CvFNNC-1 at an LOP of 1 dBm. Moreover, the performance of CvFNNC-1 is 0.55 dB higher than that of RvGNNC, which further confirms the superiority of the CvNN in the optical fiber communication system. Figure 7d shows that, for the CNN, the performance of the classifiers is better than that of the regressors, and that the CvCNN is better than the RvCNN for the same pattern-recognition task. The application of NN classifiers in the nonlinear equalization of optical fiber communication should therefore be more extensive.
Additionally, as shown in Figure 8, we provide a 130 GBaud, 1200 km DP-QPSK simulation setup to corroborate the results presented in this paper. The optical fiber channel simulation is based on the split-step Fourier method (SSFM) and is implemented in MATLAB 2020a. Dispersion, the nonlinear effect, and phase noise are added, and the optical signal-to-noise ratio (OSNR) is set to 30 dB. The optical transmission link is 1200 km of SSMF with 20 spans, and each span incorporates an EDFA to fully compensate for the linear impairments. The comprehensive simulation parameters follow the actual optical fiber parameters, as shown in Table 1. The offline DSP is consistent with that of the experimental system. The simulation results are shown in Figure 9, where 1 step-per-span (SPS) DBP and 50 SPS DBP are performed for comparison with the proposed nonlinearity equalizer CvGNNC. It is evident that CvGNNC outperforms 1 SPS DBP, achieving a 0.97 dB improvement in the Q-factor at an LOP of 0 dBm. Furthermore, when the LOP ranges from −2 dBm to 1 dBm, the performance of CvGNNC is comparable to that of 50 SPS DBP, and within the linear range, CvGNNC outperforms 50 SPS DBP. Unlike the DBP algorithm, NNs are employed to fit end-to-end nonlinear models, enabling CvGNNC not only to equalize nonlinear impairments but also to address residual linear impairments during the equalization process. Moreover, NNs avoid computationally intensive processes and do not rely on extensive or precise channel knowledge. Many algorithms have been proposed to simplify DBP-based equalizers, yet their complexity remains higher than that of NN-based equalizers [24-27]. Likewise, at comparable equalization performance, the CvGNNC exhibits significantly lower computational complexity than the multi-step DBP-based equalizer. These results robustly validate the conclusions of this paper.
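For reference, one SPS of DBP inverts the linear (dispersion) operator in the frequency domain and then removes an estimate of the power-dependent nonlinear phase rotation. The NumPy sketch below uses a simplified scalar model with illustrative parameter values and sign conventions; it is not the MATLAB simulator used in the paper:

```python
import numpy as np

def dbp_step(signal, fs, length, beta2, gamma, alpha=4.6e-5):
    """One backpropagation step over `length` metres of fiber: first invert
    the accumulated dispersion in the frequency domain, then remove the
    estimated nonlinear phase rotation.

    fs: sampling rate [Hz]; beta2: GVD [s^2/m]; gamma: Kerr coefficient
    [1/(W*m)]; alpha: attenuation [1/m] (default ~0.2 dB/km)."""
    n = signal.size
    w = 2 * np.pi * np.fft.fftfreq(n, d=1 / fs)      # angular frequencies
    # Inverse linear operator: conjugate of the forward exp(-j*beta2/2*w^2*L)
    out = np.fft.ifft(np.fft.fft(signal) * np.exp(1j * beta2 / 2 * w**2 * length))
    # Effective length accounts for the fiber loss along the span
    l_eff = (1 - np.exp(-alpha * length)) / alpha
    # Inverse nonlinear operator: remove the power-dependent phase
    return out * np.exp(-1j * gamma * np.abs(out)**2 * l_eff)
```

With gamma = 0 the step reduces to exact dispersion compensation, so propagating a waveform through the forward dispersion operator and then through `dbp_step` recovers the original signal.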

Complexity Comparison Discussion
In this section, we compare the proposed CvGNNC with other nonlinear equalizers in terms of time complexity and space complexity. The time complexity determines the training and prediction time of the model and is expressed as the number of FLOPs required to equalize one symbol. The space complexity is closely related to the model capacity and is expressed as the number of parameters the network requires.
The complexity of the CNN model is concentrated mainly in the convolutional layers and the fully connected layers, while the complexity of the fully connected neural network is composed of multiple fully connected layers. Equation (13) defines the time complexity, measured by the number of FLOPs, and the space complexity, measured by the number of parameters; these are defined as $N_{F\_conv}$ and $N_{P\_conv}$ for the convolutional layers and as $N_{F\_fully}$ and $N_{P\_fully}$ for the fully connected layers. In the i-th convolutional layer, $C_{Ii}$ represents the number of input channels, $C_{Oi}$ the number of output channels, $M_i$ the size of the output feature map, and $K_i$ the size of the convolution kernel. Meanwhile, in the j-th fully connected layer, $F_{Ij}$ and $F_{Oj}$ represent the numbers of input and output neurons, while NC and NF represent the numbers of convolutional layers and fully connected layers, respectively. The specific calculations are as follows:

$$N_{F\_conv} = \sum_{i=1}^{NC} M_i^2 K_i^2 C_{Ii} C_{Oi}, \qquad N_{P\_conv} = \sum_{i=1}^{NC} K_i^2 C_{Ii} C_{Oi},$$

$$N_{F\_fully} = \sum_{j=1}^{NF} F_{Ij} F_{Oj}, \qquad N_{P\_fully} = \sum_{j=1}^{NF} F_{Ij} F_{Oj}. \tag{13}$$

In the CvNN, the data, weights, and activation functions are all located in the complex field, and their operations are in the complex domain. One complex-valued FLOP is equivalent to four real-valued FLOPs; therefore, the FLOPs of a CvNN should be multiplied by four relative to the same RvNN structure. In the CvNN, the parameters differ between the real and imaginary components, so when counting the network parameters of a CvNN, they should be multiplied by two relative to the same RvNN structure. In this paper, because the number of FLOPs introduced by additions can be ignored, we consider only the operations contributed by multiplications when calculating the complexity.
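The counting rules above can be turned into a small helper; the layer shapes in the example line are illustrative, not the paper's exact network:

```python
def conv_complexity(layers):
    """FLOPs (real multiplications) and parameters of convolutional layers.
    `layers` is a list of (M, K, C_in, C_out) tuples, with M the output
    feature-map size and K the kernel size:
        N_F_conv = sum(M^2 * K^2 * C_in * C_out)
        N_P_conv = sum(K^2 * C_in * C_out)."""
    flops = sum(m * m * k * k * ci * co for m, k, ci, co in layers)
    params = sum(k * k * ci * co for m, k, ci, co in layers)
    return flops, params

def fc_complexity(layers):
    """FLOPs and parameters of fully connected layers, given (F_in, F_out)."""
    flops = sum(fi * fo for fi, fo in layers)
    return flops, flops  # multiplication count and weight count coincide

def to_complex(flops, params):
    """Scale real-valued counts to a CvNN: one complex multiply costs four
    real multiplies, and separate real/imaginary parts double the weights."""
    return 4 * flops, 2 * params

# Example: a single 9x9 global complex kernel producing a 1x1 output map
f, p = to_complex(*conv_complexity([(1, 9, 1, 160)]))
```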
In Table 2, we list the formulas for the time complexity and space complexity of the multiple CvNN-related NNs at the same performance, as well as their actual values. As the table shows, when we calculate the time and space complexity of CvGNNC, CvGNNR, CvCNNC, and CvFNN, we multiply them by four and by two, respectively, on the basis of the normal calculation. As shown in Table 2, the CvGNNC requires 6.97 × 10^4 FLOPs and 3.35 × 10^4 parameters. Compared with RvGNNC, the time complexity and space complexity decrease by 16.3% and 16.2%, respectively. Moreover, the number of parameters required by CvCNNC is 46.5% lower than that of RvCNNC, while the number of FLOPs required is higher than that of RvCNNC. This indicates that the CvNN may introduce greater time complexity, but the application of the GNN in the CvNN can effectively suppress this phenomenon. In addition, compared with CvCNNC, a 57.7% time complexity reduction and a 22.0% space complexity reduction are obtained by CvGNNC; compared with RvCNNC, the time complexity and space complexity of RvGNNC are reduced by 33.9% and 50.2%, respectively, which proves the superiority of the CvGNN in reducing complexity.

Conclusions
In this paper, we propose an equalization technique using a CvGNN at the receiver of optical fiber communication systems. Based on perturbation theory, we construct a complex-valued single-channel feature map as the input to make it more suitable for complex-valued neural networks. We expand the convolution kernel to the size of the feature map to enlarge the ERF and reduce the model depth, and we then trade off performance against complexity to obtain the best parameters. Based on the optimal parameters, we select the CvGNNC, CvGNNR, RvGNNC, RvGNNR, CvCNNC, RvCNNC, and CvFNNC to compare the equalization performance and the equalization complexity. We find that, at the same time complexity, a global convolutional kernel structure can further improve the performance compared with a normal convolutional kernel structure, and CvNNs are proven to be more suitable for optical fiber communication signal processing than RvNNs. In the same

Figure 1. The construction method for the input features.

Figure 3. Diagram of the different sizes of the ERF and the different structures of the CNN.

Figure 5. Q-factor trace of CvGNNC with different activation functions.

Figure 6. Structural design of different nonlinear equalizers.
Figure 6c,d are the CvCNN classifier (CvCNNC) and RvCNN classifier (RvCNNC) based on the normal kernel. Figure 6g,h are the complex-valued fully connected NN classifiers (CvFNNC). The CvFNNC-1 shown in Figure 6g has a total of 171 symbols at the input side with a hidden layer of 78 neurons, and its time complexity is equal to that of the other equalizers; the CvFNNC-2 shown in Figure 6h has a hidden layer of 260 neurons, and its time complexity is 3.5 times that of the other NNs.

Figure 7. Nonlinear equalization performance of different neural networks with the same time complexity: (a) CvGNN vs. CDC; (b) CNN vs. GNN; (c) CvFNN vs. CvGNN; (d) Classifier vs. Regressor.

Figure 10 intuitively compares the space and time complexity. From the figure, it can be concluded that, for the same performance, the time complexity and space complexity of CvFNNC-2 are much higher than those of our proposed CvGNNC. This also proves that the data obtained by constructing the feature map fit the relationship between the input and output terminals more easily. Additionally, when the performance is the same, the time and space complexity required by the regressors is slightly higher than that of the classifiers. Therefore, in the application of optical fiber communication systems, the classifiers are preferable to the regressors.

Figure 10. The computational complexity of different NNs, including time and space complexity.

Table 2. Complexity calculation process of different equalizers.