Numerical Study of Parallel Optoelectronic Reservoir Computing to Enhance Nonlinear Channel Equalization

: Nonlinear impairment is one of the critical limits to enhancing the performance of high-speed communication systems. Traditional digital signal processing (DSP)-based nonlinear channel equalization schemes are inﬂuenced by limited bandwidth, high power consumption, and high processing latency. Optoelectronic reservoir computing (RC) is considered a promising optical signal processing (OSP) technique with merits such as large bandwidth, high power efﬁciency, and low training complexity. In this paper, optoelectronic RC was employed to solve the nonlinear channel equalization problem. A parallel optoelectronic RC scheme with a dual-polarization Mach– Zehnder modulator (DPol-MZM) is proposed and demonstrated numerically. The nonlinear channel equalization performance was greatly enhanced compared with the traditional optoelectronic RC and the Volterra-based nonlinear DSP schemes. In addition, the system efﬁciency was improved with a single DPol-MZM.


Introduction
Along with substantial advances in novel internet and wireless applications, higher speed and larger capacity requirements have been put forward for communication systems. Therefore, nonlinear effects in communication systems have become common and more serious such as nonlinear electro-optical conversion in high-speed visible light communication [1], nonlinear distortion in radio frequency power amplifiers commonly used in the wireless communications [2], and signal distortions caused by various optical devices in optical fiber communications [3].
The nonlinear channel equalization method plays an essential role in recovering data from communication systems by compensating for nonlinear impairment. At present, nonlinear channel equalizations are mainly based on digital signal processing (DSP)-based methods and machine learning-based methods. Compared with mathematical modelbased DSP methods, such as Volterra filtering [4], machine learning-based methods, such as artificial neural networks (ANNs), use the connection-based model to simulate the activity of biology neurons, and it has apparent advantages in processing nonlinear data [5]. Recently, ANN-based machine learning schemes have been demonstrated with enhanced channel equalization performance compared with traditional DSP methods for high-speed communications [6]. Moreover, the feedback ANNs showed strong memory capacity and good optimization solving ability, which can be used to solve the timing-dependent nonlinear channel equalization problem. However, limitations also exist, for example, the connection weights in the network are difficult to train, the computation complexity is large, and the convergence speed is slow. Meanwhile, the disadvantage of fading memory [7] leads to inadequate modeling of nonlinear impairment.

Basic Concepts of Reservoir Computing
As a simplified feedback ANN, RC has a three-layer structure: input layer, output layer, and middle layer as shown in Figure 1. Its core lies in a nonlinear delay feedback loop, which is mentioned as the typical RC in the following content. According to the concept of time-division multiplexing [8], virtual nodes are set on the delay loop, and the delay τ is equally divided into N parts. When the input signal is fed into the nonlinear element, the nonlinear element generates a transient response under the combined action of the current input and the remaining virtual nodes on the feedback loop, so the virtual nodes have a wide variety of states. The output of the RC is given by the weighted linear (W out ) combination of virtual node states. The virtual node state provides a nonlinear mapping of the input to the high-dimensional space. Each virtual node is equivalent to a neuron, so when the size of reservoir N increases, the performance is enhanced at the expense of increased calculation time and complexity. The time duration of each part is θ, which is the virtual node interval. The function of θ is the time interval of collecting RC node state and related signal pre-processing. According to the above analysis, its state equation is as Equation (1), where x(n) is the state function, u(n + 1) is the input signal, and bias is an offset to make RC work in the nonlinear range of excitation function f (.).
Photonics 2021, 8, x FOR PEER REVIEW 3 of increased calculation time and complexity. The time duration of each part is θ, whi the virtual node interval. The function of θ is the time interval of collecting RC node s and related signal pre-processing. According to the above analysis, its state equation Equation (1), where x(n) is the state function, u(n + 1) is the input signal, and bias i offset to make RC work in the nonlinear range of excitation function f (.  Win is the input connection weight matrix, Wres is the int connection weight matrix, Wout is the output connection weight matrix, τ is the delay time, x the state function, and NL is the nonlinear node.

Proposed Scheme of Optoelectronic Reservoir Computing
The internal dynamics of the optoelectronic reservoir are enriched by the propo parallel optoelectronic RC scheme using optical polarization multiplexing methodol supported by a novel double-loop scheme with a dual-polarization Mach-Zehnder m ulator (DPol-MZM). The structure is shown in Figure 2. A continuous-wave laser is used as the input of the DPol-MZM. The optical ca passes through the optical polarization controller (PC) into two orthogonal beams. I Figure 1. Schematic diagram of RC. W in is the input connection weight matrix, W res is the internal connection weight matrix, W out is the output connection weight matrix, τ is the delay time, x(n) is the state function, and NL is the nonlinear node.

Proposed Scheme of Optoelectronic Reservoir Computing
The internal dynamics of the optoelectronic reservoir are enriched by the proposed parallel optoelectronic RC scheme using optical polarization multiplexing methodology, supported by a novel double-loop scheme with a dual-polarization Mach-Zehnder modulator (DPol-MZM). The structure is shown in Figure 2. of increased calculation time and complexity. The time duration of each part is θ, which is the virtual node interval. The function of θ is the time interval of collecting RC node state and related signal pre-processing. According to the above analysis, its state equation is as Equation (1), where x(n) is the state function, u(n + 1) is the input signal, and bias is an offset to make RC work in the nonlinear range of excitation function f (.  Win is the input connection weight matrix, Wres is the internal connection weight matrix, Wout is the output connection weight matrix, τ is the delay time, x(n) is the state function, and NL is the nonlinear node.

Proposed Scheme of Optoelectronic Reservoir Computing
The internal dynamics of the optoelectronic reservoir are enriched by the proposed parallel optoelectronic RC scheme using optical polarization multiplexing methodology, supported by a novel double-loop scheme with a dual-polarization Mach-Zehnder modulator (DPol-MZM). The structure is shown in Figure 2. A continuous-wave laser is used as the input of the DPol-MZM. The optical carrier passes through the optical polarization controller (PC) into two orthogonal beams. It enters into two arms of the DPol-MZM (e.g., Fujitsu FTM7980), and then the intensities of A continuous-wave laser is used as the input of the DPol-MZM. The optical carrier passes through the optical polarization controller (PC) into two orthogonal beams. It enters into two arms of the DPol-MZM (e.g., Fujitsu FTM7980), and then the intensities of these two optical carriers are modulated by the voltage related to the input signal. The state function x(n) is collected and used as the output of RC. Then, it is injected into two feedback loops with different delays and different optical attenuation coefficients. The up and down loops have various delays corresponding to τ 1 and τ 2 in the figure. The virtual node interval θ is equal, so the number of virtual nodes N between the two loops is different. By setting the delays in the two loops differently, the virtual nodes of the two loops are nonsymmetrically coupled, and the internal dynamics of the reservoir are enriched. Setting different light attenuation coefficients also enriches the internal dynamics. Next, the optical signals of these two loops are converted into electrical signals by photodetectors (PDs), and they are combined with the pre-processed input signals to feedback to the two arms of the DPol-MZM, forming a parallel optoelectronic RC scheme with two delay feedback loops. The combiner is actually the coupler in the implementation of the scheme. Here, the polarization is not controlled in the loop, and the optical polarization orthogonality is controlled inside the DPol-MZM.
A delay differential equation model is used to analyze the system. The dynamic equation is shown as follows: where x(n) is the state function, τ H is the time constant of the high-pass filtering effect of the loop, τ L is the time constant of the low-pass filtering effect of the loop, and β is the normalized feedback coefficient. The function of α 1,2 is equivalent to the internal connection weight matrix of the neural network, τ 1,2 is the two-loop delays, γ is the scaling parameter for the input signal, W in is the input connection weight matrix, u(n) is the input signal, ϕ 1,2 is the bias voltage of the two arms of DPol-MZM, and V π is the half-wave voltage of the modulator. The typical values of the fixed parameters in the system are presented in Table 1. Additional parameters are optimized in the numerical simulations to optimize the nonlinear channel equalization performance.

Input Signal Processing
The pre-processing of the input signal is shown in Figure 3, which consists of sampling and holding. Here, the sampling period was set to be equal to τ 1 . A sampling period is divided into N = 50 parts, the duration of each part is θ, and then it is multiplied by a random mask size. To reduce the requirement of the sampling rate of AWG (arbitrary waveform generator) and facilitate the realization of the system, we adopted a slowchanging mask method [17] as a pre-processing method for RC that divides the period into larger parts. Here, the sampling period was divided into N/10 parts so that the duration of each part was 10*θ, and the value of the mask was discrete values of ±1. However, if the large 10*θ were to be selected, the system would reach steady state after each virtual node spacing. Therefore, choosing an appropriate value of θ is very important. The value of θ is discussed later.
waveform generator) and facilitate the realization of the system, we adopted a slowchanging mask method [17] as a pre-processing method for RC that divides the period into larger parts. Here, the sampling period was divided into N/10 parts so that the duration of each part was 10*θ, and the value of the mask was discrete values of ±1. However, if the large 10*θ were to be selected, the system would reach steady state after each virtual node spacing. Therefore, choosing an appropriate value of θ is very important. The value of θ is discussed later.

Numerical Setup and Results
First, we chose the Wiener model to simulate the various nonlinear factors that appeared in the communication system, as shown in Figure 4. The schematic diagram of the model is shown in the figure below; here, d(n) is the input signal, hi are the coefficients of the linear part, and gi are the coefficients of the nonlinear part [18].
Output signal The original d(n) is a four-level pulse amplitude modulation signal (PAM-4) that is a random sequence with a value of {−3, −1, 1, 3} [19]. Here, we set M = 7, i = −2, j = 1, Z = 3, and the specific coefficients are shown in the following formulas. Thus, the input signal first undergoes a linear memory change and is converted to q(n): ( 4) followed by an instantaneous memoryless nonlinearity: where ν(n) is Gaussian white noise, u(n) is the final input signal of the system, and the target of nonlinear channel equalization is to recover u(n) to d(n). Here, the bit error rate

Numerical Setup and Results
First, we chose the Wiener model to simulate the various nonlinear factors that appeared in the communication system, as shown in Figure 4. The schematic diagram of the model is shown in the figure below; here, d(n) is the input signal, h i are the coefficients of the linear part, and g i are the coefficients of the nonlinear part [18]. waveform generator) and facilitate the realization of the system, we adopted a slowchanging mask method [17] as a pre-processing method for RC that divides the period into larger parts. Here, the sampling period was divided into N/10 parts so that the duration of each part was 10*θ, and the value of the mask was discrete values of ±1. However, if the large 10*θ were to be selected, the system would reach steady state after each virtual node spacing. Therefore, choosing an appropriate value of θ is very important. The value of θ is discussed later.

Numerical Setup and Results
First, we chose the Wiener model to simulate the various nonlinear factors that appeared in the communication system, as shown in Figure 4. The schematic diagram of the model is shown in the figure below; here, d(n) is the input signal, hi are the coefficients of the linear part, and gi are the coefficients of the nonlinear part [18].  The original d(n) is a four-level pulse amplitude modulation signal (PAM-4) that is a random sequence with a value of {−3, −1, 1, 3} [19]. Here, we set M = 7, i = −2, j = 1, Z = 3, and the specific coefficients are shown in the following formulas. Thus, the input signal first undergoes a linear memory change and is converted to q(n): ( 2) followed by an instantaneous memoryless nonlinearity: where ν(n) is Gaussian white noise, u(n) is the final input signal of the system, and the target of nonlinear channel equalization is to recover u(n) to d(n). Here, the bit error rate The original d(n) is a four-level pulse amplitude modulation signal (PAM-4) that is a random sequence with a value of {−3, −1, 1, 3} [19]. Here, we set M = 7, i = −2, j = 1, Z = 3, and the specific coefficients are shown in the following formulas. Thus, the input signal first undergoes a linear memory change and is converted to q(n): followed by an instantaneous memoryless nonlinearity: where ν(n) is Gaussian white noise, u(n) is the final input signal of the system, and the target of nonlinear channel equalization is to recover u(n) to d(n). Here, the bit error rate (BER) is used to verify the system's performance, and the performance of the proposed scheme is demonstrated through parameter optimization and comparison with the single-loop optoelectronic RC scheme. The features of the proposed optoelectronic RC scheme and the Wiener model is inherently correlated. First, Equation (4) shows the memory property of the channel, it can be reflected in the delay-loop scheme of the proposed optoelectronic RC. Thus, the optimization of the virtual node time interval (θ) and RC size (N) could influence the memory property of the proposed optoelectronic RC and improve the correlation between the proposed RC scheme and the Wiener model in terms of memory performance. Second, Equation (5) shows the nonlinear and noise property of the channel, it can be reflected in the nonlinear node (DPol-MZM) and output training process in the proposed optoelectronic RC. Thus, the optimization of the parameters of DPol-MZM would change the nonlinear transfer function of it, and this process could influence the nonlinear property of the proposed optoelectronic RC.

Parameters Optimization
In the parameter optimization part, the signal-to-noise (SNR) ratio was set to 16 dB, the training length was 3 × 10 3 PAM-4 signals, and the testing length was 5 × 10 4 PAM-4 signals. Here, we mainly show the effects of the virtual node interval (θ), mask size (γ), feedback coefficient (β), and the ridge regression value (reg) in the RC training process.
As mentioned in Section 2.3, the larger θ contributes to the reduced hardware requirements for implementing the RC scheme. However, if the selected θ is too large, it may exceed the characteristic time scale of the nonlinear node. In that case, the influence of neighboring nodes on the state of the virtual node will be weakened, and the performance of RC will be drastically reduced. As shown in Figure 5, when θ was set between 10 and 80 ps, the BER was relatively low, but when θ was set between 50 and 80 ps, the error was more stable, so θ = 80 ps was superior here.
herently correlated. First, Equation (4) shows the memory property of the channel, i be reflected in the delay-loop scheme of the proposed optoelectronic RC. Thus, the mization of the virtual node time interval (θ) and RC size (N) could influence the me property of the proposed optoelectronic RC and improve the correlation between the posed RC scheme and the Wiener model in terms of memory performance. Second, E tion (5) shows the nonlinear and noise property of the channel, it can be reflected i nonlinear node (DPol-MZM) and output training process in the proposed optoelect RC. Thus, the optimization of the parameters of DPol-MZM would change the nonl transfer function of it, and this process could influence the nonlinear property of the posed optoelectronic RC.

Parameters Optimization
In the parameter optimization part, the signal-to-noise (SNR) ratio was set to 1 the training length was 3 × 10 3 PAM-4 signals, and the testing length was 5 × 10 4 PA signals. Here, we mainly show the effects of the virtual node interval (θ), mask siz feedback coefficient (β), and the ridge regression value (reg) in the RC training proce As mentioned in Section 2.3, the larger θ contributes to the reduced hardwar quirements for implementing the RC scheme. However, if the selected θ is too lar may exceed the characteristic time scale of the nonlinear node. In that case, the influ of neighboring nodes on the state of the virtual node will be weakened, and the pe mance of RC will be drastically reduced. As shown in Figure 5, when θ was set bet 10 and 80 ps, the BER was relatively low, but when θ was set between 50 and 80 p error was more stable, so θ = 80 ps was superior here. As shown in Figure 6, when the mask size γ was set to a value of approximat BER was relatively small. As γ increases from 0, the BER decreases first and then incr when γ is larger than 1. This phenomenon is straightforward. When γ is close to 0 input multiplied by γ is equivalent to almost zero in the system. When γ is too larg As shown in Figure 6, when the mask size γ was set to a value of approximately 1, BER was relatively small. As γ increases from 0, the BER decreases first and then increases when γ is larger than 1. This phenomenon is straightforward. When γ is close to 0, the input multiplied by γ is equivalent to almost zero in the system. When γ is too large, the signal will deviate from the nonlinear region of the transfer function. Both situations will deteriorate the system's transmission performance. Here, we chose γ = 0.70. signal will deviate from the nonlinear region of the transfer function. Both situations will deteriorate the system's transmission performance. Here, we chose γ = 0.70. The trend in the relationship between the normalized feedback coefficient β and BER was similar to that of γ. The trend is shown in Figure 7. It was necessary to make the value of the function as close as possible to the nonlinear region of the transfer function, so β also needed a value region that was not too big or too small. When β < 1, the interna dynamics of the reservoir were stable. When β > 1, it may become divergent. Therefore for β > 1, the performance decreased as β increased. Finally, the optimal value of β was determined to be 0.11 by dividing into smaller intervals as shown in the inset of Figure 7  In the training process of the output connection weight matrix (Wout), the optimization of the regularization parameter (reg) was also involved. The output layer collects x(n) at all times, and the node state of each cycle was a column to form matrix B. This matrix B was multiplied by the output connection weight matrix, Wout, to obtain the output. The The trend in the relationship between the normalized feedback coefficient β and BER was similar to that of γ. The trend is shown in Figure 7. It was necessary to make the value of the function as close as possible to the nonlinear region of the transfer function, so β also needed a value region that was not too big or too small. When β < 1, the internal dynamics of the reservoir were stable. When β > 1, it may become divergent. Therefore, for β > 1, the performance decreased as β increased. Finally, the optimal value of β was determined to be 0.11 by dividing into smaller intervals as shown in the inset of Figure 7. signal will deviate from the nonlinear region of the transfer function. Both situations wil deteriorate the system's transmission performance. Here, we chose γ = 0.70. The trend in the relationship between the normalized feedback coefficient β and BER was similar to that of γ. The trend is shown in Figure 7. It was necessary to make the value of the function as close as possible to the nonlinear region of the transfer function, so β also needed a value region that was not too big or too small. When β < 1, the interna dynamics of the reservoir were stable. When β > 1, it may become divergent. Therefore for β > 1, the performance decreased as β increased. Finally, the optimal value of β wa determined to be 0.11 by dividing into smaller intervals as shown in the inset of Figure 7  In the training process of the output connection weight matrix (Wout), the optimiza tion of the regularization parameter (reg) was also involved. The output layer collects x(n at all times, and the node state of each cycle was a column to form matrix B. This matrix B was multiplied by the output connection weight matrix, Wout, to obtain the output. The In the training process of the output connection weight matrix (W out ), the optimization of the regularization parameter (reg) was also involved. The output layer collects x(n) at all times, and the node state of each cycle was a column to form matrix B. This matrix B was multiplied by the output connection weight matrix, W out , to obtain the output. The output matrix T of the training process was the expected output d(n). T is a known quantity so that W out can be solved as the following formula: Considering that matrix B may be a singular matrix and enhance the model's generalization ability, the ridge regression algorithm was used to satisfy these two conditions. Equation (6) was modified to the following formula [20]: where E is the identity matrix of the same size as B*B T . The value of reg directly affects W out , affecting the performance of the RC.
As shown in Figure 8, when reg was very close to 0, the curve fluctuated sharply, and the error was large. This phenomenon occurred because the regularization parameter was too small, causing the model to transform into an over-fitting state. At the same time, because the training length was small, regularization was needed to improve the network's generalization ability. When the reg value was relatively large, especially ranging from 1E-6, the BER also increased. This phenomenon was because the weight attenuation caused numbers of the values in the W out matrix to be 0 or close to 0, making the model to be in an under-fitting state and reducing the accuracy. It had excellent performance around reg = 1 × 10 −11 . It should be noted that the optimal value of reg is affected by SNR. When the SNR is large, the optimal reg will decrease. output matrix T of the training process was the expected output d(n). T is a known quan tity so that Wout can be solved as the following formula: Considering that matrix B may be a singular matrix and enhance the model's gener alization ability, the ridge regression algorithm was used to satisfy these two conditions Equation (6) was modified to the following formula [20]: where E is the identity matrix of the same size as B*B T . The value of reg directly affects Wout, affecting the performance of the RC. As shown in Figure 8, when reg was very close to 0, the curve fluctuated sharply, and the error was large. This phenomenon occurred because the regularization parameter was too small, causing the model to transform into an over-fitting state. At the same time, be cause the training length was small, regularization was needed to improve the network's generalization ability. When the reg value was relatively large, especially ranging from 1E-6, the BER also increased. This phenomenon was because the weight attenuation caused numbers of the values in the Wout matrix to be 0 or close to 0, making the model to be in an under-fitting state and reducing the accuracy. It had excellent performance around reg = 1 × 10 −11 . It should be noted that the optimal value of reg is affected by SNR When the SNR is large, the optimal reg will decrease.

Results and Comparisons
We compared the proposed system with the typical single-loop RC. The paramete settings of the two systems are shown in Table 2. For a fair comparison, the total numbe of the virtual nodes should be identical between the double-loop and single-loop systems (i.e., N = 150 of a typical RC), and the rest of the parameters set at the best operating poin of the respective system.

Results and Comparisons
We compared the proposed system with the typical single-loop RC. The parameter settings of the two systems are shown in Table 2. For a fair comparison, the total number of the virtual nodes should be identical between the double-loop and single-loop systems (i.e., N = 150 of a typical RC), and the rest of the parameters set at the best operating point of the respective system.
Scale factor of feedback First, we used entropy [17] to compare the internal dynamics of the RC structure, as mentioned above, and the typical single-loop RC. The formula for normalized entropy is as Equation (8), where p i is the probability of the node states being included in the i-th segment. As shown in Figure 9, the greater the entropy indicated the enhanced internal dynamics. As can be seen, the internal dynamics of the double-loop RC were enhanced more than the typical single-loop RC. First, we used entropy [17] to compare the internal dynamics of the RC structure, as mentioned above, and the typical single-loop RC. The formula for normalized entropy is as Equation (8), where pi is the probability of the node states being included in the i-th segment. As shown in Figure 9, the greater the entropy indicated the enhanced internal dynamics. As can be seen, the internal dynamics of the double-loop RC were enhanced more than the typical single-loop RC.  The DPol-MZM-based double-loop optoelectronic RC enriched the internal dynamics of the reservoir and improved performance through the nonlinearity of the parallel loops superimposed on the transfer function. When simulating nonlinear channel equalization tasks, its BER was greatly improved compared with the single-loop scheme below the HD-FEC [21] and KP4-FEC [22], reaching a BER level of 1 × 10 −6 at the SNR of 32 dB. The performance of decision feedback equalization (DFE) [23] and third-order Volterra filtering [24] methods are also shown in Figure 10. The DPol-MZM-based double-loop optoelectronic RC enriched the internal dynamics of the reservoir and improved performance through the nonlinearity of the parallel loops superimposed on the transfer function. When simulating nonlinear channel equalization tasks, its BER was greatly improved compared with the single-loop scheme below the HD-FEC [21] and KP4-FEC [22], reaching a BER level of 1 × 10 −6 at the SNR of 32 dB. The performance of decision feedback equalization (DFE) [23] and third-order Volterra filtering [24] methods are also shown in Figure 10.

Conclusions
In summary, we proposed and numerically studied a parallel optoelectronic RC system. We first proved that the proposed scheme enhanced the internal dynamics more than a typical single-loop RC. In addition, we numerically analyzed several typical influencing factors to optimize the performance of the proposed RC. The simulation results demonstrated that the proposed system has the potential to obtain lower BERs for nonlinear channel equalization. Furthermore, the proposed optoelectronic RC scheme is less complex than other ANN schemes and is easy to implement in hardware, so that in the future, channel equalization in communication systems can be realized at the cost of smaller chip resources or power consumption. Altogether, this makes the DPol-MZM-based parallel RC an appealing tool for dealing with nonlinear distortion problems in communication systems.

Conclusions
In summary, we proposed and numerically studied a parallel optoelectronic RC system. We first proved that the proposed scheme enhanced the internal dynamics more than a typical single-loop RC. In addition, we numerically analyzed several typical influencing factors to optimize the performance of the proposed RC. The simulation results demonstrated that the proposed system has the potential to obtain lower BERs for nonlinear channel equalization. Furthermore, the proposed optoelectronic RC scheme is less complex than other ANN schemes and is easy to implement in hardware, so that in the future, channel equalization in communication systems can be realized at the cost of smaller chip resources or power consumption. Altogether, this makes the DPol-MZM-based parallel RC an appealing tool for dealing with nonlinear distortion problems in communication systems.