Monitoring of OSNR Using an Improved Binary Particle Swarm Optimization and Deep Neural Network in Coherent Optical Systems

Abstract: A novel technique is proposed for optical signal-to-noise ratio (OSNR) estimation using an improved binary particle swarm optimization (IBPSO) algorithm and a deep neural network (DNN), based on amplitude histograms (AHs) of signals obtained after constant modulus algorithm (CMA) equalization in a coherent optical system. Starting from existing OSNR estimation models based on DNNs and AHs, the IBPSO algorithm selects sparse AHs that retain the valid features of the original data, and these sparse sets replace the original AHs as the input vectors used to train and test a particle swarm optimization (PSO) optimized DNN (PSO-DNN). Numerical simulations have been carried out for OSNRs from 10 dB to 30 dB for 112 Gbps PM-RZ-QPSK and 112 Gbps PM-NRZ-16QAM signals, and the results show that the proposed algorithm achieves high OSNR estimation accuracy, with a maximum estimation error of less than 0.5 dB. In addition, simulation results with different data input into the DNN show that the mean OSNR estimation error is 0.29 dB and 0.39 dB with the original data and 0.29 dB and 0.37 dB with the sparse data for the two signals, respectively. In future dynamic optical networks, reconstructing the original signal and analyzing the data from sparse observations will be of greater practical significance in the face of multiple impairments and serious interference. The proposed technique has the potential to be applied to optical performance monitoring (OPM) and is helpful for better management of optical networks.


Introduction
With the application of high-speed and large-capacity technologies such as reconfigurable optical add-drop multiplexers (ROADMs), dense wavelength division multiplexing (DWDM), polarization multiplexing, and coherent optical communication to optical communication systems, the capacity and rate of optical networks are changing at an ever-increasing pace. In the future, optical networks will continue to develop towards being more flexible, high-speed, and dynamic. However, with the increasing complexity of the network structure and the aggravation of signal impairments, the reliability of optical fiber communication systems and optical networks will be reduced.
It is predicted that the future optical communication system will no longer be a relatively static network that operates entirely according to established norms. The dynamic link structure will change with temperature, component replacement, aging, and optical fiber maintenance, and the optical nodes that can achieve "Plug and Play" will be required to better allocate optical computing model and learn the manifestation of data through multilayer abstraction. This technology has made breakthroughs in many research fields, including data mining [17], visual target detection [18], speech recognition [19], target recognition [20], quality of transmission (QoT) prediction [21], and medical diagnosis [22], and has greatly improved the learning performance of the related systems. Compared with traditional machine learning, DL is a continual deepening of artificial neural network theory. By adopting a universal learning process and learning feature representations layer by layer, it weakens the impact of hand-crafted features on the system. At present, DL technology has also been well used in optical communications to promote the development of intelligent systems [23]. The machine learning methods used for OPM include the support vector machine (SVM) [24], artificial neural network (ANN) [25], generalized regression neural network (GRNN) [26], and convolutional neural network (CNN) [27]. Recently, the use of a deep neural network (DNN) instead of a shallow ANN for optical performance monitoring has been extensively studied. A DNN is a network structure with multiple hidden layers between the input and output layers. The additional layers of a DNN automatically extract input features, so that complex data can be modeled with fewer units than in shallow networks [28].
In this paper, we demonstrate a technique that employs the improved binary particle swarm optimization (IBPSO) algorithm and a particle swarm optimization optimized deep neural network (PSO-DNN) in coherent optical receivers for OSNR monitoring. Particle swarm optimization (PSO) is a swarm intelligence optimization algorithm from the field of computational intelligence. It is a global optimization algorithm that simulates the foraging behavior of biological populations to solve practical optimization problems [29]. PSO is a meta-heuristic, as it makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions [30,31]. Compared with other evolutionary algorithms such as the artificial fish swarm algorithm [32], genetic algorithm [33,34], artificial bee colony algorithm [35,36], and simulated annealing algorithm [37], the most important advantages of PSO are that it has few parameters to adjust and is easy to implement. IBPSO borrows the role of chromosomes in the genetic algorithm (GA) and computes a sparse version of the feature vectors of the input AHs. In many practical problems, sparse signals can provide enough information, for example in head image recognition [38], sound location [39], and target tracking [40]. In the proposed scheme, the equivalent sparse AHs dataset is first obtained by IBPSO as the feature vector input of the DNNs, and then two DNN modules, for 112 Gbps PM-RZ-QPSK and 112 Gbps PM-NRZ-16QAM signals, respectively, are trained through PSO to obtain an accurate estimate of OSNR. The experimental results show that the proposed method can estimate OSNR with higher precision than the existing methods. In addition, this technique does not require additional monitoring equipment or any modifications to the transmitter. As the dimension of the input AHs decreases, the corresponding neural network structure is simplified, and the estimation accuracy remains relatively high.
The rest of this paper is structured as follows. Section 2 introduces the OSNR monitoring based on the conventional DNN and the improved binary particle swarm optimization and deep neural network (IBPSO-DNN) method trained with AHs in detail. Section 3 describes the structure of the OSNR monitoring experimental configuration of PM-RZ-QPSK and PM-NRZ-16QAM signals. Some simulation results and discussion are presented in Section 4. Finally, Section 5 sums up the conclusions.

OSNR Monitoring Based on the Conventional DNN Trained with AHs
Consider a coherent optical transmission system whose coherent receiver and digital signal processing (DSP) architecture, including the proposed OSNR monitoring stage, are shown in Figure 1. The received signal in the digital coherent receiver passes through the standard signal processing algorithms of the DSP unit: normalization, resampling, and in-phase/quadrature (IQ) imbalance compensation, followed by chromatic dispersion (CD) compensation and timing phase recovery, which are independent of the modulation format. Next, CMA-based equalization performs polarization demultiplexing and compensates almost all linear transmission impairments [41]. As can be seen from Figure 1, the data for OSNR monitoring is the output signal after CMA equalization. After this stage, the signal is mainly affected by amplified spontaneous emission (ASE) noise.
Figure 1. Schematic of the coherent optical receiver and digital signal processing (DSP) for optical signal-to-noise ratio (OSNR) monitoring (LO: local oscillator, PBS: polarization beam splitter, and ADC: analog-to-digital converter).
Figures 2 and 3 show the constellation diagrams and the corresponding amplitude histograms of PM-RZ-QPSK and PM-NRZ-16QAM signals after CMA equalization, as well as the constellation diagrams after carrier phase estimation. The AHs are generated with 100 bins from data samples obtained after the CMA equalization module. We choose to monitor OSNR after CMA equalization instead of after carrier phase estimation because the former only needs a few DSP units, which reduces the complexity of OPM devices, while the latter needs to process additional AHs, which increases the computational complexity and time. It can be clearly seen from the two figures that the AHs show unique and distinct signatures for different OSNRs and modulation formats. Therefore, the sensitivity of AHs to OSNR can be used as a feature for OSNR monitoring. Then, pattern recognition based on a conventional DNN is used to estimate OSNR from the relevant features of the AHs.
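As a rough illustration of how an AH feature vector can be formed, the sketch below bins the amplitudes of complex samples into 100 equal-width bins. The function names, the synthetic noisy QPSK generator standing in for CMA-equalized samples, the min-to-max bin range, and the block length of 1000 samples per histogram are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def amplitude_histogram(samples, num_bins=100):
    """Return the bin counts of |sample| over num_bins equal-width bins.

    Assumption: the bins span the minimum to the maximum amplitude of the block.
    """
    amplitudes = [abs(s) for s in samples]
    lo, hi = min(amplitudes), max(amplitudes)
    width = (hi - lo) / num_bins or 1.0  # guard against a constant-amplitude block
    counts = [0] * num_bins
    for a in amplitudes:
        # The largest amplitude would index one past the end, so clamp it.
        idx = min(int((a - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


def noisy_qpsk(num_symbols, noise_std, seed=0):
    """Hypothetical stand-in for equalized samples: QPSK plus Gaussian noise."""
    rng = random.Random(seed)
    points = [complex(x, y) / math.sqrt(2) for x in (-1, 1) for y in (-1, 1)]
    return [
        rng.choice(points) + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        for _ in range(num_symbols)
    ]


# Less noise (higher OSNR) concentrates the amplitudes around one value,
# more noise spreads them out, which changes the shape of the histogram.
ah_low_noise = amplitude_histogram(noisy_qpsk(1000, 0.05))
ah_high_noise = amplitude_histogram(noisy_qpsk(1000, 0.30))
```

Each returned list has one count per bin and can serve as the 100 × 1 input vector described in the text.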
After the feature data for estimating OSNR are obtained, they are input into the DNN for training and testing. The schematic diagram of the fully connected conventional DNN (no optimization algorithm) is shown in Figure 4. It includes an input layer, multiple hidden layers, and an output layer. The mean square error (MSE) formula at the output, the output of a hidden-layer neuron, and the output of an output-layer neuron are shown in Figure 4, where x is the input vector, d is the predicted value of output, w and b are the weights and biases of the DNN, and G(·) and F(·) are the activation functions of the hidden layers (such as Tanh, Sigmoid, and ReLU) and of the output layer (such as Linear, Sigmoid, and Softmax), respectively. The weights and biases of the DNN are initialized randomly, and the whole network is trained by an iterative algorithm. The parameters are adjusted according to the MSE between the current output and the label, and training stops when the error converges or a set number of iterations is reached. In the conventional OSNR estimation algorithm, the input vector is composed of the 100 × 1 bin counts in the AHs dataset and is fed into a separate DNN for each of the two signals. Since the quantity to be estimated is the OSNR, the output layer has one neuron and the output value is a scalar.
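The forward pass and the MSE criterion described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: the layer widths, the choice of tanh for G and a linear F, the initialization range, and all function names are assumptions.

```python
import math
import random


def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one row of W per neuron."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (the range is an arbitrary assumption)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Hidden layers use tanh as G; the single output neuron uses a linear F."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = dense(x, weights, biases, math.tanh)
    weights, biases = output
    return dense(x, weights, biases, lambda z: z)[0]


def mse(predictions, labels):
    """Mean square error between network outputs and labels."""
    return sum((p - d) ** 2 for p, d in zip(predictions, labels)) / len(labels)


rng = random.Random(0)
sizes = [100, 16, 8, 1]  # hypothetical widths: 100 AH bins in, one OSNR value out
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
batch = [[rng.random() for _ in range(100)] for _ in range(4)]  # placeholder AH vectors
labels = [10.0, 16.0, 22.0, 30.0]  # placeholder OSNR labels in dB
predictions = [forward(x, layers) for x in batch]
loss = mse(predictions, labels)
```

An iterative trainer would repeatedly adjust the weights and biases to lower `loss` until it converges or an iteration limit is reached.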

OSNR Monitoring Based on the IBPSO-Based DNN Trained with AHs
With the development of evolutionary computation, intelligent optimization algorithms have gradually been combined with neural networks, and various optimization methods are used to train neural networks. Intelligent optimization algorithms have strong global convergence ability and robustness and do not need feature information about the problem. The combination therefore brings into play the generalization and mapping ability of the neural network and also improves its convergence speed and learning ability. Common intelligent optimization algorithms include GA [42], the ant colony optimization (ACO) algorithm [43], and PSO. Among them, GA cannot converge effectively in a limited time, and ACO needs a long search time and is prone to premature convergence because of its lack of initial information and slow solution speed. As a swarm intelligence optimization algorithm, PSO uses the cooperative mechanism of the swarm to produce the optimal solution iteratively. In addition, the concept of PSO is simple, it is easy to implement, and it has few parameters to adjust.

Principle of Particle Swarm Optimization (PSO)
The basic idea of PSO is that each potential solution of the optimization problem is a particle in the search space. Every particle has a fitness value determined by the objective function and a velocity vector that determines the direction and distance of its movement, and the particles follow the current optimal particle through the solution space. The PSO algorithm is initialized with a group of random particles, and the optimal solution is then found by iteration. In each iteration, a particle updates itself by tracking two extreme values. The first is the best solution found by the particle itself up to the current moment, called the individual best value. The other is the best solution found by the whole population up to the current moment, called the global best value.
Suppose that in a D-dimensional search space there are n particles forming a population $X = (x_1, x_2, \cdots, x_n)$, where the ith particle is expressed as a D-dimensional vector $x_i = (x_{i1}, x_{i2}, \cdots, x_{iD})$, which represents the position of the ith particle in the search space and also a potential solution of the problem. According to the objective function, the fitness value corresponding to each particle position $x_i$ can be calculated. The velocity of the ith particle is $v_i = (v_{i1}, v_{i2}, \cdots, v_{iD})$, the best position it has visited so far is $p_i = (p_{i1}, p_{i2}, \cdots, p_{iD})$, and the best position found by the best particle in the population is $p_g = (p_{g1}, p_{g2}, \cdots, p_{gD})$. During the evolutionary process, the best previous position of a particle is recorded as the personal best pbest, and the best position obtained by the population thus far is called gbest.
In each iteration, the particle updates its velocity and position through the individual and global extrema. The updating formulas are as follows:

$$v_{id}^{t+1} = \omega v_{id}^{t} + c_1\,\mathrm{rand}()\,(p_{id} - x_{id}^{t}) + c_2\,\mathrm{rand}()\,(p_{gd} - x_{id}^{t}) \quad (1)$$

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \quad (2)$$

where $\omega$ is the inertia weight, $i = 1, 2, \cdots, n$, $d = 1, 2, \cdots, D$, and t is the current iteration number. The learning factors $c_1$ and $c_2$ are non-negative constants. These two constants enable the particles to learn from their own experience and from the best individuals in the group, and thus to approach their own historical optimum and the global optimum within the group or neighborhood. rand() is a random function evaluated in the range [0,1]. The first term of Equation (1) expresses the influence of historical inertia on the present, the second term expresses the particle's recognition of and reflection on itself, and the third term expresses the particle's learning from, comparison with, and imitation of the whole population. The velocity is limited to $v_{id} \in [-v_{max}, v_{max}]$, where $v_{max}$ is a non-negative number; that is, after the velocity update formula is applied:

$$v_{id} = \begin{cases} v_{max}, & v_{id} > v_{max} \\ -v_{max}, & v_{id} < -v_{max} \end{cases} \quad (3)$$
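The update rule described above (inertia term, individual term, population term, then velocity clamping) can be sketched as follows. This is a minimal sketch under assumptions: the parameter values, the search range, the function names, and the sphere test function are illustrative choices, not settings from the paper.

```python
import random


def pso_step(positions, velocities, pbest, gbest,
             omega=0.7, c1=1.5, c2=1.5, v_max=1.0, rng=random):
    """Apply one velocity and position update to every particle, in place."""
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            v = (omega * velocities[i][d]                                # inertia
                 + c1 * rng.random() * (pbest[i][d] - positions[i][d])   # individual best
                 + c2 * rng.random() * (gbest[d] - positions[i][d]))     # global best
            v = max(-v_max, min(v_max, v))  # clamp to [-v_max, v_max]
            velocities[i][d] = v
            positions[i][d] += v


def minimize(fitness, dim, n_particles=20, iterations=100, seed=0):
    """Minimize `fitness` with a basic PSO; returns (best position, best value)."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iterations):
        pso_step(positions, velocities, pbest, gbest, rng=rng)
        for i, p in enumerate(positions):
            f = fitness(p)
            if f < pbest_fit[i]:          # new individual best
                pbest[i], pbest_fit[i] = p[:], f
                if f < gbest_fit:         # new global best
                    gbest, gbest_fit = p[:], f
    return gbest, gbest_fit


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(xi * xi for xi in x)


best_position, best_value = minimize(sphere, dim=3)
```

In a PSO-DNN setting the fitness would be a network error such as the MSE, with each particle encoding a candidate set of network parameters; how the paper encodes the particles is not stated here, so that mapping is left out of the sketch.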

Principle of the IBPSO
A large input dataset in the DNN usage scenario affects the convergence rate and computational complexity of the DNN, so dimensionality reduction preprocessing of the data is necessary [44]. Common dimension reduction methods are principal component analysis (PCA) and partial least squares (PLS). The disadvantage of PCA is that its derived dimensions are not the original input variables and may have no intuitive explanation. In addition, the implementation of PLS is based on nonlinear iteration, which usually requires assumptions about or transformations of the original sample data that are difficult to satisfy in practical problems. Therefore, the IBPSO algorithm is introduced to obtain an equivalent sparse set of input vectors that represents the original data, drawing on the principle of sparse sampling and the role of chromosomes in GA. This improves the effectiveness of data processing and dimension reduction, and it is less affected by the testing process.
Photonics 2019, 6, 111
The particles of a traditional binary particle swarm optimization (BPSO) algorithm are composed of binary strings [45]. In each dimension, the particle position is generated randomly from a probability distribution that is a function of the particle velocity. Each binary bit uses Equation (1) to generate its velocity, and the velocity value is converted into a probability, that is, the chance that the bit takes the value 1. To express this, the velocity is mapped to [0,1]. The mapping function generally adopted is the sigmoid function:

$$S(v_{id}) = \frac{1}{1 + \exp(-v_{id})} \quad (4)$$

where $S(v_{id})$ denotes the probability that position $x_{id}$ takes 1, and the particle changes its position by the following formula:

$$x_{id} = \begin{cases} 1, & \mathrm{rand}() \le S(v_{id}) \\ 0, & \text{otherwise} \end{cases} \quad (5)$$

It should be noted that the value of the sigmoid function does not represent the probability that a bit changes, only the probability that the bit takes the value 1. When the velocity $v_{id}$ approaches 0, which means that the position $x_{id}$ is stable, the sigmoid function gives equal probability of 0 or 1 for $x_{id}$. If a particle converges to the global optimal particle, its velocity is 0. At this point, by the properties of the sigmoid function, the bit is equally likely to be 0 or 1, so the chance of a bit change is at its largest, 0.5, and the search becomes random and directionless. Therefore, the traditional BPSO algorithm is a global random search algorithm that runs iteratively and has strong randomness.
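The behaviour described above can be checked numerically with a short sketch of the traditional BPSO bit update, in which the new bit is drawn from the sigmoid of the velocity regardless of the old bit. The function names and the trial count are assumptions for illustration.

```python
import math
import random


def sigmoid(v):
    """Map a velocity to the probability that the bit takes the value 1."""
    return 1.0 / (1.0 + math.exp(-v))


def bpso_update_bit(v, rng=random):
    """Traditional BPSO: the new bit is 1 with probability sigmoid(v).

    The previous value of the bit plays no role in the update.
    """
    return 1 if rng.random() <= sigmoid(v) else 0


rng = random.Random(0)
trials = 10000
# At zero velocity the bit is a fair coin flip, so a converged particle
# still changes its bits at random.
ones_at_zero = sum(bpso_update_bit(0.0, rng) for _ in range(trials)) / trials
# A large positive velocity makes the bit almost always 1.
ones_at_large = sum(bpso_update_bit(6.0, rng) for _ in range(trials)) / trials
```

The fraction `ones_at_zero` stays close to one half, which is the randomness at convergence that the improved algorithm is meant to remove.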
To make the particles tend increasingly toward the optimal particle, so that the algorithm converges to the global optimal particle, we change the probability mapping function (Equation (6)) and, with it, the updating formula of the particle position, which takes one form when $v_{id} < 0$ (Equation (7)) and another when $v_{id} > 0$ (Equation (8)). The purpose of the new mapping function is that when the velocity tends to 0, the value of the probability mapping function is 0. Moreover, with the position update in the form of Equations (7) and (8), the value of a bit is unchanged when the velocity is 0; when the velocity is negative, the bit can only be changed to 0, and when the velocity is positive, the bit can only be changed to 1. In this way, the particle swarm can eventually approach the global optimal particle, and as the velocity approaches 0, the probability that a particle bit changes also approaches 0. This idea is in agreement with the essence of particle swarm optimization.
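A bit update with the properties listed above can be sketched as follows. The mapping used here, |2S(v) − 1| = |tanh(v/2)| with S the sigmoid, is an assumption chosen because it is 0 at zero velocity and grows toward 1 with the magnitude of the velocity; it may differ from the authors' exact formula, and the function names are hypothetical.

```python
import math
import random


def ibpso_probability(v):
    """Assumed mapping |2*sigmoid(v) - 1| = |tanh(v / 2)|: equals 0 when v is 0."""
    return abs(math.tanh(v / 2.0))


def ibpso_update_bit(x, v, rng=random):
    """Return the new bit given the current bit x and its velocity v.

    Negative velocity: the bit may only be reset to 0.
    Positive velocity: the bit may only be set to 1.
    Zero velocity: the bit is left unchanged.
    """
    if v < 0 and rng.random() < ibpso_probability(v):
        return 0
    if v > 0 and rng.random() < ibpso_probability(v):
        return 1
    return x


rng = random.Random(0)
# A converged particle (zero velocity) keeps its bits, unlike traditional BPSO.
unchanged = [ibpso_update_bit(x, 0.0, rng) for x in (0, 1, 1, 0)]
# A strongly negative velocity drives a set bit toward 0.
resets = sum(1 - ibpso_update_bit(1, -6.0, rng) for _ in range(1000))
```

With this rule the search direction is tied to the sign of the velocity, so the swarm stops flipping bits at random once the velocities have decayed.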
With these improvements to BPSO, IBPSO finds the best binary vector in which each bit is associated with a feature. If a bit of the vector is 1, the feature is selected; if the bit is 0, the feature is not selected. The aim is to partition the full feature set by importance and eliminate the irrelevant features. This reduces the computational overhead and the feature dimension of the datasets and improves the estimation accuracy. The flow chart of the IBPSO algorithm and the process of optimizing the DNN with joint IBPSO and PSO are shown in Figures 5 and 6. As can be seen from Figure 6, the "initial network" is the conventional DNN used for comparison, and the fully connected DNN based on IBPSO has a simpler structure due to the reduced dimension of the AHs bin counts, which reduces the hardware complexity of the sampling module.
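Applying such a binary vector to an AH is a simple masking step, sketched below. The helper name and the six-bin example values are made up for illustration; real AHs in this work have up to 100 bins.

```python
def select_features(bin_counts, mask):
    """Keep only the bin counts whose mask bit is 1, preserving their order."""
    if len(bin_counts) != len(mask):
        raise ValueError("mask must have one bit per histogram bin")
    return [count for count, bit in zip(bin_counts, mask) if bit == 1]


ah = [3, 0, 7, 12, 5, 1]   # toy amplitude histogram with six bins
mask = [1, 0, 1, 1, 0, 0]  # toy IBPSO particle: bins 0, 2, and 3 are selected
sparse_ah = select_features(ah, mask)
```

The length of `sparse_ah` equals the number of ones in the mask, which is also the number of input neurons the reduced DNN needs.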

Experimental Setup
Experiments have been performed to demonstrate the validity of the proposed OSNR monitoring technique for 112 Gbps PM-RZ-QPSK and 112 Gbps PM-NRZ-16QAM systems. The experimental configuration is shown in Figure 7. At the transmitter, 28 Gbaud RZ-QPSK and 14 Gbaud NRZ-16QAM optical signals are generated using an external cavity laser (ECL) and an I/Q modulator driven by multilevel electrical signals. The ECL has a central wavelength of 1550 nm and a linewidth of 150 kHz. A polarization beam splitter (PBS) divides the continuous-wave laser output into two orthogonally polarized optical carriers. The data from a pseudo-random binary sequence (PRBS) generator are transformed into the corresponding modulator driving signals by a level generator and modulation driver, and then modulated onto the two orthogonally polarized optical carriers by the IQ modulator. Finally, the two orthogonally polarized optical signals are combined by a polarization beam combiner (PBC) to obtain the polarization-multiplexed optical signal. Then, the two signals are amplified by an erbium-doped fiber amplifier (EDFA) and transmitted through the optical fiber recirculation loop. The loop consists of a span of 100 km standard single mode fiber (SSMF), an EDFA, a variable optical attenuator (VOA) placed before the EDFA, and a 5 nm bandwidth optical bandpass filter (OBPF) for channel power equalization. The gain of the EDFA is 20 dB and its noise figure (NF) is 4 dB. Variable amounts of CD and differential group delay (DGD) are induced by the fiber and a polarization mode dispersion (PMD) emulator, respectively. The OSNR is adjusted between 10 and 30 dB in steps of 2 dB, the CD is varied between 0 and 3400 ps/nm in steps of 200 ps/nm, and the DGD is varied between 0 and 20 ps in steps of 2 ps. At the output end of the loop, the true value of OSNR is measured by an optical spectrum analyzer (OSA).
The output optical signal is filtered by a 0.4 nm bandwidth OBPF and then detected by a coherent receiver. The linewidth of the local oscillator (LO) is 100 kHz, and the frequency offset is 1 GHz. The coherently detected signal is sampled by a real-time oscilloscope with a 50 GSa/s sampling rate and then input to the DSP module for offline processing. A total of 200,000 amplitude samples for the whole experiment are obtained through offline processing. These samples are binned with 100 bins to obtain 200 sets of AHs data, which form the complete AHs dataset. The equivalent sparse AHs dataset is selected from the original dataset by the IBPSO algorithm. The OSNR monitoring system based on DNN and PSO is trained with the original and the sparse datasets, respectively. The training set and test set are randomly selected as 80% and 20% of the AHs data, and 10-fold cross-validation is also applied to validate the accuracy of the model and to ensure that the results are not biased by the random train and test split.
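The data partitioning described above can be sketched with index lists. The text does not say how the 10-fold cross-validation is combined with the 80/20 split, so running the folds inside the training portion is an assumption here, as are the function names and the fixed seed.

```python
import random


def train_test_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and return (train indices, test indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 200 AH vectors, as in the dataset described above.
train_idx, test_idx = train_test_split(200)
folds = list(k_fold(train_idx, k=10))
```

Averaging the error over the ten validation folds gives a figure that depends less on which AHs happened to fall into a single split.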

Results and Discussion
Considering that a larger dataset and more hidden layers increase the complexity and iteration time of the neural network, this paper constructs a four-layer DNN, which is illustrated in Figure 6. The number of input neuron nodes is equal to the bin number, and the number of output neuron nodes is 1. The activation functions of the hidden layers and the output layer are Logsig and linear functions, respectively. We use the grid search method to optimize the number of hidden layer neuron nodes, taking as optimal the values at which the fitness value reaches its minimum. To find the relationship between bin number and OSNR estimation accuracy, we process the total amplitude sample data with AHs bin numbers from 10 to 100 (in steps of 10) to obtain different AHs bin counts. After the two signals are processed by the DSP, each signal is transformed into an $n_i \times 1$ vector $x = (x_1, x_2, \cdots, x_{n_i})^T$ of AHs bin counts, where $n_i = 10, 20, \cdots, 100$ is the AHs bin number. This vector is the input data of the PSO-DNN, whose input layer correspondingly has $n_i$ neurons. For comparison, we input the same training set into the conventional DNN and test the conventional DNN and the PSO-DNN with the same test set. The PSO-DNN with 100 input nodes, trained and tested with the complete AHs sets, has the structures 100-76-47-1 (the input layer has 100 nodes, the two hidden layers have 76 and 47 nodes, and the output layer has 1 node) and 100-79-62-1 for the QPSK and 16QAM signals, respectively. The corresponding structures of the conventional DNN are 100-77-52-1 and 100-62-53-1. The neuron structures for the other bin numbers were also obtained by the grid search method and are not listed one by one.
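The grid search over the two hidden layer widths can be sketched as an exhaustive loop that keeps the pair with the smallest fitness. The search ranges and the stand-in fitness below are assumptions: in practice the fitness would be the validation error of a network trained with the given widths, and the toy minimum at (60, 40) is arbitrary and unrelated to the reported structures.

```python
from itertools import product


def grid_search(fitness, first_layer_sizes, second_layer_sizes):
    """Return (best fitness, h1, h2) over every pair of hidden layer widths."""
    best = None
    for h1, h2 in product(first_layer_sizes, second_layer_sizes):
        score = fitness(h1, h2)
        if best is None or score < best[0]:
            best = (score, h1, h2)
    return best


def toy_fitness(h1, h2):
    """Stand-in for a trained network's validation error (not real data)."""
    return (h1 - 60) ** 2 + (h2 - 40) ** 2


best_score, best_h1, best_h2 = grid_search(toy_fitness, range(40, 101), range(30, 71))
```

The cost grows with the product of the two range sizes, since every candidate pair requires one full training run.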
After the PSO-DNN and the conventional DNN are trained to convergence, the average OSNR estimation errors of the two signals are calculated and plotted against the AHs bin number. The following simulation results are obtained by randomly selecting 80% and 20% of the AHs data as the training set and the test set, and 10-fold cross-validation is used to obtain averaged results that are not biased by the random train and test split. As shown in Figure 8, the average OSNR estimation error of the two signals tends to decrease as the AHs bin number increases. For PM-RZ-QPSK and PM-NRZ-16QAM signals, the average OSNR error of the PSO-DNN is the same as that of the conventional DNN when the bin number is about 53 and 62, respectively; that is, the PSO-DNN can achieve the same OSNR estimation accuracy as the conventional DNN with fewer neurons. It is worth noting that when the bin number is 100, the PSO-DNN achieves a smaller estimation error than the conventional DNN, which indicates that the PSO-DNN can achieve better estimation accuracy than the conventional DNN when the number of neurons is the same. When the number of layers of the neural network is the same, the number of neurons in the input layer is the main factor in the complexity of the network structure and the modeling time. Therefore, compared with the conventional DNN, the PSO-DNN can achieve better estimation accuracy under the same network complexity.
Photonics 2019, 6, x FOR PEER REVIEW 12 of 18 can achieve the same OSNR estimation accuracy as conventional DNN with fewer neurons. It is worth noting that when the bin number is 100, PSO-DNN can achieve a smaller estimation error than conventional DNN, which indicates that PSO-DNN can achieve better estimation accuracy than conventional DNN when the number of neurons is the same. It can be seen that when the number of layers of the neural network structure is the same, the number of neurons in the input layer is the main factor of the complexity of the network structure and the modeling time. Therefore, compared with conventional DNN, PSO-DNN can achieve better estimation accuracy under the same complexity of neural network structure. In order to achieve high estimation accuracy with fewer neuron nodes in the input layer, an IBPSO algorithm is introduced into the system to obtain equivalent sparse input vectors. Through  , , , , , , , , , , , , , , , , , , , , , ,  , , , , , , , , , , , , , , , , , , , , , , , , , , , In order to achieve high estimation accuracy with fewer neuron nodes in the input layer, an IBPSO algorithm is introduced into the system to obtain equivalent sparse input vectors. Through repeated iterations of the IBPSO algorithm, the sparse AHs signal sets QPSK and 16QAM of the two signals are statistically obtained by: To contrast the effect of training and testing the PSO-DNN network structure with the original and the sparse input vector on the OSNR estimation accuracy, we first optimize the hidden layer neuron nodes with the grid search method when the sparse input vector is the input of the two signals, and obtain the optimal neuron network structure as 55-47-37-1 and 65-47-43-1 for PM-RZ-QPSK and PM-NRZ-16QAM signals, respectively. Then, we trained and tested the original and sparse input vector of the two signals, respectively, on the structure of their neural networks. 
To compare the effect of training and testing the PSO-DNN network structure with the original and the sparse input vectors on the OSNR estimation accuracy, we first optimize the number of hidden-layer neurons by grid search, using the sparse input vector of each signal as the input, and obtain optimal network structures of 55-47-37-1 and 65-47-43-1 for the PM-RZ-QPSK and PM-NRZ-16QAM signals, respectively. Then, we train and test with the original and the sparse input vectors of the two signals on the corresponding network structures. The simulation results of OSNR for the PM-RZ-QPSK and PM-NRZ-16QAM signals are shown in Figures 10 and 11.
It is clear from Figures 10 and 11 that the OSNR estimates are quite accurate: the mean estimation errors for the PM-RZ-QPSK and PM-NRZ-16QAM signals with 100 bins are 0.29 dB and 0.39 dB, respectively, and the mean estimation errors with 55 and 65 bins are 0.29 dB and 0.37 dB, respectively. It should be noted that the maximum estimation error of all results does not exceed 0.5 dB. It can be concluded from these results that the sparse and the complete AHs datasets achieve relatively consistent estimation accuracy in the DNN-based estimation system. However, the neural network corresponding to the sparse dataset after dimensionality reduction is simpler, so the cost and hardware requirements are reduced. This demonstrates the effectiveness of the IBPSO method.
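The reduction in network size can be made concrete by counting trainable parameters. The sketch below assumes fully connected layers with one bias per neuron, and assumes that the 100-bin input is fed to a network with the same hidden layers as the sparse case; neither assumption is confirmed by the description above, and the helper name `count_parameters` is hypothetical.

```python
def count_parameters(layers):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))

sparse_qpsk = count_parameters([55, 47, 37, 1])    # structure reported for PM-RZ-QPSK
sparse_16qam = count_parameters([65, 47, 43, 1])   # structure reported for PM-NRZ-16QAM
full_qpsk = count_parameters([100, 47, 37, 1])     # assumed 100-bin counterpart
full_16qam = count_parameters([100, 47, 43, 1])    # assumed 100-bin counterpart

print(sparse_qpsk, full_qpsk)      # 4446 6561
print(sparse_16qam, full_16qam)    # 5210 6855
```

Under these assumptions, the sparse inputs remove roughly a third of the parameters for PM-RZ-QPSK and roughly a quarter for PM-NRZ-16QAM, almost all of them in the first layer.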
To verify the accuracy of the proposed method for OSNR estimation, the same datasets are also input into SVM-based and ANN-based systems, which use the same ratio of training set to test set. The simulation results and the average estimation errors of the two signals in the different OSNR estimation systems are shown in Figure 12 and Table 1. The maximum errors at the different OSNRs of the two signals are also shown in Table 1. The results show that, for both the PM-RZ-QPSK and PM-NRZ-16QAM signals, the IBPSO-DNN-based system achieves higher OSNR estimation accuracy than the systems based on ANN and SVM.
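The mean and maximum estimation errors used in such a comparison can be computed as in the following minimal sketch. The function name `osnr_error_summary` is hypothetical, and the OSNR values are made-up toy numbers, not data from Figure 12 or Table 1.

```python
import numpy as np

def osnr_error_summary(true_db, est_db):
    """Return the mean and maximum absolute error (dB) and the mean error per true OSNR."""
    true_db = np.asarray(true_db, dtype=float)
    est_db = np.asarray(est_db, dtype=float)
    abs_err = np.abs(est_db - true_db)
    per_osnr = {float(v): float(abs_err[true_db == v].mean())
                for v in np.unique(true_db)}
    return float(abs_err.mean()), float(abs_err.max()), per_osnr

# Toy values for illustration only.
true_db = [10, 10, 20, 20, 30, 30]
est_db = [10.2, 9.9, 20.4, 19.7, 30.1, 29.8]
mean_err, max_err, per_osnr = osnr_error_summary(true_db, est_db)
print(round(mean_err, 3), round(max_err, 3), per_osnr)
```

Applying the same function to the outputs of each estimator on a common test set gives directly comparable figures.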

Conclusions
A technique combining an improved binary particle swarm optimization and a deep neural network has been proposed to monitor OSNR for optical performance monitoring in next-generation optical networks. Using the improved binary particle swarm optimization, a sparse amplitude histogram is selected as the input data to train and test the PSO- and DNN-based monitoring system for OSNR estimation in an optical communication network. Simulation results show that the sparse signal set and the original signal set achieve relatively consistent estimation accuracy in the OSNR monitoring system. The sparse amplitude histogram retains the key information of the original data and eliminates the redundant parts. Compared with the ANN-based and SVM-based algorithms, the proposed algorithm achieves better performance. The required neural network structure is simpler and the hardware requirements are lower, which reduces cost and improves estimation efficiency. Neural network optimization based on particle swarm optimization and its extended algorithms is expected to become a powerful tool for complex global search in optical performance monitoring.