A Density Clustering Algorithm for Simultaneous Modulation Format Identification and OSNR Estimation

In this work, we propose a joint method for modulation format identification (MFI) and optical signal-to-noise ratio (OSNR) estimation based on the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The proposed method automatically extracts the cluster number and density information of the samples. The cluster number is used for MFI, and the density information, combined with fourth-order polynomial fitting, is used to estimate OSNR. We verify the feasibility of the method through simulation and proof-of-concept experiments. The results show that MFI achieves 100% accuracy when the OSNR values are higher than the 7% forward error correction (FEC) thresholds for five commonly used modulation formats (MFs): polarization division multiplexing (PDM)-QPSK, PDM-8PSK, PDM-16QAM, PDM-32QAM, and PDM-64QAM. Mean absolute OSNR estimation errors are not higher than 1 dB for the different signals. No additional hardware is required, so the proposed method can be integrated into existing optical performance monitoring systems without extra burden. Furthermore, the proposed method has the potential to be used in bit-error ratio (BER) calculation and in linear or nonlinear impairment monitoring. We believe that our multifunctional and simple method would be favorable to a future elastic optical network (EON).


Introduction
In the past 10 years, considerable attention has been given to optical network technology. Traditional wavelength division multiplexing (WDM) optical networks with fixed wavelengths and a single modulation format are facing challenges. To satisfy the exponential growth of global communication, optical fiber communication systems are envisioned to offer flexibility, large capacity, long transmission distance, and high spectral utilization. O. Gerstel et al. proposed the concept of a spectrum-sliced elastic optical network (EON) with large capacity, high flexibility, strong scalability, and adaptation to various network service forms [1]. In particular, EON based on orthogonal frequency division multiplexing (OFDM) provides an important idea for solving the spectrum flexibility problem of optical networks and has attracted extensive interest.
In EONs, the unit of spectral resources is no longer a fixed wavelength but a smaller frequency gap. According to network operating demands, EONs will be capable of dynamically transmitting signals with different modulation formats (MFs), data rates, and error correction protocols to maximize the bandwidth and energy efficiency of the network [2]. The dynamic variation of transmission parameters in EONs poses an important challenge to the digital signal processing (DSP) algorithms employed in optical receivers. When the modulation format can switch, receivers should be able to identify the signal modulation format automatically, so that the DSP algorithms can be configured accordingly to achieve optimum detection performance. Modulation format identification (MFI) is therefore indispensable for the reconfigurable digital coherent receivers used in EONs.
It is imperative to have appropriate monitoring mechanisms across the network to acquire precise, real-time information about the quality of transmission links and the health of optical signals [3,4]. Benefiting from the rapid advancement of DSP technology, the digital coherent receiver can compensate almost all linear impairments, so that transmission performance is mainly determined by the optical signal-to-noise ratio (OSNR) [5]. OSNR is one of the most noteworthy parameters in coherent optical networks because of its direct relation with BER. OSNR information is also vital for automatic fault detection and diagnosis, as well as for in-service characterization of signal quality [6].
The above techniques have a single function and can only implement either MFI or OSNR estimation. Cost-effective multi-parameter estimation is attractive for next-generation EONs. F. N. Khan et al. propose a method combining OSNR monitoring and MFI based on the amplitude histograms of signals with the use of deep neural networks [29]. The method uses the amplitude information of the received signals and is insensitive to phase noise. However, it cannot distinguish MFs with the same amplitude distribution, such as QPSK and 8PSK. D. Wang et al. propose an intelligent constellation diagram analysis algorithm using a convolutional neural network to implement both MFI and OSNR estimation [30]. Because they rely on neural networks and supervised learning, both methods need a large number of signal samples as training data.
Density-based clustering algorithms assume that the clustering structure can be determined by the tightness of the sample distribution. In general, a density-based clustering algorithm examines the connectivity between samples and expands clusters from connectable samples until the final clustering result is obtained. Several density-based clustering algorithms have been put forward, such as DBSCAN, ordering points to identify the clustering structure (OPTICS), and clustering by fast search and find of density peaks (DP).
Given prior information on the cluster number, the DP method generates a decision map containing the density of each sample point and the minimum distance between that sample point and any other point with higher density. The cluster centers are recognized as the points for which the value of the decision map is anomalously large [31–33]. Because the DP algorithm needs the cluster number in advance, its application to MFI is limited.
The OPTICS method performs clustering by computing an augmented cluster-ordering of the database objects [34–37]. It can handle clusters of different densities. In principle, the OPTICS clustering method can be used to perform MFI. However, it cannot extract the change of cluster density at different OSNR values and therefore cannot be used to estimate OSNR.
DBSCAN is a well-known clustering algorithm that maps the density of samples [38–42]. As an unsupervised-learning method, DBSCAN can directly extract clustering information from the data, which can be used to identify widely used MFs such as PDM-QPSK, PDM-8PSK, PDM-16QAM, PDM-32QAM, and PDM-64QAM without training data. Moreover, by fitting the proportion of core points in the sample set, the DBSCAN-based method can estimate OSNR for the five MFs mentioned above with low mean absolute error, even for test data that were not used in the fitting. Thus, we propose a joint algorithm for MFI and OSNR estimation based on DBSCAN in this work. The experimental results demonstrate successful identification of QPSK and 16QAM signals and low OSNR estimation error.
In Section 2, we introduce how DBSCAN is applied to MFI and OSNR estimation. In Section 3, we verify the correctness of the proposed method by numerical simulation and proof-of-concept experiments. In Section 4, we conclude that the proposed method is feasible in both numerically simulated and practical optical fiber communication systems and has the potential to be used in BER calculation and in linear or nonlinear impairment monitoring. In Abbreviations, we present an acronym list for convenient lookup. In Appendix A, we present the basic notion, pseudocode, and framework of the DBSCAN algorithm.

Proposed Algorithm: DBSCAN-Based MFI and OSNR Estimation
The basic notion, pseudocode, and framework of the DBSCAN algorithm are given in Appendix A. We define R as the proportion of core points in the sample set:
$$R = \frac{|\Omega|}{|D|} \tag{1}$$
where Ω is the set of all core points in D, D is the set of all data points, and |·| denotes the number of elements in a set.
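The two quantities the method relies on, the cluster number N and the core-point proportion R, can be illustrated with a minimal sketch. The brute-force DBSCAN below is an illustrative stand-in written for this example (it is not the pseudocode of Appendix A and runs in O(n²)), and the toy QPSK constellation, its noise levels, and all function names are assumptions. Only ε = 0.09 and MinPts = 25 are taken from the text.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Brute-force DBSCAN. Returns (labels, is_core); label -1 marks noise."""
    n = len(points)
    # Neighborhood of each point: all points within distance eps (itself included).
    neighbors = [
        [j for j, (xj, yj) in enumerate(points)
         if math.hypot(xi - xj, yi - yj) <= eps]
        for (xi, yi) in points
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster from an unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels, is_core


def cluster_count_and_core_ratio(points, eps, min_pts):
    """Return (N, R): number of clusters and proportion of core points."""
    labels, is_core = dbscan(points, eps, min_pts)
    n_clusters = len(set(labels) - {-1})
    return n_clusters, sum(is_core) / len(points)


def toy_qpsk(sigma, per_point=200, seed=0):
    """Assumed toy data: four ideal points plus Gaussian noise of std sigma."""
    rng = random.Random(seed)
    ideal = [(a, b) for a in (-0.7, 0.7) for b in (-0.7, 0.7)]
    return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
            for x, y in ideal for _ in range(per_point)]


# Lower noise (higher OSNR) should give a larger core-point proportion R.
n_low_noise, r_low_noise = cluster_count_and_core_ratio(toy_qpsk(0.05), eps=0.09, min_pts=25)
n_high_noise, r_high_noise = cluster_count_and_core_ratio(toy_qpsk(0.12), eps=0.09, min_pts=25)
print(n_low_noise, round(r_low_noise, 3))
print(n_high_noise, round(r_high_noise, 3))
```

In this toy setting the tighter constellation yields a higher R than the noisier one, which is the trend the OSNR estimation exploits. A production implementation would more likely use an indexed neighbor search than the quadratic loop shown here.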

Model of MFI and OSNR Estimation Based on DBSCAN Algorithm
The DSP configuration of the proposed joint algorithm is shown in Figure 1. Figure 2 shows the clustering results for five simulated MFs using the DBSCAN algorithm (ε = 0.09 and MinPts = 25) when the OSNR values equal their respective 7% FEC thresholds (QPSK: 12 dB, 8PSK: 15 dB, 16QAM: 18 dB, 32QAM: 22 dB, 64QAM: 24 dB). The laser linewidth is 100 kHz. The resulting cluster numbers are 4, 8, 16, 32, and 36 for QPSK, 8PSK, 16QAM, 32QAM, and 64QAM signals, respectively. For QPSK, 8PSK, and 16QAM signals, the DBSCAN algorithm obtains the correct number of clusters. Because of the small Euclidean distance between constellation points and the resulting vulnerability to noise, the cluster numbers of 32QAM and 64QAM are only approximate. To distinguish these two MFs, we use the MF index, defined as γ = N R, where N is the cluster number and R is defined by Equation (1).
As OSNR increases, the effect of ASE noise on the signal decreases, so the constellation points are more tightly concentrated than under lower OSNR conditions. As described in the introduction of the DBSCAN algorithm, the neighborhood parameters (ε and MinPts) set a density threshold on the symbols. The core points are the symbols that reach this density threshold. When the symbols become more concentrated, the number of core points increases, and so does R. The parameter R defined by Equation (1) is therefore positively correlated with OSNR for the five MFs, as shown in Figure 3. In our proposed method, the DBSCAN algorithm calculates the R values corresponding to different OSNR values. To give the fitted polynomial generalization ability across multiple signal patterns without overfitting, we choose a fourth-order polynomial. After fitting the fourth-order polynomial to the R values and the reference OSNR values, we can estimate the OSNR corresponding to a known R from the fitted polynomial. The fitted fourth-order polynomial is expressed as Equation (2).
Figure 4 shows the fitted curve and the R values of five other QPSK signal patterns, obtained by changing the pseudo-random binary sequence (PRBS) random seed in the arbitrary waveform generator (AWG). The results show that the R value does not depend on the randomness of the signal patterns and that the fitted polynomial can accurately estimate the real OSNR. After the MF of the input signal is identified, the DBSCAN-based OSNR estimation algorithm is performed using different neighborhood parameters for each of the five MFs, in consideration of their different Euclidean distances. In this stage, the R value is calculated and the OSNR is estimated by the fitted fourth-order polynomial expressed by Equation (2).
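The calibration step described above, fitting a fourth-order polynomial to (R, reference OSNR) pairs and then evaluating it at a measured R, can be sketched as follows. This is a minimal pure-Python least-squares fit under stated assumptions: the calibration pairs are synthetic, generated from a made-up polynomial chosen only so the fit can be checked, and the function names are illustrative. In practice a library routine such as `numpy.polyfit` would be the usual choice.

```python
def polyfit_least_squares(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest order first."""
    m = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(a[i][k] * coeffs[k] for k in range(i + 1, m))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the polynomial (lowest order first) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Synthetic calibration data (assumption): R values and the OSNR (dB) produced
# by a made-up fourth-order curve. Real data would come from DBSCAN runs at
# known reference OSNR values.
TRUE_COEFFS = [10.0, 5.0, 8.0, -4.0, 12.0]
r_values = [0.30 + 0.05 * k for k in range(14)]
osnr_values = [polyval(TRUE_COEFFS, r) for r in r_values]

fitted = polyfit_least_squares(r_values, osnr_values, degree=4)
estimated_osnr = polyval(fitted, 0.6)  # OSNR estimate for a measured R = 0.6
print(round(estimated_osnr, 3))
```

Because the synthetic data lie exactly on a fourth-order curve, the fit reproduces that curve up to numerical error. With measured R values the residual would reflect how well a fourth-order polynomial describes the actual R-to-OSNR relation.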

Flowchart of MFI and OSNR Estimation Based on DBSCAN Algorithm
In our proposed joint method, the most computation-intensive process is the DBSCAN block, so the total time complexity of the method is on the order of O(n log n) [38]. Because the proposed method does not need to be trained on a large number of signal samples before performing MFI and OSNR estimation, we expect it to take less time than the other two joint algorithms for MFI and OSNR estimation, which use a deep neural network [29] and a convolutional neural network [30].

Simulation Results
As shown in Figure 6, we set up the simulation system using commercial software from Virtual Photonics Inc (Norwood, MA, USA), and the DBSCAN-based joint algorithm is used at the receiver end to identify the MF and estimate OSNR. The wavelength of the transmitter laser is 1553.6 nm with a 100 kHz linewidth. The five widely used MFs are generated by an IQ modulator whose I and Q branches are independently driven by two 30-order PRBS produced by the AWG. By changing the random seed of the PRBS, we obtain five different signal patterns. The symbol rate is set to 28 GBaud, and the optical power launched into the standard single-mode fiber (SSMF) is 0 dBm. The modulated signal light is amplified by an erbium-doped fiber amplifier (EDFA) and transmitted over spans of 100 km SSMF with a CD parameter of 16 ps/(nm·km). The polarization mode dispersion (PMD) parameter is set to 0.1 ps/√km. The transmission fiber length is set to 4000 km, 2000 km, 1000 km, 500 km, and 500 km for PDM-QPSK, PDM-8PSK, PDM-16QAM, PDM-32QAM, and PDM-64QAM signals, respectively. A variable optical attenuator (VOA) and an EDFA are used to vary the OSNR, with their output combined with the transmitted signal light in a 3 dB coupler. The linewidth of the local oscillator laser is 100 kHz. After the analog-to-digital converter (ADC), the signal is processed by the DSP blocks described in Section 2.1.

Results of Proof Experiments
We build an experimental system according to Figure 6. The experimental configuration is as follows. At the transmitter, the wavelength and linewidth of the laser are 1550 nm and 5 kHz. An AWG produces two independent PRBS that drive the I and Q branches of the IQ modulator to generate the modulated QPSK and 16QAM signals. The symbol rate is set to 20 GBaud, and the optical power launched into the SSMF is 0 dBm. The generated optical signal is amplified by an EDFA and then transmitted over a fiber link with a transmission distance of 100 km. The CD and PMD parameters of the SSMF are 16 ps/(nm·km) and 0.1 ps/√km. At the end of the fiber link, an EDFA is used to add ASE noise to the transmitted optical signal. The reference OSNR for QPSK and 16QAM is varied in the ranges of 12~30 dB and 12~24 dB with a fixed step of 2 dB for the back-to-back system and the 100 km fiber transmission system, respectively. We obtain the reference OSNR from the optical spectrum analyzer using out-of-band noise measurement. At the receiver end, the linewidth of the local oscillator laser is 5 kHz. A real-time oscilloscope with a 50 GSa/s sampling rate serves as the ADC to sample the coherently detected signals, followed by offline DSP blocks that include the proposed joint algorithm.
As shown in Figure 9a,c, when the OSNR is not lower than the 7% FEC threshold of the QPSK and 16QAM signals, the MFI decision threshold proposed in Section 2.2 correctly separates the two MFs.
In addition, the mean estimated error for the QPSK signal is 0.4 dB (back-to-back system) and 0.4 dB (100 km fiber transmission system), and for 16QAM it is 0.5 dB (back-to-back system) and 0.5 dB (100 km fiber transmission system). These results indicate that the proposed method is feasible in practical optical fiber communication systems.

Discussion and Conclusions
In this work, we propose a joint algorithm for MFI and OSNR estimation based on the DBSCAN clustering method. The cluster number and the proportion of core points are extracted by the DBSCAN algorithm to achieve MFI and OSNR estimation. The numerical simulation results show that for five widely used signal types (PDM-QPSK, PDM-8PSK, PDM-16QAM, PDM-32QAM, and PDM-64QAM), MFI achieves 100% accuracy when the incoming OSNR values are higher than their respective 7% FEC thresholds. The mean estimated error is 0.3 dB, 0.2 dB, 0.3 dB, 0.5 dB, and 0.6 dB for the five MFs, respectively. The experimental results demonstrate that, when the OSNR is not lower than the 7% FEC threshold of the QPSK and 16QAM signals, the MFI decision threshold proposed in Section 2.2 correctly separates the two MFs. In addition, the mean estimated error for the QPSK signal is 0.4 dB (back-to-back system) and 0.4 dB (100 km fiber transmission system), and for 16QAM it is 0.5 dB (back-to-back system) and 0.5 dB (100 km fiber transmission system). The experimental results indicate that the proposed method is feasible in practical optical fiber communication systems. No additional hardware is required, so the proposed method can be integrated into existing optical performance monitoring systems without extra burden. Furthermore, the proposed method has the potential to be used in BER calculation and in linear or nonlinear impairment monitoring. We believe that our multifunctional and simple method would be favorable to future EONs.

After chromatic dispersion (CD) compensation, timing recovery, normalization, constant modulus algorithm (CMA)-based equalization, and frequency-offset (FO) compensation, which together compensate nearly all linear transmission impairments, the signals are mainly affected by amplified spontaneous emission (ASE) noise [29,43,44]. The proposed DBSCAN-based joint algorithm extracts MF and OSNR information, which is used in subsequent DSP blocks such as symbol decision, decoding, and signal quality characterization.

Figure 1. DSP model of MFI and OSNR estimation based on DBSCAN algorithm.
Constellation diagrams of different MFs have their own unique distribution characteristics; thus, commonly used signal types can be distinguished by counting the number of clusters. DBSCAN does not need to know the number of clusters in advance and is well suited to identifying non-convex sample sets, which makes it more suitable than other clustering algorithms for processing the non-spherical constellation points and irregular noise distribution caused by the laser linewidth.

Figure 3. The value of R varies with different OSNR under five MFs.

Figure 4. The value of R varies with different patterns of QPSK signal.

Figure 5 is the flowchart of the joint MFI and OSNR estimation algorithm based on DBSCAN. First, the DBSCAN algorithm calculates the cluster number N and the proportion of core objects R of the input signal, with the two neighborhood parameters set to the same values for all five MFs. A decision zone on the cluster number is set for each MF, taking into account that noise (phase noise due to laser linewidth and ASE noise from the optical amplifiers in the transmission link) may cause the cluster number to decrease. The 32QAM and 64QAM signals are identified with an extra step that combines the cluster number N and the R value.
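The identification flow described above can be sketched as a small decision function. The decision zones, the high-order cutoff, and the rule that separates 32QAM from 64QAM are all hypothetical placeholders, since the actual thresholds are not stated in this part of the text. The sketch takes the MF index γ as an input computed elsewhere from N and R, and it delegates the 32QAM/64QAM split to a caller-supplied rule.

```python
# Hypothetical decision zones on the cluster number N (inclusive bounds).
# Lower bounds sit below the ideal counts because noise may reduce N.
DECISION_ZONES = {
    "QPSK": (3, 4),
    "8PSK": (6, 8),
    "16QAM": (12, 16),
}
# Hypothetical cutoff: at or above this N, treat the signal as 32QAM or 64QAM.
HIGH_ORDER_MIN_CLUSTERS = 24


def identify_format(n_clusters, gamma, split_high_order):
    """Map the cluster number (and, for high-order formats, gamma) to an MF name.

    split_high_order is a caller-supplied function gamma -> "32QAM" or "64QAM".
    Returns None when N falls outside every decision zone.
    """
    for name, (low, high) in DECISION_ZONES.items():
        if low <= n_clusters <= high:
            return name
    if n_clusters >= HIGH_ORDER_MIN_CLUSTERS:
        return split_high_order(gamma)
    return None


def example_split(gamma, threshold=30.0):
    """Placeholder rule; both the threshold and its direction are assumptions."""
    return "64QAM" if gamma > threshold else "32QAM"


print(identify_format(4, 0.0, example_split))
print(identify_format(36, 35.0, example_split))
```

Keeping the high-order rule as a separate function means the zone lookup stays unchanged if the γ threshold is recalibrated.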

Figure 5. The flowchart of DBSCAN-based MFI and OSNR estimation joint algorithm.

Figure 6. Simulation setup to utilize the DBSCAN-based joint algorithm.

Figure 7a shows how the cluster number N varies with OSNR after applying the DBSCAN-based algorithm to QPSK, 8PSK, and 16QAM signals with the same neighborhood parameters, ε = 0.09 and MinPts = 25. The cluster numbers of the three MFs are represented by symbols of different colors. There are five different signal patterns for every OSNR condition. With the decision thresholds proposed in Section 2.2, the modulation formats of the different signal patterns are identified with 100% accuracy. Figure 7b shows the γ values of 32QAM and 64QAM signals; these two MFs can be separated with the decision thresholds proposed in Section 2.2. The red and green areas in Figure 7a,b represent the OSNR ranges below and above the 7% FEC threshold of the different MFs, respectively.

Figure 7. MFI results using the DBSCAN-based algorithm. (a) cluster number N of QPSK, 8PSK and 16QAM varies with different OSNR; (b) γ value of 32QAM and 64QAM varies with different OSNR.

Figure 8 shows the estimated OSNR and the mean absolute error of five different signal patterns for the five MFs. The signal patterns used for fitting the fourth-order polynomial are not the same as these five patterns. The estimated OSNR values are represented by solid points, while the mean absolute errors of the five signal patterns for every OSNR condition are represented by hollow points. The mean absolute errors for all reference OSNR conditions and all five MFs are less than 1 dB. The mean estimated error is 0.3 dB, 0.2 dB, 0.3 dB, 0.5 dB, and 0.6 dB for QPSK, 8PSK, 16QAM, 32QAM, and 64QAM, respectively. We note that the OSNR estimation error increases at higher reference OSNR. This can be explained by the ASE noise power being too low for the algorithm to extract the change in the density information of the clusters. As a result, the OSNR estimate deviates more [45]. Table 1 shows the neighborhood parameters of the five MFs.
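For reference, the error metric used here, the mean absolute error over several signal patterns at each reference OSNR, can be computed as in the short sketch below. The estimated values are invented purely for illustration and are not results from this work; the function and variable names are likewise assumptions.

```python
def mean_absolute_error(estimates_db, reference_db):
    """Mean of |estimate - reference| over the signal patterns, in dB."""
    return sum(abs(e - reference_db) for e in estimates_db) / len(estimates_db)


# Invented example data: estimated OSNR (dB) for five signal patterns
# at each reference OSNR (dB). These are NOT measured results.
estimates_by_reference = {
    12.0: [12.2, 11.9, 12.4, 11.7, 12.1],
    18.0: [18.3, 17.6, 18.5, 17.8, 18.1],
    24.0: [24.6, 23.5, 24.8, 23.3, 24.2],
}

# One mean absolute error per OSNR condition, then the average over conditions.
mae_per_condition = {
    ref: mean_absolute_error(est, ref)
    for ref, est in estimates_by_reference.items()
}
overall_mae = sum(mae_per_condition.values()) / len(mae_per_condition)

for ref, mae in mae_per_condition.items():
    print(f"reference {ref:.0f} dB: MAE = {mae:.2f} dB")
print(f"mean over conditions: {overall_mae:.2f} dB")
```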

Figure 9. Experimental results of MFI and OSNR estimation for 20 GBaud QPSK and 16QAM optical communication system.(a) MFI results for back to back systems; (b) Mean absolute error of estimated OSNR for back to back QPSK signal; (c) Mean absolute error of estimated OSNR for back to back 16QAM signal; (d) MFI results for 100 km transmission systems; (e) Mean absolute error of estimated OSNR for 100 km transmitted QPSK signal; (f) Mean absolute error of estimated OSNR for 100 km transmitted 16QAM signal.

Table 1. Neighborhood parameters of five different MFs.