Using Multidimensional ADTPE and SVM for Optical Modulation Real-Time Recognition

Abstract: Based on feature extraction using multidimensional asynchronous delay-tap plot entropy (ADTPE) and multiclass classification using a support vector machine (SVM), we propose a method for recognizing multiple optical modulation formats at various data rates. We first present the multidimensional ADTPE algorithm, in which the features are extracted from asynchronous delay sampling pairs of the modulated optical signal. Then, a multiclass SVM is utilized for fast and accurate classification of several widely-used optical modulation formats. In addition, a simple real-time recognition scheme is designed to reduce the computation time. Compared to the existing method based on the asynchronous delay-tap plot (ADTP), theoretical analysis and simulation results show that our recognition method effectively enhances the tolerance to transmission impairments while obtaining relatively high accuracy. Finally, it is further demonstrated that the proposed method can be integrated in an optical transport network (OTN) with flexible expansion. By simply adding the corresponding sub-SVM module in the digital signal processor (DSP), arbitrary new modulation formats can be recognized with high accuracy in a short response time.


Introduction
With the rapid development of wideband Internet services and the demand for high-capacity data processing, the spectral efficiency and capacity of optical signals are continuously being enhanced by upgrading the signal modulation format in the optical transport network (OTN) [1]. On the other hand, earlier modulation formats are not eliminated immediately, both to save cost and to maintain existing services. Consequently, many kinds of modulation formats are used together to make the best of the available resources in the current OTN. Hence, the recognition of optical signal modulation is urgently needed for monitoring multiple signals in the OTN. Similarly, it is useful for the realization of cognitive optical networking (CON) [2], where the network nodes have the capability to blindly demodulate the received data.
During the past few years, there have been many research papers on wireless signal modulation recognition [3][4][5][6][7], but few published papers can be found for the optical signal. Eric et al. [8] have demonstrated the use of physical layer characteristics coupled with a digital coherent detection technique to recognize the modulation format of the received optical signal. However, advanced modulation formats [9], such as polarization-multiplexed (PM) modulation and quadrature amplitude modulation (QAM), are limited in their work by the range of the test parameters of the transmitter laser. Khan and his colleagues [10,11] adopt an artificial neural network (ANN), which is trained by an asynchronous amplitude histogram (AAH) or an asynchronous delay-tap plot (ADTP) of the received optical signal, to recognize six optical modulation formats with various data rates. One
over-fitting. On the other side, the approach only supports offline modulation format recognition under stringent impairment limits. Thus, it cannot easily meet the requirement of real-time recognition in the next generation OTN and CON.
In this paper, a method is proposed for multiple optical modulation format recognition by employing multidimensional asynchronous delay-tap plot entropy (ADTPE) and a multiclass SVM. It is noteworthy that the support vector machine (SVM) is a popular machine learning method for data classification, with the advantages of a small dependence on the number of samples and excellent generalization. Most importantly, the SVM guarantees that a locally optimal solution is also the globally optimal one [12][13][14]. Therefore, it can be anticipated that SVM is preferable to ANN for real-time modulation format recognition. The simulations are performed with VPI 9.1 software [15] to simulate an optical transmission system that consists of six channels with different commonly-used modulation formats and various data rates, including 10-Gb return-to-zero (RZ) on-off keying (OOK), 40-Gb non-return-to-zero (NRZ) differential phase-shift keying (DPSK), 40-Gb duo-binary optical (DUO), 40-Gb RZ differential quadrature phase-shift keying (DQPSK), 100-Gb polarization-multiplexed (PM) RZ quadrature phase-shift keying (QPSK) and 200-Gb PM-NRZ 16 quadrature amplitude modulation (16QAM) [16]. To account for transmission impairments, the multidimensional ADTPEs are extracted while stepping through wide ranges of optical signal-to-noise ratio (OSNR), chromatic dispersion (CD) and polarization mode dispersion (PMD), respectively. The simulation results show that the overall recognition accuracy can reach 99.05%, while the recognition time stays within 332 ms for these six commonly-used optical modulation formats. Finally, we further show that the method can be flexibly expanded to implement real-time recognition of new advanced optical modulation formats.

Multidimensional Asynchronous Delay-Tap Plot Entropy
In the principle of ADTP [11], the detected signal is asynchronously sampled in pairs with a fixed delay time between the two sampling points, as shown in Figure 1. According to the figure, asynchronous sampling can be defined such that although $X(t)$ and $X(t+\tau)$ are sampled at the same time, the locations of the two sampling points on the signal waveform are different because of the delay-tap $\tau$ [17]. Then, the asynchronous sampling pairs are utilized to plot a two-dimensional histogram having $A \times A$ bins. Essentially, an ADTP is a joint probability distribution of closely-located samples, which reflects the distribution of the slopes of the signal waveform.
Figure 2 shows eye diagrams and ADTPs for six types of modulation formats under two different detection conditions. Each ADTP is made up of $30 \times 30$ bins. It can be found that the ADTPs corresponding to the various formats appear as distinctive portraits when OSNR = 20 dB without CD impairment, whereas they are difficult to distinguish when the received optical signals are impaired by large CD. To tackle this problem, we introduce four types of classical entropies of the ADTP as features to enhance recognition performance and impairment tolerance. It has been successfully proven that the value of entropy indicates information-related properties for an accurate representation of a given image [18][19][20].
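The ADTP construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy waveform, the oversampling factor, the delay expressed in waveform points and the helper names `delay_tap_pairs` and `adtp` are all hypothetical and are not taken from the simulation setup; only the 30-by-30 bin count and the use of sample pairs separated by a fixed delay come from the text.

```python
import numpy as np

def delay_tap_pairs(waveform, n_pairs, delay, rng):
    """Draw n_pairs samples (x_i, y_i) = (w[k], w[k + delay]) at random instants k."""
    waveform = np.asarray(waveform, dtype=float)
    k = rng.integers(0, len(waveform) - delay, size=n_pairs)
    return waveform[k], waveform[k + delay]

def adtp(x, y, bins=30):
    """Return the bins-by-bins joint histogram of the pairs, normalized to sum to 1."""
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    hist, _, _ = np.histogram2d(x, y, bins=bins, range=[[lo, hi], [lo, hi]])
    return hist / hist.sum()

# Toy stand-in for a detected waveform (hypothetical, not a real modulation format).
rng = np.random.default_rng(0)
t = np.arange(200_000) / 32.0            # 32 waveform points per period (toy value)
toy_waveform = np.sin(np.pi * t) ** 2 + 0.02 * rng.standard_normal(t.size)

x, y = delay_tap_pairs(toy_waveform, n_pairs=100_000, delay=8, rng=rng)
P = adtp(x, y)                           # 30 x 30 estimate of the joint distribution
```

Because the sampling instants are drawn without reference to the symbol clock, the histogram depends only on the waveform shape and the delay, which is what makes the plot usable without clock recovery.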
Usually, the entropy $E$ must be an additive cost function, such that $E(0) = 0$. In this paper, an ADTP with pixel intensity is presented as a $30 \times 30$ matrix. Then, four types of ADTPE are defined as below: I. The Shannon entropy: II. The exponent entropy: with the convention $0/\log_2(0) = 0$ and $0/\log(0) = 0$, where the value of the pixel intensity is obtained from the amplitude of the two-dimensional asynchronous delay-tap histogram. III. The singular Shannon entropy: IV. The singular exponent entropy: where the singular vector $\{s_i, 1 \le i \le N\}$ is decomposed from a matrix of ADTPs.
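As a rough illustration of how a four-dimensional feature could be computed from one ADTP matrix, the sketch below uses commonly-used textbook forms of the four entropies. These forms are assumptions, not the definitions of this paper: Shannon entropy is taken as $-\sum p \log_2 p$ over the normalized pixel intensities, the exponent entropy as $\sum p\, e^{1-p}$, and the two singular variants apply the same expressions to the normalized singular values of the matrix. Bins with zero probability are taken to contribute nothing. The function names are hypothetical.

```python
import numpy as np

def shannon(p):
    """Assumed form: -sum(p * log2(p)), skipping zero-probability entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def exponent(p):
    """Assumed form: sum(p * exp(1 - p)); zero entries contribute zero."""
    return float(np.sum(p * np.exp(1.0 - p)))

def adtpe_features(adtp_matrix):
    """Return [E1, E2, E3, E4] for one ADTP matrix (assumed nonzero)."""
    a = np.asarray(adtp_matrix, dtype=float)
    p = a.ravel() / a.sum()                    # normalized pixel intensities
    s = np.linalg.svd(a, compute_uv=False)     # singular values, all >= 0
    q = s / s.sum()                            # normalized singular spectrum
    return np.array([shannon(p), exponent(p), shannon(q), exponent(q)])

# A flat 30 x 30 plot has maximal pixel entropy and a rank-one singular spectrum.
flat = np.full((30, 30), 1.0 / 900.0)
features = adtpe_features(flat)
```

Under these assumed forms, the singular Shannon entropy of a 30-by-30 matrix is bounded by $\log_2 30 \approx 4.9$, which is at least compatible with the range of three to four reported for $E_3$ of 100-Gb PM-RZ-QPSK.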
Because negative and positive CD of the same magnitude affect the received signal waveform in the same way apart from the sign, Figure 3 only shows how the four types of ADTPE corresponding to the six modulation formats change as positive CD increases from 0 to 4000 ps/nm in steps of 80 ps/nm under different OSNR levels. According to the figure, it is worth emphasizing that the four types of ADTPE of each modulation format fluctuate within a certain domain once the CD value exceeds 500 ps/nm. The domains for different modulation formats are distinctive in Figure 3. For example, the value of $E_3$ for 100-Gb PM-RZ-QPSK always fluctuates in the range from three to four. This is because the eye diagram of the received signal gradually closes with CD accumulation until it reaches a steady state. Then, the asynchronous delay-tap sampling pairs of the signal waveform concentrate in the bottom left of the ADTP for all six modulation formats, as shown in Figure 2 (right column), but different modulation formats induce different distributions of pixel intensities. As a result, the ADTPEs corresponding to different modulation formats can be identified. Although some types of ADTPE overlap for different modulation formats in Figure 3, we can choose the most suitable type for recognition by directly judging the selected ADTPE values in the domain. Additionally, it can be found from the figure that the variations of the four types of ADTPE remain almost unaffected when the OSNR level changes from low to high. This implies that the ADTPEs are insensitive to amplified spontaneous emission (ASE) noise. From the above results, it is predicted that the four types of ADTPE, with their large tolerance to transmission impairments, can be exploited for the recognition of multiple modulation formats and different data rates.
entropies in Figure 4 are not fixed. However, it is impractical to search continuously for the most appropriate type of ADTPE at every recognition. To solve this problem, the four types of ADTPE are used as a four-dimensional eigenvector, where each type of ADTPE represents one dimension of the eigenvector. The dimensions of the eigenvectors corresponding to different modulation formats are compared simultaneously and separately in the trained SVM. Modulation formats can be distinguished as long as the difference in any dimension exceeds the threshold, which is determined by the trained SVM. In addition, it is noted that a new modulation format can be recognized only after the SVM has been trained on it with the four types of ADTPE.

Support Vector Machine for Modulation Format Classification
In the basic SVM approach with extracted features [21][22][23][24][25], the classifier separates the input feature vectors into two classes based on the maximal distance algorithm using the most powerful classifying functions, which define the judgment boundary (two-dimensional space) or hyperplane (multidimensional space). The mathematical expression for the two-class linear SVM classifier can be defined as below: where $x_i$ is the input vector, $y_i \in \{-1, +1\}$ represents the two class labels, $\omega$ is the vector of weight coefficients and $b$ is the correction coefficient. To obtain the maximum distance, $\frac{1}{2}\|\omega\|^2$ has to be minimized subject to the condition in (5). In some cases, the input vectors cannot be separated linearly, whereas the SVM can map inseparable input vectors into a higher dimensional feature space through a kernel function. The kernel function could be one of many types of functions, such as linear, quadratic, radial basis function (RBF), polynomial and multilayer perceptron (MLP). In this paper, four popularly-used kernel functions, including linear, polynomial, RBF and sigmoid, are selected. Their expressions are given as below [26][27][28]: I. The linear kernel function: II. The polynomial kernel function: III. The RBF kernel: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$ (8) IV. The sigmoid kernel function: where $x_i$, $x_j$ are input vectors, $\gamma$ is the reciprocal of the number of modulation formats ($\gamma = 1/6$), $r$ is zero by default and $d$ is the order of the polynomial function. It is noted that the polynomial kernel changes to a linear kernel when the order equals one. The comparison of overall recognition accuracies for different kernel functions is given in the following section. For multiclass classification, a multiclass SVM comprising fifteen two-class sub-SVMs is designed. The number of sub-SVMs is calculated as $n(n-1)/2$, where $n$ is the number of modulation formats.
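The four kernels and the count of pairwise sub-SVMs can be illustrated with a short sketch. The RBF form follows Equation (8); the linear, polynomial and sigmoid forms below are the widely-used conventions $x_i^{T} x_j$, $(\gamma x_i^{T} x_j + r)^d$ and $\tanh(\gamma x_i^{T} x_j + r)$, and are assumed here rather than taken from the text. The function names and the example vectors are hypothetical.

```python
import math
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, gamma=1 / 6, r=0.0, d=9):
    # Assumed form (gamma * <u, v> + r) ** d
    return (gamma * dot(u, v) + r) ** d

def rbf_kernel(u, v, gamma=1 / 6):
    # Equation (8): exp(-gamma * ||u - v||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def sigmoid_kernel(u, v, gamma=1 / 6, r=0.0):
    # Assumed form tanh(gamma * <u, v> + r)
    return math.tanh(gamma * dot(u, v) + r)

FORMATS = ["10-Gb RZ-OOK", "40-Gb NRZ-DPSK", "40-Gb DUO",
           "40-Gb RZ-DQPSK", "100-Gb PM-RZ-QPSK", "200-Gb PM-NRZ-16QAM"]

# One two-class sub-SVM per unordered pair of formats: n(n-1)/2 = 15 for n = 6.
sub_svm_pairs = list(combinations(FORMATS, 2))

u, v = [1.0, 2.0], [3.0, 4.0]
same_as_linear = polynomial_kernel(u, v, gamma=1.0, r=0.0, d=1) == linear_kernel(u, v)
```

With $r = 0$ and $d = 1$ the assumed polynomial form reduces to $\gamma\, x_i^{T} x_j$, i.e., the linear kernel up to the constant factor $\gamma$, which matches the remark that the polynomial kernel becomes linear at order one.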
Figure 5 depicts the structure of the recognition procedure using multidimensional ADTPE and a multiclass SVM. The multidimensional ADTPEs are divided into two parts: a testing database and a training database. The training database is utilized to train each sub-SVM, while the testing database is passed through the same SVM structure to test the trained sub-SVMs. Each trained sub-SVM separates the testing data into two parts, which are labeled $+1$ and $-1$. The accuracy based on the one-versus-one algorithm is computed as below: where $A$ is the recognition accuracy, $C_{+1}$ is the correct part for testing data labeled $+1$, $C_{-1}$ is the correct part for testing data labeled $-1$, $E_{+1}$ is the error part for testing data labeled $+1$ and $E_{-1}$ is the error part for testing data labeled $-1$. Finally, the recognition results are output from each sub-SVM, and the overall accuracy is calculated as the average value of all 15 sub-SVM results.
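The bookkeeping described above can be sketched as follows. The accuracy expression is assumed to be the usual ratio of correct decisions to all decisions, $A = (C_{+1} + C_{-1}) / (C_{+1} + C_{-1} + E_{+1} + E_{-1})$, which is consistent with the symbol definitions but is not quoted from the text. The majority vote over pairwise winners is the standard one-versus-one rule; the function names and all counts below are hypothetical.

```python
from collections import Counter

def sub_svm_accuracy(c_pos, c_neg, e_pos, e_neg):
    """Assumed form: correct decisions divided by all decisions of one sub-SVM."""
    total = c_pos + c_neg + e_pos + e_neg
    return (c_pos + c_neg) / total

def overall_accuracy(per_sub_svm):
    """Average of the individual sub-SVM accuracies."""
    return sum(per_sub_svm) / len(per_sub_svm)

def vote(pairwise_winners):
    """One-versus-one decision: the label chosen by the most sub-SVMs wins."""
    return Counter(pairwise_winners).most_common(1)[0][0]

# Hypothetical counts for a single sub-SVM, for illustration only.
a = sub_svm_accuracy(c_pos=48, c_neg=49, e_pos=2, e_neg=1)

# Hypothetical outputs of five of the pairwise sub-SVMs for one test vector.
decision = vote(["DUO", "DUO", "NRZ-DPSK", "DUO", "16QAM"])
```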

The Structural Process of the Real-Time Modulation Format Recognition System and Database Formation
In the fiber backbone transmission link, the 10-Gb RZ OOK format is still present in the metropolitan area network, and later, the speed will increase to 40 Gb with RZ-DPSK, DUO or RZ-DQPSK formats; the selection depends on many factors, such as cost and link distance. On the other side, the 100-Gb PM-RZ-QPSK format has been commercially applied in international optical fiber communication, while 200-Gb PM-NRZ-16QAM, which is a potential format for the next generation international backbone network, has proven its feasibility experimentally [9,16]. To sum up, these six kinds of modulation formats are selected because of their wide application. The structure of the real-time recognition system for the proposed method is shown in Figure 6. The six modulated optical signals mentioned before, each carrying a pseudo-random bit sequence (PRBS), are transmitted at the same laser power of 1 mW over a single-mode fiber (SMF). The OSNR is regulated in the range of 10 to 30 dB (in steps of 2 dB) by using an erbium-doped fiber amplifier (EDFA) and a variable optical attenuator (VOA). The CD is considered in the range of 0 to 4000 ps/nm (in steps of 100 ps/nm) by using a CD emulator. With large CD accumulation, the first-order PMD should also be considered for our proposed method in practice. Thus, the differential group delay (DGD) is changed in the range of 0 ps to 10 ps (in steps of 1 ps) by using a first-order PMD emulator. The angle $\alpha$ between the principal state of polarization (SOP) of the PMD emulator and the SOP of each modulated optical signal is varied randomly. The random value of $\alpha$ corresponding to different PMDs is selected in the range of 0 to 90 degrees. The initial azimuth angle between the two polarization-multiplexed signals of the same bit rate is 90 degrees. Then, the optical signal is split by a coupler at static power and filtered by an optical band-pass filter (OBPF) with a bandwidth of 0.8 nm to select the desired channel signal.
After that, the filtered optical signal passes through a 600-ps/nm fixed dispersion module (FDM) to ensure that the ADTPE has the best available characteristics for identification. Finally, the optical signal is transformed into an electrical signal by a photodiode detector with a 50-GHz bandwidth. The received electrical signal is asynchronously sampled at $f_s = 2.5$ GHz, much slower than the symbol rates of all modulation formats.
It is noted that the extraction of multidimensional ADTPE and the training of the SVM take up much of the response time when applying the proposed method to the real-time modulation recognition system. To reduce the response time, a DDR3 SDRAM module is added to cache the sampling data and implement a serial-to-parallel function after the asynchronous delay-tap sampling and analog-to-digital converter (ADC), as shown in Figure 6. In this system, the received serial data at a rate of $f_s$ are de-multiplexed into 15 parallel data channels at a rate of $f_s/15$, and hence, the response time can be anticipated to be much shorter than when the serial sampling data are used directly for recognition in sequence. This is due to the fact that the extraction of multidimensional ADTPE can be accomplished in 15 parallel sub-modules in 1/15 of the original time, and the training time is that of the slowest sub-SVM rather than the sum over all sub-SVMs.
Entropy 2016, 18, 30
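The serial-to-parallel step and its effect on latency can be sketched as below. The round-robin assignment of samples to channels is an assumption made for illustration (the text only states that the serial stream is split into 15 channels at $f_s/15$), and the function names and durations are hypothetical.

```python
def demultiplex(samples, n_channels=15):
    """Round-robin split of a serial stream into n_channels parallel streams."""
    return [samples[k::n_channels] for k in range(n_channels)]

def serial_time(durations):
    """Sub-modules run one after another: the durations add up."""
    return sum(durations)

def parallel_time(durations):
    """Sub-modules run side by side: the slowest one sets the latency."""
    return max(durations)

stream = list(range(30))                 # stand-in for cached ADC samples
channels = demultiplex(stream)           # 15 channels, each at 1/15 of the rate

toy_durations = [3, 5, 4]                # hypothetical per-module times
speedup = serial_time(toy_durations) / parallel_time(toy_durations)
```

Each channel receives every fifteenth sample, so interleaving the channels again restores the original stream and no sample is lost by the split.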
Each ADTP is formed from 100,000 pairs $(x_i, y_i)$ with a delay time of $\Delta\tau = 15$ ps between $x_i$ and $y_i$. A four-dimensional eigenvector comprises the four types of ADTPE extracted from an ADTP alone, without any other information about the channel impairment. Then, a set of 43,296 eigenvectors corresponding to different OSNR, CD, DGD, initial angle and modulation formats is obtained. The scatter points of the four types of ADTPE for multiple modulation formats are shown in Figure 7, which is called a "plot matrix" here. From the figure, it is clear that several sub-matrices (for example, $E_1$-versus-$E_4$) can be used to immediately distinguish 10-Gb NRZ from the other formats. For 40-Gb DUO, $E_1$-versus-$E_3$ is first used to divide the six formats into three groups, namely {10-Gb NRZ}, {40-Gb RZ-DQPSK, 100-Gb PM-RZ-QPSK} and {40-Gb NRZ-DPSK, 40-Gb DUO, 200-Gb PM-NRZ-16QAM}, and then, $E_3$-versus-$E_4$ can be utilized to recognize 40-Gb DUO. Therefore, we can also expect good recognition accuracy for these two modulation formats. Nevertheless, 40-Gb RZ-DQPSK and 100-Gb PM-RZ-QPSK overlap to some extent in all sub-matrices, and the same holds for 40-Gb NRZ-DPSK and 200-Gb PM-NRZ-16QAM. Consequently, a few estimation errors are anticipated for these four modulation formats.
Figure 7. The scatter points of four types of ADTPEs for multiple modulation formats.

Results and Discussion
To evaluate the performance of different kernel functions for the multiclass SVM, we use the N-fold stratified cross-validation (SCV) technique [29]. In this study, the obtained eigenvectors are randomly divided into 10 mutually-exclusive subsets of nearly equal size; after that, nine subsets are used for training and the remaining one for testing. The procedure repeats 10 times, and each subset is utilized only once for testing. The 10 testing results are then combined so as to decrease the variance of the estimated classification performance. The comparison results of the overall recognition accuracy are shown in Table 1. The simulations are implemented on a computer with a 3.2-GHz Intel(R) Core(TM) i5-4570 central processing unit (CPU) and 8-GB RAM, under the 64-bit Microsoft Windows operating system. The multiclass SVM is implemented in MATLAB 2015a (The Mathworks ©, Natick, MA, USA). It can be found from the table that the multiclass SVM using the ninth-order polynomial kernel function for each sub-SVM achieves the highest overall recognition accuracy. To validate that the real-time recognition system using the proposed method has a short response time, the recognition time of each sub-SVM is shown in Table 2. According to the table, the total testing time of all sub-SVMs is 3360 ms, which can be considered as the testing time of the serial-based multiclass SVM. However, the time used by the parallel-based multiclass SVM is actually 332 ms, which is the maximum among all sub-SVMs, thanks to the parallel processing. It is noteworthy that whether the recognition system is serial or parallel, the computation time of voting in the multiclass SVM can be expected to be equal. Because the voting scheme can only start after all sub-SVMs have finished, we deem that the voting time can be ignored here, so as to highlight the time saved by the parallel design of the recognition system.
As a result, the recognition time in this study only includes the sub-SVM testing time. In addition, although the recognition is simulated on a computer, the final goal is to implement the proposed method in a DSP for practical applications. Thus, considering the realization in the DSP with a hardware description language, we should carefully plan the timing of each sub-module, which corresponds to one one-versus-one sub-SVM in the designed real-time recognition system. It is advisable to estimate the testing time of each sub-SVM before the testing results of all sub-SVMs are fed into the voting scheme, and to choose the largest testing time so that no testing result is missing [30][31][32].
Table 2. Each sub-SVM recognition time in the real-time recognition system.
To compare the performance of our proposed method with the ADTP-based method, the sizes of the training and testing subsets are chosen to be 50% and 50% of the overall eigenvector set, respectively, while the sizes of the training, validating and testing datasets are chosen to be 56%, 19% and 25% of the overall dataset in [11]. Namely, the numbers of training and testing eigenvectors for each modulation format are both 3608 in this study. Table 3 shows the recognition accuracy of each optical modulation format using multidimensional ADTPE and a multiclass SVM when the order of the polynomial kernel function is nine for each sub-SVM. The overall recognition accuracy of 99.05% is a little lower than the 99.95% claimed in [11]. However, the recognition accuracy can be expected to improve by introducing new ADTPEs (e.g., wavelet exponent entropy) to increase the dimension of the eigenvector, at the cost of higher computational complexity and a longer response time. Moreover, the stringent 500 ps/nm CD restriction in [11] has been relaxed to 4000 ps/nm, while the range of OSNR is expanded to 10 to 30 dB.
Considering the typical dispersion coefficient of 17 ps/nm/km for single-mode fiber (SMF), a CD of 4000 ps/nm equates to a transmission distance of approximately 235.3 km in reality, which can meet the requirements of long-haul optical fiber communication. The reasons are two-fold. First, compared to the ADTP-based feature, the multidimensional ADTPE describes the physical properties of the modulation format from distinct and entirely different aspects, which leads to precise distinction in the large CD situation. Second, entropy is a statistical feature with the advantage of insensitivity to amplified spontaneous emission (ASE) noise. Meanwhile, it is effective at describing the uncertainty and complexity of a two-dimensional image. To investigate the influence of the proportion between the training and testing eigenvector sets, we change the size of the training set from 10% to 90% of the overall eigenvector set in steps of 10%. The remaining eigenvectors are used for testing, and Figure 8 shows the overall recognition results. It is evident from the figure that the accuracy rapidly increases in the range from 10% to 70%, whereas the accuracy decreases when the proportion of the training set is greater than 70%. Because the size of the overall eigenvector set is fixed, increasing the number of training eigenvectors decreases the number of testing eigenvectors. As a result, "over-training" occurs when there is not enough testing data for the SVM. On the other hand, it is noted that the recognition accuracy for different proportions fluctuates by only a small absolute value of 0.1434%. This shows that the SVM can perform multiclass classification with high and steady performance using a small number of samples [20]. Furthermore, we increase the number of unknown modulation formats to prove the flexible expansion of the proposed method. The method is capable of distinguishing modulation formats with various bit rates; the recognition times and accuracies are listed in Table 4.
It can be found that the recognition accuracy of each modulation format decreases a little, but the overall accuracy still reaches 98.21%, and the recognition time stays within 397 ms. Thus, we believe that the proposed method can be made compatible with large numbers of emerging optical modulation formats by simply adding the corresponding sub-SVM module in the digital signal processor (DSP) when it is applied in a real OTN system.
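The stratified 10-fold split used at the start of this section can be sketched as follows. This is a minimal illustration and not the MATLAB routine used in the study: the function name `stratified_folds`, the round-robin assignment within each class and the toy label list are assumptions.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into n_folds disjoint folds with balanced classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)                     # random order within each class
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)     # deal the class out fold by fold
    return folds

# Toy data: 6 classes with 25 samples each (hypothetical sizes).
labels = [c for c in range(6) for _ in range(25)]
folds = stratified_folds(labels)

# Each fold serves once as the testing set; the other nine form the training set.
splits = []
for k, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    splits.append((train_idx, test_idx))
```

Dealing each class out separately keeps the class proportions of every fold close to those of the full set, which is the property that distinguishes stratified cross-validation from a plain random split.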

Conclusions
In this paper, a competitive method using multidimensional ADTPE and SVM is proposed for real-time recognition of multiple modulation formats. The method can quickly and accurately recognize six different widely-used optical modulation formats with large tolerance to distortion of the received signal waveform. In addition, we further show that the proposed method can be flexibly expanded to arbitrary signal types and bit rates. Owing to its excellent performance, this method can be employed in the next generation OTN and CON for auto-adaptive real-time demodulation.