Comparing Performance of Deep Convolution Networks in Reconstructing Soliton Molecules Dynamics from Real-Time Spectral Interference

Abstract: Deep neural networks have enabled the reconstruction of optical soliton molecules with more complex structures from the real-time spectral interferences obtained by photonic time-stretch dispersive Fourier transformation (TS-DFT). In this paper, we propose to use three kinds of deep convolution networks (DCNs), namely VGG, ResNets, and DenseNets, to reveal the internal dynamic evolution of soliton molecules from real-time spectral interferences. All three models are effective for soliton molecules with equidistant composite structures. DenseNets with a growth rate of k = 48 perform best at extracting the dynamic information of complex five-soliton molecules from TS-DFT data, with a mean Pearson correlation coefficient (MPCC) of about 0.9975 between the predicted and real results. ResNets (MPCC 0.9906) also extract phases better than VGG (MPCC about 0.9739). These results demonstrate that the approach is generally applicable to extracting internal information from complex soliton molecule structures with high accuracy. The presented DCN-based techniques can be employed in experimental research to explore undiscovered mechanisms underlying the distribution and evolution of large numbers of solitons in dissipative systems.


Introduction
Soliton molecules are localized bound states formed by self-organized dissipative solitons through subtle interaction mechanisms [1]. Their potential to expand the transmission capacity of optical communication systems has drawn much research attention, and they have become an attractive topic in nonlinear fiber optics in recent decades [2–10]. Beyond theoretical predictions of their dynamic evolution [3], the dynamic evolution of soliton molecules has also been confirmed experimentally [8–12], extending the degrees of freedom toward internal dynamics. These internal dynamics are difficult to analyze when oscilloscope traces capture only changes in pulse energy. Recently, photonic time-stretch dispersive Fourier transformation (TS-DFT) has been used to monitor the internal dynamics of soliton molecules in passively mode-locked lasers (PMLs) in real time. Specifically, TS-DFT has captured various rare events and transient phenomena, including soliton buildup [6,7], soliton pulsation [13,14], soliton explosion [15,16], and soliton molecules [2–4,9,10]. It also shows tremendous potential for simulating the dynamic processes of various complex molecules. Observed soliton molecule structures range from simple two-soliton and three-soliton molecules [4,5,10–12] and 2+2 soliton molecular complexes [9] to composite patterns in both global and local ranges [14] and supramolecular arrangements that mimic various many-body biochemical and biological systems [8]. To reconstruct the internal dynamics of soliton molecules from TS-DFT spectra, an autocorrelation method is usually employed [9]: a discrete Fourier transform of the interference fringes yields single-shot autocorrelation traces, from which the soliton separation and the relative phase within the soliton molecule are retrieved.
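For the simplest case, a soliton pair, the autocorrelation method can be sketched as follows. This is an illustrative sketch, not the authors' code: the sech width (1 ps), time window, and the ideal noise-free spectrum are assumptions, and the function names are hypothetical. Only the intensity spectrum is used, as in a TS-DFT measurement; its inverse FFT is the field autocorrelation, whose side peak sits at the soliton separation and whose argument equals the relative phase.

```python
import numpy as np


def sech_pulse(t, center, width):
    """Hyperbolic secant envelope centred at `center` (all times in ps)."""
    return 1.0 / np.cosh((t - center) / width)


def two_soliton_field(t, tau, phi, width=1.0):
    """Hypothetical soliton pair: second pulse delayed by tau with relative phase phi."""
    return sech_pulse(t, 0.0, width) + sech_pulse(t, tau, width) * np.exp(1j * phi)


def retrieve_from_spectrum(spectrum, dt, min_lag):
    """Autocorrelation method: the IFFT of the intensity spectrum |A(w)|^2 equals
    r[m] = sum_p conj(a[p]) * a[p + m]; for a soliton pair, its side peak lies at the
    separation and its argument is the relative phase."""
    acf = np.fft.ifft(spectrum)
    idx = np.arange(spectrum.size)
    lags = idx * dt
    # Search positive lags only, beyond the zero-delay peak and below half the window.
    candidates = idx[(lags > min_lag) & (idx < spectrum.size // 2)]
    peak = candidates[np.argmax(np.abs(acf[candidates]))]
    return lags[peak], np.angle(acf[peak])


dt = 0.01                                  # ps, simulation grid spacing
t = np.arange(-200.0, 200.0, dt)           # ps, time window
field = two_soliton_field(t, tau=17.0, phi=0.8)
spectrum = np.abs(np.fft.fft(field)) ** 2  # only the intensity spectrum is measured
separation, phase = retrieve_from_spectrum(spectrum, dt, min_lag=5.0)
print(f"separation = {separation:.2f} ps, relative phase = {phase:.3f} rad")
```

When several soliton pairs share the same separation, their contributions add at the same lag, so the angle of that peak mixes several phases; this is the limitation that motivates the DCN approach.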
However, the autocorrelation method cannot quantitatively analyze all dynamic evolution processes in complex molecular structures, such as the relative phase of each soliton. When a multisoliton molecule contains soliton pairs with nearly equal spacing [11], their autocorrelation peaks overlap, and it becomes almost impossible to retrieve the evolution of the relative phase differences (PDs) [17]. The autocorrelation method is therefore suited to simple soliton molecule structures, such as soliton pairs or unequally spaced three-soliton molecules [4]. Ultrafast photonics has grown rapidly in recent years, and artificial intelligence algorithms are being applied to explore complex dynamical processes of soliton molecules in passively mode-locked fiber lasers (PMLFLs) [17], extreme events in optical fibre modulation instability [18], and the generation and characterization of light pulses [19,20]. To resolve the internal dynamics of complex soliton molecules, we combine artificial intelligence with TS-DFT. Residual networks (ResNets) [21] have already been used to explore complex dynamical processes in soliton molecules in PMLs, both experimentally and numerically, based on TS-DFT, and it has been shown that data generated from theory can be used to analyze experimental data [17]. Nevertheless, emerging models continue to push the limits of what can be achieved, so it is worth asking whether network architectures other than ResNets can analyze the internal dynamics of multisoliton molecules more accurately and effectively.
Recently, deep convolution networks (DCNs) have proven powerful in mode-locked lasers [17,22], mode decomposition in few-mode fibers [23], recognition of orbital angular momentum modes with fractional topological charges [24], fiber nonlinearity mitigation in optical communication [25], and the characterization and control of ultrafast propagation dynamics [26]. Convolutional neural networks (CNNs) such as VGG (Visual Geometry Group) [27], Residual Networks (ResNets) [21], and Dense Convolutional Networks (DenseNets) [28] have dominated the machine-learning landscape in data-rich applications. Theoretical and empirical evidence indicates that network depth is crucial for accuracy and performance [29]. The core idea of ResNets and DenseNets is to establish shortcut (skip) connections between earlier and later layers, which eases gradient flow during training and allows deeper CNNs to be trained to higher accuracy. DenseNets differ in that each layer is connected to all preceding layers, so each layer has direct access to the gradients from the loss function and to the original input signal, forming an implicit deep supervision [30,31]. Because DenseNets concatenate features along the channel dimension rather than adding them, they also encourage feature reuse and faster error convergence. Considering the representativeness of VGG, ResNets, and DenseNets and how easily their depth can be increased, we chose these three models to compare their ability to extract the internal dynamic evolution of soliton molecules.
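The difference between residual and dense shortcuts can be seen from how feature maps are combined. The toy sketch below is an assumption-laden illustration, not the networks used here: each "layer" is a random 1×1 convolution (channel mixing) followed by ReLU, and the 39 × 39 × 16 input and the layer counts are hypothetical. A residual block adds each sub-block output to its input, so the channel count is unchanged; a dense block concatenates all earlier feature maps, so channels grow by the growth rate k per layer (in standard DenseNet notation, the k in DenseNet-161 (k = 48) is this growth rate).

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1_relu(x, out_channels):
    """Toy stand-in for a conv layer: random 1x1 convolution (channel mixing) + ReLU."""
    w = rng.normal(scale=0.1, size=(x.shape[-1], out_channels))
    return np.maximum(x @ w, 0.0)


def residual_block(x, n_layers):
    """ResNet-style: each sub-block output is ADDED to its input (channels unchanged)."""
    for _ in range(n_layers):
        x = x + conv1x1_relu(x, x.shape[-1])
    return x


def dense_block(x, n_layers, growth_rate):
    """DenseNet-style: each layer sees the CONCATENATION of all earlier feature maps
    and contributes `growth_rate` new channels."""
    features = [x]
    for _ in range(n_layers):
        inp = np.concatenate(features, axis=-1)
        features.append(conv1x1_relu(inp, growth_rate))
    return np.concatenate(features, axis=-1)


x = rng.normal(size=(39, 39, 16))      # hypothetical 39x39 feature map with 16 channels
res = residual_block(x, n_layers=4)
dense = dense_block(x, n_layers=4, growth_rate=48)
print(res.shape, dense.shape)          # channels: 16 vs 16 + 4 * 48
```

Because the dense block keeps the original input as its first channels, later layers can reuse earlier features directly instead of relearning them.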
Here, we propose and theoretically demonstrate the use of DCNs to analyze the internal dynamics of bound states of complex dissipative solitons. By modifying their network structures, we implement VGG, ResNets, and DenseNets that can extract the phase evolution of more complex soliton molecules from TS-DFT spectral data. A numerical comparison of the three DCNs shows that ResNets and DenseNets have lower complexity than VGG and readily gain accuracy from greatly increased depth. The DenseNets we used have better parameter efficiency and lower testing error than ResNets. Overall, DenseNets achieve the best performance of the three models.

Generate Simulated TS-DFT Data of Soliton Molecules
The simulated TS-DFT data of soliton bound states account for factors such as bandwidth, sampling, and noise; data generated in this way have been shown to be usable as deep-learning datasets [17]. The complex amplitude of the slowly varying envelope of a soliton molecule is described by the superposition of solitons [32]:
$$A(T) = \sum_{k=1}^{M} u_k(T - \tau_k)\, e^{i\phi_k},$$
where T is the relative reference time of the pulse, M is the number of solitons, and u_k, τ_k, and ϕ_k represent the slowly varying envelope, relative temporal delay, and relative phase of the k-th soliton, respectively. To match the bandwidth and sampling rate of the electronics to experiment (for example, a real-time oscilloscope with a bandwidth of 59 GHz and a sampling rate of 200 GSa/s), the TS-DFT spectrum is first calculated with a high temporal resolution (0.01 ps), then filtered by a fourth-order Butterworth low-pass filter and downsampled, giving a resolution of 2.8626 ps. Thus, given a series of relative temporal delays τ_k and phases ϕ_k, the simulated TS-DFT dataset of the soliton molecules can be generated. White noise is superimposed on all TS-DFT data. The TS-DFT system used here employs a dispersion-compensating fiber (DCF, −134 ps²/km) with a length of 1.5 km.
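The simulation chain above can be sketched as follows, under stated assumptions; it is not the authors' implementation. The sech width (0.5 ps), the ±2000 ps window, and the noise level are hypothetical. The spectrum of the superposition is evaluated analytically (the Fourier transform of sech(T/w) is πw·sech(πwω/2), and a delay τ_k becomes a factor exp(−iωτ_k)); the far-field DFT mapping t = β₂Lω converts it to oscilloscope time; the fourth-order Butterworth magnitude response is applied with zero phase in the frequency domain (the authors' filter implementation is not stated); and the trace is decimated to the 5 ps interval implied by 200 GSa/s rather than reproducing the 2.8626 ps resolution quoted above.

```python
import numpy as np

# Values from the text; WIDTH, window and noise level are assumptions.
BETA2_L = -134.0 * 1.5    # ps^2: DCF dispersion (-134 ps^2/km) x length (1.5 km)
F_CUTOFF = 0.059          # THz: 59 GHz oscilloscope bandwidth
F_SAMPLE = 0.2            # THz: 200 GSa/s sampling rate
DT_FINE = 0.01            # ps: fine grid before filtering and downsampling
WIDTH = 0.5               # ps: hypothetical sech width parameter


def molecule_spectrum(omega, taus, phis, width=WIDTH):
    """|A(w)|^2 for A(T) = sum_k sech((T - tau_k)/width) * exp(i*phi_k)."""
    envelope = np.pi * width / np.cosh(np.pi * width * omega / 2.0)
    phasors = np.exp(1j * (np.asarray(phis)[:, None] - np.outer(taus, omega)))
    return np.abs(envelope * phasors.sum(axis=0)) ** 2


def butterworth_lowpass(signal, dt, f_cut, order=4):
    """Zero-phase filtering with Butterworth magnitude |H| = (1 + (f/fc)^(2n))^(-1/2)."""
    f = np.fft.fftfreq(signal.size, d=dt)
    gain = 1.0 / np.sqrt(1.0 + (f / f_cut) ** (2 * order))
    return np.real(np.fft.ifft(np.fft.fft(signal) * gain))


def simulate_ts_dft(taus, phis, window=2000.0, noise=0.01, rng=None):
    """One hypothetical single-shot TS-DFT trace, normalised to a peak of 1."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(-window, window, DT_FINE)          # stretched (oscilloscope) time, ps
    omega = t / BETA2_L                              # far-field DFT mapping t = beta2*L*omega
    trace = molecule_spectrum(omega, taus, phis)
    trace = butterworth_lowpass(trace, DT_FINE, F_CUTOFF)
    step = int(round(1.0 / (F_SAMPLE * DT_FINE)))    # 500 fine samples per 5 ps
    trace = trace[::step]
    trace = trace / trace.max()
    return trace + noise * rng.standard_normal(trace.size)  # additive white noise


trace = simulate_ts_dft(taus=[0.0, 17.0], phis=[0.0, 0.8], rng=np.random.default_rng(1))
print(trace.shape)
```

With a relative phase of 0 the fringe at zero optical frequency is bright, and with π it is dark, which is the phase information the DCNs learn to read from the fringes.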
We assume that the solitons in the soliton molecules are hyperbolic secant pulses with a central wavelength of 1560 nm. As shown in Figure 1, when multisoliton molecules are considered, the TS-DFT dataset is generated with random PDs. The TS-DFT dataset is filtered and divided proportionally (8:2) into a training set and a verification set. All TS-DFT data are converted to bitmaps as inputs to the DCNs. After training, the DCNs predict the PDs of the soliton dynamics from a simulated testing dataset with added noise.
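The dataset preparation described above (random PDs as labels, conversion to bitmaps, and an 8:2 split) might look like the sketch below. The dataset size, the uniform PD distribution on [−π, π), the min-max scaling to 8-bit grayscale, and the random placeholder images are all assumptions; in practice each image would be built from simulated TS-DFT traces generated with the corresponding PDs. The 39 × 39 image size is the input size used in the training section.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_solitons = 1000, 5      # hypothetical dataset size; five-soliton molecules

# Random PD2..PD5 labels; the first soliton is the phase reference.
pds = rng.uniform(-np.pi, np.pi, size=(n_samples, n_solitons - 1))


def to_bitmap(image):
    """Min-max scale a real-valued TS-DFT image to an 8-bit grayscale bitmap."""
    lo, hi = image.min(), image.max()
    scaled = (image - lo) / (hi - lo) if hi > lo else np.zeros_like(image)
    return np.round(255 * scaled).astype(np.uint8)


# Placeholder 39 x 39 images standing in for simulated TS-DFT data.
images = np.stack([to_bitmap(rng.random((39, 39))) for _ in range(n_samples)])

# Shuffle once, then split 8:2 into training and verification sets.
order = rng.permutation(n_samples)
n_train = int(0.8 * n_samples)
train_idx, val_idx = order[:n_train], order[n_train:]
x_train, y_train = images[train_idx], pds[train_idx]
x_val, y_val = images[val_idx], pds[val_idx]
print(x_train.shape, x_val.shape)
```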

Structures of Deep Convolution Networks (DCNs)
The base architectures of the three DCNs, namely VGG, ResNets, and DenseNets, follow refs. [28,33,34]. We modified these three models, including the number of network layers, the convolution kernel size, and the subblock structure. In particular, unlike the VGG nets in ref. [35], our VGG (Figure 2a) adds batch normalization (BN) before each convolution block, and L2-norm regularization is used in each convolutional layer. All convolutional layers within one convolution block use the same convolution kernel (K_i). As convolution blocks are stacked, the number of convolution kernels either increases or stays the same as in the previous block. The main parts of the ResNets/DenseNets consist of ResBlocks/Dense Blocks, as shown in Figure 2b,c. The number of subblocks is set separately for each ResBlock/Dense Block, and the subblock structures are shown in the boxes indicated by arrows. In addition, all convolutional layers use L2-norm regularization, and batch normalization is applied between layers. The rectified linear unit (ReLU) activation [36], batch normalization [37], L2-norm regularization, and pooling used in our three DCNs together help prevent overfitting. L2-norm regularization also penalizes large weights, which makes the objective function easier to converge. The weights of the DCNs are optimized during training through backpropagation. We use the Adam optimizer [38], a variant of stochastic gradient descent with individual adaptive learning rates for different parameters, computed from estimates of the first and second moments of the gradients. Because the DCNs solve a regression problem, the mean absolute error (MAE) is chosen as the loss function, and the optimizer reduces this gap between the predicted values and the sample labels. The DCN models are implemented using the TensorFlow framework [39].
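The training objective and optimizer described above can be illustrated on a toy problem. The sketch below is an assumption, not the DCN training code: it fits a linear model (a stand-in for the network) by minimizing MAE plus an L2 weight penalty with a hand-written Adam update, showing the bias-corrected first- and second-moment estimates that give each parameter its own adaptive step. The hyperparameters are Adam's common defaults plus a hypothetical learning rate and L2 strength.

```python
import numpy as np


def adam_mae_fit(x, y, lr=0.01, steps=2000, l2=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y ~ x @ w by minimising MAE + l2 * ||w||^2 with Adam (toy linear model)."""
    w = np.zeros(x.shape[1])
    m = np.zeros_like(w)   # first-moment (mean) estimate of the gradient
    v = np.zeros_like(w)   # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        residual = x @ w - y
        grad = x.T @ np.sign(residual) / len(y) + 2.0 * l2 * w  # MAE subgradient + L2 term
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        m_hat = m / (1.0 - beta1 ** t)                # bias correction for zero init
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)      # per-parameter adaptive step
    return w


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = x @ w_true
w_fit = adam_mae_fit(x, y)
print(np.round(w_fit, 2))
```

MAE gives gradients of roughly constant magnitude regardless of error size, so Adam's normalization by the second-moment estimate keeps the step sizes well scaled.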

Soliton Molecular Structure of Test Set
A complex soliton molecular structure with five solitons, shown in Figure 3, is used to test the ability of the three DCNs to extract relative phase differences (PDs). In particular, the internal phase evolution of the soliton molecules contains both oscillating and diverging (sliding) phases [4,5]. Because the test set includes both types of phase evolution as well as equal temporal separations, the internal phase evolution of each soliton cannot be extracted by the autocorrelation method. The temporal trace of the simulated dataset is shown in Figure 3a. The temporal separations among the five solitons take two equal-spacing values, 17 and 42 picoseconds (ps). Figure 3a also presents a phasor representation that pictures the constitution of the five-soliton molecule. We define the leftmost soliton as the first pulse, which serves as the reference with a fixed phasor direction. The PDs of the subsequent pulses relative to the first pulse are denoted PD2, PD3, etc. (variables ϕ). Figure 3e shows two representative PDs exhibiting oscillating and diverging (sliding) phases [40]. Figure 3b shows the TS-DFT of the five-soliton molecule with the given phases, which serves as the simulated testing dataset. Because the molecule contains soliton pairs with almost equal separations, their corresponding autocorrelation peaks are coherently superposed, and the autocorrelation traces flicker, as shown in Figure 3c. Autocorrelation curves at two roundtrips (the 580th and 704th) are plotted in Figure 3d; the intensity at the autocorrelation peaks clearly varies greatly because of the interaction between equally spaced soliton pairs. Because this complex molecular structure embodies the difficulties described above, it is a suitable test set for evaluating the strengths and weaknesses of the DCNs.
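Why equal spacings defeat the autocorrelation method can be checked by counting pairwise separations, since every soliton pair contributes a side peak at its separation. The pulse positions and absolute phases below are hypothetical (an arrangement with alternating 17 ps and 42 ps spacings; the exact arrangement of Figure 3 is not given in the text). The sketch also computes PDs relative to the leftmost soliton, wrapped into the principal interval.

```python
import numpy as np
from collections import Counter
from itertools import combinations

# Hypothetical pulse positions (ps) with alternating 17 ps and 42 ps spacings.
positions = np.array([0.0, 17.0, 59.0, 76.0, 118.0])
phases = np.array([0.0, 0.3, -1.2, 2.5, 0.9])   # hypothetical absolute phases (rad)

# PD2..PD5 with the leftmost soliton as reference, wrapped to the principal interval.
pds = np.angle(np.exp(1j * (phases[1:] - phases[0])))

# Each soliton pair produces an autocorrelation side peak at its separation;
# separations shared by several pairs give coherently superposed peaks.
separations = Counter(float(round(b - a, 6)) for a, b in combinations(positions, 2))
overlapping = {sep: n for sep, n in separations.items() if n > 1}
print(overlapping)
```

For this arrangement, the peaks at 17, 42, and 59 ps each collect contributions from two or three pairs, so their phases cannot be assigned to individual solitons.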

Apply Three DCNs to TS-DFT Datasets of Five-Soliton Molecules
The TS-DFT images, each of 39 × 39 pixels, are fed into the three DCNs for training. We add three callback functions to control training. The first dynamically adjusts the learning rate (LR), multiplying it by 0.6 if the loss does not decrease for 5 iterations. The Early-Stop function terminates training when the loss does not decrease for 20 iterations. The Best-Model function saves the model parameters whenever the error is lower than the previous best. The training results are shown in Figure 4. The convergence speed and error differ among networks owing to their different numbers of layers (Figure 4a). Table 1 lists the depth of each network, the size of the parameter model, the number of iterations, and the verification and testing errors of the different model structures for the TS-DFT of five-soliton molecules. Among them, DenseNet-161 (k = 48) achieves the best overall testing results, with the smallest error (2.2355) and a faster convergence rate. Because overfitting cannot be avoided completely and different networks suppress overfitting to different degrees, the trends of the verification and testing errors are slightly inconsistent. Nevertheless, Figure 4 shows that the overall trend holds: the lower the verification error, the lower the testing error. Here we evaluate the accuracy of the networks mainly by the testing error. Table 1 and Figure 4 show that VGG networks perform worst at phase extraction, with a minimum testing error as high as 5.2528. DenseNet, with a minimum testing error of 2.2355, holds a slight advantage over ResNet (2.6260).
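The logic of the three callbacks can be replayed on a validation-loss history with the plain-Python sketch below. It assumes that "iterations" means training epochs and that all three callbacks monitor the same validation loss; the paper does not say which implementation was used. In TensorFlow, the built-in counterparts would be tf.keras.callbacks.ReduceLROnPlateau(factor=0.6, patience=5), tf.keras.callbacks.EarlyStopping(patience=20), and tf.keras.callbacks.ModelCheckpoint(filepath, save_best_only=True); those also have options such as min_delta and cooldown, so their exact trigger points can differ slightly from this sketch.

```python
def replay_callbacks(val_losses, lr=1e-3, lr_factor=0.6, lr_patience=5, stop_patience=20):
    """Replay a validation-loss history through the three callbacks described above.

    Returns (epoch of the saved best model, epoch at which training stops, LR per epoch).
    """
    best_loss, best_epoch = float("inf"), None
    wait_lr = wait_stop = 0
    lr_per_epoch = []
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:              # Best-Model: save whenever the error improves
            best_loss, best_epoch = loss, epoch
            wait_lr = wait_stop = 0
        else:
            wait_lr += 1
            wait_stop += 1
        if wait_lr >= lr_patience:        # no improvement for 5 epochs: LR *= 0.6
            lr *= lr_factor
            wait_lr = 0
        lr_per_epoch.append(lr)
        if wait_stop >= stop_patience:    # no improvement for 20 epochs: Early-Stop
            return best_epoch, epoch, lr_per_epoch
    return best_epoch, len(val_losses) - 1, lr_per_epoch


# Hypothetical history: improves for three epochs, then plateaus.
history = [1.0, 0.8, 0.7] + [0.75] * 30
best_epoch, stop_epoch, lrs = replay_callbacks(history)
print(best_epoch, stop_epoch, lrs[-1])
```

On this plateau, the LR is reduced four times before Early-Stop fires, and the model saved by Best-Model is the one from the last improving epoch.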
Comparing the verification errors of the best model in each DCN (Figure 4d) leads to the same conclusion: VGG converges worst and DenseNet best, because networks with shortcut connections mitigate gradient problems, such as vanishing or exploding gradients, better than plain convolutional networks.
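The effect of shortcuts on gradient flow can be illustrated with a deliberately simplified toy: a deep linear network with small random weights. This is an assumption-laden sketch, not an analysis of the actual DCNs, and it shows only the vanishing-gradient side. The input-output Jacobian of a plain chain is the product of the weight matrices W, whereas with identity shortcuts each factor becomes I + W, which keeps the gradient scale near 1. Depth, width, and weight scale are hypothetical.

```python
import numpy as np


def jacobian_rms(depth, width=16, scale=0.1, residual=False, seed=0):
    """RMS singular value of d(output)/d(input) for a deep linear toy network.

    Plain layer:    x -> W x        (layer Jacobian W)
    Residual layer: x -> x + W x    (layer Jacobian I + W, an identity shortcut)
    W has i.i.d. N(0, scale^2 / width) entries (small initial weights).
    """
    rng = np.random.default_rng(seed)
    jac = np.eye(width)
    for _ in range(depth):
        w = rng.normal(scale=scale / np.sqrt(width), size=(width, width))
        layer_jac = np.eye(width) + w if residual else w
        jac = layer_jac @ jac                     # chain rule: multiply layer Jacobians
    return np.linalg.norm(jac) / np.sqrt(width)   # Frobenius norm / sqrt(width)


plain = jacobian_rms(depth=50)
res = jacobian_rms(depth=50, residual=True)
print(f"plain: {plain:.1e}, residual: {res:.2f}")
```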

Pearson Correlation Analysis of Real and Predicted Values
Next, we compare the real relative PDs (black lines) with the results extracted (red lines) by the optimal model of each DCN. The left column in Figure 5a shows the PDs extracted by VGG-17, with a minimum error of 5.2528. Figure 5b plots the PDs extracted by ResNet-77, with a minimum error of 2.6260, and Figure 5c shows those extracted by DenseNet-161 (k = 48), with a minimum error of 2.2355. The correlation between the real and extracted values is analyzed with the Pearson correlation coefficient (PCC). The mean Pearson correlation coefficients (MPCCs) of the PD groups are 0.9739, 0.9906, and 0.9975 for VGG, ResNet, and DenseNet, respectively. Compared with VGG, ResNets and DenseNets achieve smaller errors with lower complexity and readily gain accuracy from greatly increased depth. Notably, extremely deep networks with shortcut paths are easy to optimize, whereas plain networks that simply stack layers exhibit higher training and testing errors as depth increases [21]. In addition, short paths in the network have a strong regularizing effect that reduces overfitting on smaller training sets [30]. The DenseNets we used also have better parameter efficiency and lower testing error than ResNets; DenseNets have been reported to be easier to train owing to their improved flow of information and gradients throughout the network [30,31]. Overall, DenseNets achieve the smallest testing error with superior parameter efficiency, and they tend to require far fewer parameters than alternative architectures of comparable accuracy. We therefore infer that DCN models have the potential to analyze the dynamics of more complex soliton molecules, with DenseNets performing best.
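The PCC and MPCC used above can be computed as in the sketch below. It assumes that the MPCC is the plain average of the PCCs of the individual PD curves (PD2, PD3, ...), each taken over roundtrips; the sine-shaped PD curves and the noise level are hypothetical test data, not results from this work.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c, b_c = a - a.mean(), b - b.mean()
    return float(np.sum(a_c * b_c) / np.sqrt(np.sum(a_c ** 2) * np.sum(b_c ** 2)))


def mean_pcc(real_pds, predicted_pds):
    """MPCC: average the PCC of each PD curve (rows = PD2, PD3, ...; columns = roundtrips)."""
    return float(np.mean([pearson(r, p) for r, p in zip(real_pds, predicted_pds)]))


# Hypothetical example: four PD curves over 1000 roundtrips, predictions with small errors.
rng = np.random.default_rng(0)
roundtrips = np.arange(1000)
real = np.stack([np.sin(2 * np.pi * roundtrips / period) for period in (150, 220, 300, 410)])
predicted = real + 0.05 * rng.standard_normal(real.shape)
print(f"MPCC = {mean_pcc(real, predicted):.4f}")
```

Because the PCC is insensitive to constant offsets and scale, it complements the absolute-error metric used for training.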

Conclusions
DCN-based methods can handle molecules with more solitons and with equidistant soliton pairs, cases in which the autocorrelation method is limited. Comparing the VGG, ResNet, and DenseNet models, we demonstrate their effectiveness on TS-DFT interference spectra of more complex five-soliton molecules with equally spaced pairs. DenseNets outperform VGG and ResNets in extracting the internal information of complex five-soliton molecules, and ResNets rank second in both parameter efficiency and testing error. Investigating soliton molecules in PMLs would contribute to understanding the complex nonlinear dynamics of pulse propagation in PMLs and benefit potential applications in telecommunications and fiber laser sources. We expect that our method can help optically simulate the dynamic behaviors of complex chemical molecules and other multibody systems using soliton molecules in PMLs, and help explore the potential mechanisms underlying the distribution and evolution of large numbers of solitons in dissipative systems.

Data Availability Statement:
The data that support the plots within this paper and other findings of this study are available from the corresponding author upon reasonable request. The data processing and simulation codes that were used to generate the plots within this paper and other findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
This manuscript has not been published or presented elsewhere in part or in entirety and is not under consideration by another journal. We have read and understood your journal's policies, and we believe that neither the manuscript nor the study violates any of these. There are no conflicts of interest to declare.

Abbreviations
The following abbreviations are used in this manuscript:
TS-DFT: time-stretch dispersive Fourier transformation
DCNs: deep convolution networks
MPCC: mean Pearson correlation coefficient
PDs: relative phase differences