Semi-Supervised Deep Learning Model for Efficient Computation of Optical Properties of Suspended-Core Fibers

Suspended-core fibers (SCFs) are considered among the best candidates for enhancing fiber nonlinearity in mid-infrared applications. Accurate modeling and optimization of their structures is a key part of the SCF design process. Due to the drawbacks of traditional numerical simulation methods, such as low speed and large errors, deep learning-based inverse design of SCFs has become mainstream. However, the advantage of deep learning models over traditional optimization methods relies heavily on large-scale a priori datasets to train the models, a common bottleneck of data-driven methods. This paper presents a comprehensive deep learning model for the efficient inverse design of SCFs. A semi-supervised learning strategy is introduced to alleviate the burden of data acquisition. Taking three key optical properties of the SCF (effective mode area, nonlinear coefficient, and dispersion) as examples, we demonstrate that satisfactory computational results can be obtained from small-scale training data. The proposed scheme can provide a new and effective platform for data-limited physical computing tasks.


Introduction
Suspended-core fiber (SCF) is a microstructured optical fiber (MOF) that is considered an excellent candidate for enhancing the nonlinear properties of optical fibers [1,2]. Compared to other microstructured fibers, SCF is easier to fabricate and can achieve higher numerical apertures (NA). Therefore, it is widely used in various fiber laser applications, such as Raman fiber lasers [3] and Brillouin fiber lasers [4]. In addition, its applications extend to high-capacity networks [5].
Accurate modeling and optimization of SCF structures usually rely on numerical methods such as the finite difference and finite element methods [6], block-iterative frequency-domain methods [7], and plane-wave expansion [8]. However, large-scale iterative analysis is often required to obtain accurate results when modeling and optimizing fiber structures. In addition, the large number of structural design parameters makes each round of iterative analysis time-consuming, and complex structures require multiple simulations to optimize the design. These requirements seriously reduce the efficiency of numerical methods and pose a serious challenge to traditional modeling and optimization approaches.
With the rapid development of artificial intelligence (AI), and especially the gradual maturation of deep learning (DL) techniques, researchers are actively applying DL to challenging tasks in fields including materials science [9], chemistry [10], laser physics [11], particle physics [12], and quantum mechanics [13]. A DL model, as a typical data-driven approach, can learn the complex nonlinear relationships in a given dataset and abstract them into a solution strategy through a computational model consisting of multiple layers of data processing units. This avoids the high cost and inefficiency caused by human intervention in the solution process and by direct interaction with the fundamental physical laws in traditional optimization. As DL techniques spread across fields, research on DL-based optics and optical systems has been explored and refined. On the one hand, a well-trained DL model can be utilized as a fast solver to predict a physical quantity, as in fiber-optic demodulation systems [14,15], optical computational imaging [16], and biomedical engineering [17]. On the other hand, it can serve as an optimization tool requiring no expert empirical intervention in the inverse design of optical materials, such as metamaterials [18-20], integrated photonics [21], and plasmonics [22-24].
DL techniques have now been widely applied to solving optical properties during the inverse design of MOFs. Da Silva Ferreira et al. combined a multilayer perceptron (MLP) and an extreme learning machine-artificial neural network (ELM-ANN) to calculate the dispersion relations of photonic crystals (PCs) [25]. Chugh et al. introduced an ANN model to achieve fast computation of various optical properties of PC fibers (PCFs) [26]. Yuan et al. applied a back-propagation neural network (BPNN) to accelerate the calculation of the optical properties of SCFs [27]. Without exception, however, these works require a large-scale a priori dataset to train the model in order to guarantee superior performance, and collecting and labeling such datasets is laborious and tedious. Therefore, exploring an effective deep learning model that can mine the nonlinear physical relationship between SCF geometry and optical properties from minimal data is crucial for the inverse design of MOFs.
In this paper, we propose an efficient deep learning model for optimizing the optical parameter solving process in the inverse design of SCF structures with limited available a priori data. The proposed model consists of a Generative Adversarial Network (GAN) for data augmentation and a cascaded back-propagation neural network (BPNN) for optical property computation. The model employs a semi-supervised learning strategy that performs online data augmentation during training based on the pre-collected data input to the computational model. The computational model calculates the optical properties of the designed SCF structure, including the effective mode area, nonlinear coefficient, and dispersion. Experimental results show that the proposed model obtains efficient and highly accurate optical property calculations with extremely limited a priori data. In conclusion, the model creates a new framework and platform for the efficient design of optical fiber structures with arbitrary microstructures and provides reliable support for DL-based physical computing problems under minimal datasets.
This work is organized as follows. Section 2 describes the design and theoretical analysis of the SCF model in this work. Section 3 describes the structure and configuration of the proposed semi-supervised deep learning model for computing the optical properties of the SCF. Section 4 compares the predicted and actual values of the optical properties. Section 5 concludes this work.

Structure Design of SCFs
The structures of the SCFs used in this work are reproduced from reference [28]. Figure 1a-c show the cross-sectional schematics of the three-, four-, and six-bridge (cantilever) SCFs, respectively, together with the corresponding fundamental optical mode field distributions. The core diameter, the width of the cantilever, and the number of cantilevers of the SCF structure are denoted by d, W, and n, respectively.

Basic Theoretical Analysis
The real part of the refractive index (RI) of the chalcogenide materials can be calculated from the Sellmeier equation, shown in Equation (1) [29]:

n^2(λ) = 1 + Σ_i [A_i λ^2 / (λ^2 − λ_i^2)]    (1)

where A_i and λ_i are the Sellmeier coefficients of the material.
The effective index of the fundamental propagating mode of the SCF can be obtained by the finite element method (FEM). With the known effective index n_eff, the dispersion of the SCF as a function of wavelength can be expressed by Equation (2):

D(λ) = −(λ/c) · d^2 Re[n_eff(λ)] / dλ^2    (2)

where Re[n_eff(λ)] is the real part of the effective index, and c is the speed of light in free space.
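As an illustration, Equation (2) can be evaluated numerically from sampled n_eff(λ) values with a central second-order finite difference. This is a minimal sketch: the analytic index curve below is a hypothetical stand-in for the FEM output, not actual SCF data.

```python
import math

C = 299792458.0  # speed of light in free space, m/s

def dispersion_ps_nm_km(n_eff, lam, dlam=1e-9):
    """D(lambda) = -(lambda/c) * d^2 Re[n_eff]/d lambda^2, evaluated with a
    central second-order finite difference; returned in ps/(nm*km)."""
    d2n = (n_eff(lam + dlam) - 2.0 * n_eff(lam) + n_eff(lam - dlam)) / dlam ** 2
    return -(lam / C) * d2n * 1e6  # s/m^2 -> ps/(nm*km)

# Hypothetical smooth index curve (assumed, stand-in for FEM-computed n_eff).
n_eff = lambda lam: 2.4 - 0.01 * (lam * 1e6) ** 2  # lam in metres

D = dispersion_ps_nm_km(n_eff, 3e-6)  # about +200 ps/(nm*km) for this curve
```

For this quadratic test curve the second derivative is constant, so the finite difference reproduces the analytic dispersion almost exactly; for FEM-sampled data the step `dlam` should match the wavelength sampling grid.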
The nonlinear coefficient γ of the SCF can be calculated by Equation (3):

γ = n_2 ω / (c A_eff) = 2π n_2 / (λ A_eff)    (3)

where c is the speed of light in free space, ω is the angular frequency, n_2 is the nonlinear refractive index (4.2 × 10^−18 m^2/W for As_2S_3), and A_eff denotes the effective area of the propagation mode in the SCF, defined by Equation (4) [27]:

A_eff = ( ∬ |E(x, y)|^2 dx dy )^2 / ∬_NLR |E(x, y)|^4 dx dy    (4)

where E(x, y) is the distribution of the optical field across the SCF cross-section, and NLR stands for the nonlinear material region.
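Equations (3) and (4) can be checked numerically on a sampled field. The sketch below assumes a Gaussian mode profile as a stand-in for the FEM-computed field and takes the whole computational window as the nonlinear region; mode radius and wavelength are illustrative values, not the paper's data. For an ideal Gaussian, A_eff = π w^2 analytically, which makes the discretization easy to verify.

```python
import math

# Assumed parameters (illustrative only)
w = 2e-6      # Gaussian mode-field radius, m
n2 = 4.2e-18  # nonlinear refractive index of As2S3, m^2/W
lam = 3e-6    # wavelength, m

# Sample |E|^2 on a square grid covering the cross-section.
half, npts = 10e-6, 201
xs = [-half + i * 2 * half / (npts - 1) for i in range(npts)]
dA = (2 * half / (npts - 1)) ** 2

I2 = I4 = 0.0
for x in xs:
    for y in xs:
        E2 = math.exp(-2.0 * (x * x + y * y) / w ** 2)  # |E(x, y)|^2
        I2 += E2 * dA        # numerator integral of Equation (4)
        I4 += E2 * E2 * dA   # denominator integral (NLR = whole window here)

A_eff = I2 ** 2 / I4                          # Equation (4)
gamma = 2.0 * math.pi * n2 / (lam * A_eff)    # Equation (3), 1/(W*m)
```

For these assumed values the grid result reproduces A_eff ≈ π w^2 to well under 1%, and γ comes out near 0.7 W^-1 m^-1.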

Data Pre-Processing
Data pre-processing is necessary to avoid slow convergence before the training dataset is fed into the neural network model. 'Min-Max' normalization was used to linearly map the original data onto the interval [0, 1], which eliminates the adverse effects of singular samples in the feature data and improves the convergence speed and performance of the model. The transformation function is defined as Equation (5):

x* = (x − x_min) / (x_max − x_min)    (5)

where x represents an original feature value of the SCF, x_min and x_max are the minimum and maximum of that feature, and x* represents the normalized result.
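A minimal sketch of Equation (5); the diameter samples below are hypothetical, not taken from the actual dataset.

```python
def min_max_normalize(values):
    """Linearly map raw feature values onto [0, 1] as in Equation (5)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical core-diameter samples in micrometres (illustrative only).
d_um = [1.0, 2.5, 4.0, 6.0]
d_norm = min_max_normalize(d_um)  # -> [0.0, 0.3, 0.6, 1.0]
```

In practice each feature (d, W, n, wavelength) is normalized with its own min and max, and the same constants must be reused when normalizing test samples.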

Model Design for Optical Properties Calculation
The semi-supervised deep learning model used to compute the optical properties of the SCF is shown in Figure 2. The data flow of the model is shown in Figure 2a: after the pre-processing described above, the original dataset is input to the GAN, which generates a series of synthetic data whose distribution is close to that of the original data. The structure of the GAN is shown in Figure 2b. It is an unsupervised deep learning model for data generation, consisting of a separate generative model (G) and discriminative model (D). The objective of G is to generate realistic data that deceives D as far as possible, while the objective of D is to distinguish the data generated by G from the actual data as far as possible; the training process is a dynamic game between the two. The original data, instead of random noise, was used as the input of G to make the generated data more realistic. The original data is then fed to D together with the generated data. G has a typical encoder-decoder structure, while D outputs a truthfulness score that pushes G to generate data as realistic as possible; their structures are shown in the red dashed box in Figure 2b. Finally, the generated data and the original data are fed to the BPNN for the optical property computation, whose network structure is shown in Figure 2c.
GAN and BPNN are entirely independent in the training phase. The training of the GAN is a "Min-Max" dynamic game whose loss function can be established as Equation (6), while the training of the BPNN is an error back-propagation process with MSE as its loss function, defined as Equation (7):

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{Z~p_Z(Z)}[log(1 − D(G(Z)))]    (6)

MSE = (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)^2    (7)

where Z is the input fed into G (random noise in a standard GAN; the original data in this work), n represents the total number of samples, ŷ_i is the predicted value of the neural network model, and y_i is the actual value.
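As a numerical sanity check of Equation (6), the game value V(D, G) can be estimated from batches of discriminator scores. The scores below are assumed illustrative values; at the theoretical optimum of the game, D(x) = 1/2 everywhere and the value equals −2 log 2.

```python
import math

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of V(D, G) in Equation (6): the mean of log D(x)
    over real samples plus the mean of log(1 - D(G(Z))) over generated ones.
    D ascends this value; G descends it."""
    v_real = sum(math.log(p) for p in d_real) / len(d_real)
    v_fake = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return v_real + v_fake

# Assumed scores early in training: real data scored high, generated data low,
# so the discriminator is winning and the value is close to 0.
v_early = gan_value([0.9, 0.8], [0.1, 0.2])

# At the optimum D(x) = 1/2 everywhere, so V = -2 log 2 ~ -1.386.
v_opt = gan_value([0.5, 0.5], [0.5, 0.5])
```

Training drives the value from the discriminator-dominated regime down toward the equilibrium value −2 log 2, at which point D can no longer tell generated samples from real ones.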
To ensure the network can approximate nonlinear functions during error propagation, Sigmoid [30] was used as the activation function between hidden layers, formulated as Equation (8):

f(x) = 1 / (1 + e^−x)    (8)

where x is the input of the hidden node.
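A minimal BPNN with one sigmoid hidden layer, trained by per-sample error back-propagation on the MSE loss of Equation (7), can be sketched in pure Python. The one-input, one-output toy dataset below is an assumed stand-in for the normalized SCF features and target property, not the paper's data, and the layer size is an arbitrary choice.

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # Equation (8)

# Assumed toy dataset: one normalized input -> one normalized target.
data = [(i / 19.0, 0.2 + 0.6 * (i / 19.0) ** 2) for i in range(20)]

H = 4  # hidden units (arbitrary)
w1 = [random.uniform(-1, 1) for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

def forward(x):
    h = [sigmoid(w1[j] * x + b1[j]) for j in range(H)]
    return h, b2 + sum(w2[j] * h[j] for j in range(H))

def mse():
    return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)

lr = 0.3
loss_before = mse()
for _ in range(2000):                   # plain per-sample gradient descent
    for x, y in data:
        h, out = forward(x)
        err = out - y                   # d(loss)/d(out), up to a constant
        for j in range(H):
            grad_h = err * w2[j] * h[j] * (1.0 - h[j])  # through the sigmoid
            w2[j] -= lr * err * h[j]
            b1[j] -= lr * grad_h
            w1[j] -= lr * grad_h * x
        b2 -= lr * err
loss_after = mse()
```

The chain-rule factor `h * (1 - h)` is the derivative of the sigmoid; scaling up to the multi-feature, multi-layer BPNN of Figure 2c changes only the shapes of the weight arrays, not the update rule.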

Model Training Process
In the training process, MSE was used as the loss function. We calculated the MSE of the training and validation processes for different learning rates and optimizers (SGD, Adagrad, Adam), as shown in Table 1. The similar MSEs on the validation dataset indicate that models trained with all three optimizers can achieve satisfactory results. Detailed prediction results and analysis for the effective mode area, nonlinear coefficient, and dispersion of the SCF are presented in the following sections.

Effective Mode Area
The waveguide characteristics of the SCF in mid-infrared applications depend mainly on its effective mode area, as can be seen from Equation (3). The smaller the mode area, the greater the fiber's advantage in nonlinear applications; conversely, a larger effective mode area enhances the effectiveness of the SCF in optical power transmission applications.
This section shows the proposed model's results in predicting the effective mode area (A_eff) of the SCF. The predicted and actual values for each sample in the test dataset are compared, and the performance of the models obtained with different training configurations is analyzed, as shown in Table 2. Among them, the models trained with the SGD and Adam optimizers perform better.
Based on the above conclusions, the models trained by Adam and SGD were selected to compare the error between the actual and predicted values of A_eff, as shown in Figure 3. The closer these points lie to the line y = 0, the smaller the residual between the predicted and actual values, so their proximity to this line is positively correlated with the model's performance. The model trained with Adam at a learning rate of 0.001 works best in the effective mode area prediction, meaning that the best-performing model is obtained with this training configuration (Optimizer: Adam, LR: 0.001). The A_eff values corresponding to the test parameters were never recorded or provided during training. Figures 4-6 show the A_eff predictions of the model when the number of cantilevers is 3, 4, and 6, respectively. For the SCF with 3 cantilevers, the network trained by the Adam optimizer predicts best in the wavelength range of 3 µm to 4 µm. With 4 cantilevers, the A_eff values predicted by the Adam-trained network are closest to the actual values in the range of 1 µm to 2 µm. With 6 cantilevers, the Adam-trained network again gives the best predictions in the range of 2 µm to 3 µm.
Taken together with Figures 4-6, the A_eff values predicted on the test dataset by the Adam-trained model are closest to the actual values, and the quantitative analysis in Table 3 confirms that the proposed model works well in predicting the A_eff of the SCF.

Nonlinear Coefficient
This section uses the well-trained model to predict the nonlinear coefficient (γ), which is directly related to the performance of SCFs in nonlinear applications. Table 4 shows the quantitative analysis of the prediction results for γ; the models trained with the SGD and Adagrad optimizers achieve better results. Figure 7 shows the magnitude of the residuals between the predicted and actual γ values of the test samples for the models trained with different LRs by the SGD and Adagrad optimizers. The γ values predicted by the SGD-trained model are closest to the actual values. Combining Table 4 and Figure 7, the model trained by the SGD optimizer with an LR of 0.001 shows the best evaluation metrics, meaning that this training configuration achieves superior performance in γ prediction. Figures 8-10 show the comparison results when the number of cantilevers is 3, 4, and 6, respectively. The γ values gradually decrease as the wavelength increases, and the γ values predicted by the SGD-trained network are closest to the actual values. In addition, the SGD-trained network shows higher prediction performance than the other networks in the wavelength range of 2 µm to 3 µm.
The proposed model predicts γ well as the wavelength varies. Combining Figures 8-10, the γ values predicted on the test dataset by the SGD-trained model (compared with the other optimizers) are closest to the actual values, and the quantitative analysis in Table 5 confirms that the proposed model predicts the γ of the SCF well.

Dispersion
Dispersion (D) is essential in generating SCF-based supercontinuum spectra. Table 6 shows the quantitative analysis of the models in terms of D prediction, where the models trained by SGD and Adam show good performance. Figure 11 shows the magnitude of the residuals between the predicted and actual D values in the test set for the models trained by the SGD and Adam optimizers at different LRs. The model trained by Adam with an LR of 0.001 achieves the best performance. Figures 12-14 show the comparison results when the number of cantilevers is 3, 4, and 6, respectively. The network trained by the Adam optimizer predicts the D of the SCF better as the wavelength varies, implying that its predictions are closer to the actual values than those of the other two optimizers. However, the predicted D curve is smoother than the actual one over the wavelength range, probably because D varies with severe irregularity and the model is less sensitive to such nonlinear irregularities. Even though the proposed model presents superior prediction results overall, this issue still limits its accuracy; increasing the number of iterations and adjusting the nonlinear activation function are straightforward ways to address it.
Combining Figures 12-14, the D values predicted on the test dataset by the Adam-trained model (compared with the other optimizers) are much closer to the actual values, and the quantitative analysis in Table 7 confirms that the proposed model predicts D effectively.

Discussion
During the experiments, the LRs were set to 0.01, 0.005, and 0.001, and SGD, Adam, and Adagrad were selected as the optimizers to explore the model's performance. Combining the above experimental results, the Adam-based model predicts A_eff and D well, while the SGD-based model is best for γ prediction. The LR also impacted the models' performance when the well-trained models were used to predict the optical properties. Table 8 shows the combined prediction performance of the model for the three optical properties (evaluated in terms of MSE) under different training configurations (optimizer and LR); the overall prediction performance is best when the LR is 0.001. The proposed model can thus predict the important optical properties of the SCF well. Furthermore, it is straightforward to consider increasing the feature dimensionality of the input, increasing the number of iterations, and using larger amounts of augmented data to further improve the model's predictive performance. However, these strategies will also inevitably increase resource utilization and iteration time.
Table 8. Average MSE of models trained with different training configurations (optimizer and LR) when predicting the different optical properties; bold denotes the optimum for each prediction.


Conclusions
In summary, this paper presents a comprehensive semi-supervised deep learning model that can efficiently solve for important optical properties (effective mode area, nonlinear coefficient, and dispersion) in the inverse design of an SCF with an extremely limited a priori dataset. The model consists of a GAN, which efficiently augments the limited a priori dataset, and a cascaded BPNN, which receives both the original and augmented datasets to calculate and optimize the optical properties of the SCF. It is experimentally demonstrated that the model accurately predicts the optical properties. In addition, different training configurations (optimizer and learning rate) were used to train the model during the experiments to demonstrate its reliability. The strategy can provide practical support for the inverse design of microstructured optical fibers made of other materials, as well as a new and reliable platform for machine learning-based physical property calculations with extremely limited a priori datasets.