Data-Enhanced Deep Greedy Optimization Algorithm for the On-Demand Inverse Design of TMDC-Cavity Heterojunctions

A data-enhanced deep greedy optimization (DEDGO) algorithm is proposed to achieve the efficient and on-demand inverse design of multiple transition metal dichalcogenides (TMDC)-photonic cavity-integrated heterojunctions operating in the strong coupling regime. Precisely, five types of photonic cavities with different geometrical parameters are employed to alter the optical properties of monolayer TMDC, aiming at discovering new and intriguing physics associated with the strong coupling effect. Notably, the traditional rigorous coupled wave analysis (RCWA) approach is utilized to generate a relatively small training dataset for the DEDGO algorithm. Importantly, one remarkable feature of DEDGO is the integration the decision theory of reinforcement learning, which remedies the deficiencies of previous research that focused more on modeling over decision making, increasing the success rate of inverse prediction. Specifically, an iterative optimization strategy, namely, deep greedy optimization, is implemented to improve the performance. In addition, a data enhancement method is also employed in DEDGO to address the dependence on a large amount of training data. The accuracy and effectiveness of the DEDGO algorithm are confirmed to be much higher than those of the random forest algorithm and deep neural network, making possible the replacement of the time-consuming conventional scanning optimization method with the DEDGO algorithm. This research thoroughly describes the universality, interpretability, and excellent performance of the DEDGO algorithm in exploring the underlying physics of TMDC-cavity heterojunctions, laying the foundations for the on-demand inverse design of low-dimensional material-based nano-devices.


Introduction
One of the most significant current research topics in low-dimensional materials is transition metal chalcogenides (TMDC) due to their attractive and alluring physical properties [1], strong exciton oscillator strength associated with a large binding energy [2], and evident valley degree of freedom [3]. These provide TMDCs with a great potential for versatile applications in light emitting, light detection, and optical modulation [4][5][6]. Take monolayer (ML) WS 2 for instance, which not only owns a direct bandgap and an exciton binding energy of ∼0.71 eV but also demonstrates large optical absorption, high quantum yields and a strong photoluminescence (PL) effect in the visible regime [7,8].
In practice, it is not feasible to directly use the optical properties of ML WS 2 in photonic devices considering the giant nonradiative decay rates. Fortunately, the photonic cavity is viewed as one of the promising platforms that can easily tune and engineer local optical density states of TMDCs after their integration by means of the coupling effect between the cavity mode and TMDC excitons [9,10], which greatly enhances the luminescence and and RF belongs to machine learning. In terms of operating mode, RF and DNN belong to direct prediction methods, while the DEDGO algorithm is an iterative optimization method. The iterative optimization method mainly relies on the "reward function network" (i.e., "Model" in the right box). It optimizes the prediction of geometric parameters according to the Reward Vectors (i.e., the output of the model). If the spectral error is less than the threshold, the optimization is completed. Otherwise, the new spectra will compose new State Vectors and continue the iteration process. This strategy realizes the inverse design through automatic and intelligent iterations, improving the performance and robustness of the model. The effect of this strategy cannot be achieved through iterations using the direct prediction methods, since their predictions remain unchanged when the target spectrum is constant. The design inspiration of the "reward function network" mainly comes from the "reward function" in RL. Retaining some RL factors that are conducive to solving the optimization problem in a local and greedy way is a wise choice, considering the difficulties in training an RL model with high suitability. Here, "greedy" means that the algorithm only makes the most beneficial choice for the current rather than the overall situation. The other pivot is DATA ENHANCEMENT (DE), targeting enriching the dataset using "pseudo data" to train large networks with small amounts of simulated data, decreasing the time consumption in spectroscopic simulations, and improving the prediction performance of the algorithm. The "pseudo data" are generated statically, which guarantees the denoising and filtering for it before utilization, reducing the systematical errors introduced by data enhancement. Especially, the "pseudo data" should be distinguished from the "original data" generated by the traditional rigorous coupled wave analysis (RCWA) method.
while DNN only belongs to DL, and RF belongs to machine learning. In term mode, RF and DNN belong to direct prediction methods, while the DEDG an iterative optimization method. The iterative optimization method main "reward function network" (i.e., "Model" in the right box). It optimizes th geometric parameters according to the Reward Vectors (i.e., the output o the spectral error is less than the threshold, the optimization is completed. new spectra will compose new State Vectors and continue the iteration pro egy realizes the inverse design through automatic and intelligent iteratio the performance and robustness of the model. The effect of this strat achieved through iterations using the direct prediction methods, since th remain unchanged when the target spectrum is constant. The design ins "reward function network" mainly comes from the "reward function" in some RL factors that are conducive to solving the optimization problem greedy way is a wise choice, considering the difficulties in training an RL m suitability. Here, "greedy" means that the algorithm only makes the most b for the current rather than the overall situation. The other pivot is DAT MENT (DE), targeting enriching the dataset using "pseudo data" to train with small amounts of simulated data, decreasing the time consumption in simulations, and improving the prediction performance of the algorithm data" are generated statically, which guarantees the denoising and filteri utilization, reducing the systematical errors introduced by data enhancem the "pseudo data" should be distinguished from the "original data" gener ditional rigorous coupled wave analysis (RCWA) method. Concretely, The DEDGO algorithm is applied to handle the inverse d of TMDC-cavity heterojunctions with a three-dimensional parameter spa incident angle θ, the structural variable g, and whether the photonic cavi grated with the monolayer WS2. By utilizing the rigorous coupled wave an Concretely, The DEDGO algorithm is applied to handle the inverse design problem of TMDC-cavity heterojunctions with a three-dimensional parameter space, namely, the incident angle θ, the structural variable g, and whether the photonic cavity is to be integrated with the monolayer WS 2 . By utilizing the rigorous coupled wave analysis (RCWA) approach and the DEDGO algorithm, the reflection spectra of multi-shaped heterojunctions with geometrical parameters are studied. The process of the algorithm can be separated into two phases, namely, the training that focuses on data enhancement and network training and the testing that utilizes the reward function model to optimize the solution greedily step by step. Importantly, comprehensive results confirm that the DEDGO algorithm is a powerful and promising tool that can not only achieve the efficient inverse design for multiple heterojunctions and the exploration of the underlying physics regarding strong coupling effects in a precise (>85% effectiveness) and rapid manner (several seconds for 89 testing samples) but also has obvious advantages over traditional scanning optimization methods and many other machine learning methods in terms of universality, accuracy, and time consumption.

Heterojunctions with Different Structures
The main aim of this work is to achieve the on-demand design of cavity-TMDCbased heterojunctions utilizing relatively few reflection spectra generated by traditional RCWA simulations. More specifically, the schematic illustration of the monolayer (ML) WS 2 -photonic cavity-integrated heterojunction is presented in Figure 2. One can easily discern the light-matter interaction between the heterojunction and an incident light with an incident angle of θ. Herein, five different categories of cavity-TMDC-based heterojunctions are studied in this work, whose geometrical structures are shown in Figure 2a-e, respectively: (1) Figure 2a represents a Fabry-Perot (FP) microcavity containing two silver mirrors and an LiF integrated with ML WS 2 that is inserted in the middle of the LiF layer, whose hybrid structure is referred to as S1. (2) The WS 2 -based heterojunction denoted as S2 is illustrated in Figure 2b, whose cavity is similar to the S1 cavity, with the structural difference being the distributed Bragg reflector (DBR) consisting of four layers of SiO 2 and TiO 2 .
(3) Figure 2c shows a S3 heterojunction possessing ML WS 2 that is sandwiched between a one-dimensional (1D) periodically PMMA photonic crystal (PhC) and a passivated silver substrate. (4) The S4 heterojunction comprising ML WS 2 and a Tamm-plasmon cavity is displayed in Figure 2d, in which the top mirror is a 40 nm silver film and the bottom mirror is a 10-layer DBR (SiO 2 and TiO 2 pairs). (5) Figure 2e shows a S5 heterojunction whose structure is akin to S3 but with waveguide type changes from strip to rib in 1D PhC. It is significant to emphasize that altering the geometrical parameters of photonic cavities allows for the control and engineering of cavity photonic modes, which then affect the coupling effect between the cavity and ML WS 2 . Thus, we also change the parameter g of microcavities in S1-S5 heterojunctions, namely, the half-thickness of the LiF film in S1 (g1: 80-92 nm), the half-thickness of the SiO 2 film in S2 (g2: 74-86 nm), the width of the PMMA strip waveguide in S3 (g3: 160-400 nm), the thickness of the TiO 2 layer in S4 (g4: 59-71 nm), and the width of the PMMA rib waveguide in S5 (g5: 40-160 nm). For convenience, the thickness of the ML WS 2 film used in these structures is fixed to 1 nm, whilst in cases of pure photonic cavities, the thickness of ML WS 2 is automatically set to 0 nm in numerical simulations. In addition to the structural variable g, the incident angle is also an important factor for the inverse design, which varies between −40 degrees and 40 degrees. As a matter of fact, the incident angle θ is measured in degrees, and the structural variable g is measured in nanometers, accompanied by the different variation ranges of g1-g5, making it impossible to directly apply these datasets in the DEDGO training process. Therefore, a normalization method is highly desired to make the above parameters be on the same scale. Thus, the parameter normalization method can be represented as [45]: where x represents the actual value, m in x stands for the minimum value, and  is the distance of discrete attributes. Notably, the dataset contains only reflectance spectra with non-negative incident angles are included in the dataset due to the symmetry feature, with the incident angle resolution being 1°. Here, the structural variable g is regulated to be positive integers varying between 1 and 13 after normalization, whereas the thickness of ML WS2 can only be selected between 0 and 1 for pure photonic cavity and TMDC-cavity heterojunctions, respectively. The RCWA method is a fundamental finite element analysis method to characterize the interaction between electromagnetic waves and nanostructures, which is specifically tailored for multilayer structures. Here, the RCWA approach is employed to calculate the reflection response of S1-S5 heterojunctions and their corresponding pure cavities, generating 298 reflection spectra serving as the training dataset for the DEDGO algorithm, with a wavelength range of 500-900 nm and an incident angle change from −40~40 degrees. As a matter of fact, the incident angle θ is measured in degrees, and the structural variable g is measured in nanometers, accompanied by the different variation ranges of g1-g5, making it impossible to directly apply these datasets in the DEDGO training process. Therefore, a normalization method is highly desired to make the above parameters be on the same scale. Thus, the parameter normalization method can be represented as [45]: where x represents the actual value, x min stands for the minimum value, and δ is the distance of discrete attributes. Notably, the dataset contains only reflectance spectra with non-negative incident angles are included in the dataset due to the symmetry feature, with the incident angle resolution being 1 • . Here, the structural variable g is regulated to be positive integers varying between 1 and 13 after normalization, whereas the thickness of ML WS 2 can only be selected between 0 and 1 for pure photonic cavity and TMDC-cavity heterojunctions, respectively. The RCWA method is a fundamental finite element analysis method to characterize the interaction between electromagnetic waves and nanostructures, which is specifically tailored for multilayer structures. Here, the RCWA approach is employed to calculate the reflection response of S1-S5 heterojunctions and their corresponding pure cavities, generating 298 reflection spectra serving as the training dataset for the DEDGO algorithm, with a wavelength range of 500-900 nm and an incident angle change from −40~40 degrees. Notably, the whole simulation procedure of RCWA covers the structure model design, the Nanomaterials 2022, 12, 2976 6 of 17 optical properties implementation for each material, the boundary condition determination, the convergence test, and the scanning simulation. The original datasets are divided into three parts, which are used in the control and experimental groups in different proportions: (1) 57%, 13%, and 30% for the training, validation, and testing in control groups (without data enhancement); (2) 70%, 0%, and 30% for the training, validation, and testing in experimental groups (with data enhancement). The validation datasets of experimental groups are composed of "pseudo data" calculated via data enhancement.

Data-Enhanced Deep Greedy Optimization Algorithm
The working flow of the DEDGO algorithm is shown in Figure 3, which contains two main modules. The first module is the forward prediction networks, trained for data enhancement. The second module is the reward function networks, trained for deep greedy optimization, which acquire angle and structural variable parameters step by step and distinguish previous deep learning methods that attempt to obtain the structural parameter value directly. The essential innovation of the DEDGO algorithm lies in the data source and in the usage of reward function networks. In particular, the data enhancement method can rapidly expand datasets and easily meet the data requirements of deep learning methods at a low cost, improving the fitting level of the model to complex nonlinear relationships between parameters and reflection spectra. On the other hand, the deep greedy optimization can realize higher accurate predictions of parameters in an acceptable speed using a simple network compared to other machine learning approaches. This is mainly due to its "iteration" and "reset" mechanism, which can reduce the model training difficulty by avoiding predicting parameters directly.
The exhaustive process of training is described as follows: first, a neural network for forward predictions with seven hidden layers is trained with the original training spectra. Notably, this work uses two slightly different network structures for the S1-S5 heterojunctions to predict their reflection spectra precisely. The network takes the angle and geometrical parameters as its input and the spectra indicating the wavelength from 500 nm to 900 nm as its output. Furthermore, data enhancement based on this model has shown a high reliability. Since the absolute values of reflection spectra usually vary between 0 and 1, it is necessary to convert the spectra into a space with greater absolute values in order to make the network extraction much easier [34]: where y represents the original spectra and y denotes the values after calculation. The loss function of the forward prediction network is described by the mean absolute error (MAE) [37]: where n is the batch size, y' pred represents the prediction of the spectra, and y' true is the corresponding true reflection spectra. The MAE of the results is in the 10 −2 order of magnitude, indicating the credibility of the predicted spectra. In this instance, approximately 750 reflection spectra are generated within 0.06 s to intensify the dataset as "pseudo" spectra. The angle θ and structural variable g combinations from the "pseudo" spectra are different from the spectra created via RCWA simulations. The "pseudo" spectra are added to the training and validation datasets in a 1:1 proportion. Note that "pseudo" spectra are not used to test the model to avoid possible spectral errors. Additionally, the lookup table is established to speed up the subsequent step, producing training data for reward function networks. 7 of 17 Nanomaterials 2022, 12, x FOR PEER REVIEW 7 of 17 Figure 3. Schematics of the DEDGO algorithm for the efficient inverse design of multiple heterojunctions. The algorithm can be divided into two main stages, namely, training (left) and testing (right). The first stage is applying the forward prediction networks to intensify the dataset (A,B) and training the reward function networks for the next part (C,D). The second part is utilizing the trained reward function networks to predict structural parameters for heterojunctions in a gradual optimization way (I,II).
The exhaustive process of training is described as follows: first, a neural network for forward predictions with seven hidden layers is trained with the original training spectra. Notably, this work uses two slightly different network structures for the S1-S5 heterojunctions to predict their reflection spectra precisely. The network takes the angle and geometrical parameters as its input and the spectra indicating the wavelength from 500 nm to 900 nm as its output. Furthermore, data enhancement based on this model has shown a high reliability. Since the absolute values of reflection spectra usually vary between 0 and 1, it is necessary to convert the spectra into a space with greater absolute values in order to make the network extraction much easier [34]: where y represents the original spectra and yʹ denotes the values after calculation. The loss function of the forward prediction network is described by the mean absolute error (MAE) [37]: Figure 3. Schematics of the DEDGO algorithm for the efficient inverse design of multiple heterojunctions. The algorithm can be divided into two main stages, namely, training (left) and testing (right). The first stage is applying the forward prediction networks to intensify the dataset (A,B) and training the reward function networks for the next part (C,D). The second part is utilizing the trained reward function networks to predict structural parameters for heterojunctions in a gradual optimization way (I,II).
To validate the effectiveness of the forward prediction network in the DEDGO model, we illustrate the comparison results of the reflection spectra calculated by the RCWA method and the forward prediction network for the S1, S3, S4, and S5 heterojunctions and their pure cavity counterparts in Figure 4. Note that the reflection spectra of S1 and S2 are highly similar, so we only display the reflectance result of S1 in this figure. Here, we randomly select the incident angle of reflection spectra to be 20 • and fixed geometrical parameters for four categories of samples. Specifically, the values of structural variable g are given as follows: S1: LiF, 85 nm; S3: PMMA, 22 nm; S4: TiO 2 , 65 nm; S5: PMMA, 100 nm. One distinctive finding from Figure 4 is that the DEDGO-predicted results are in extremely good agreement with the RCWA-simulated spectra, indicating the dependability and high accuracy of the DEDGO network using "pseudo data". Furthermore, it can be easily extracted from Figure 4 that, after the integration of ML WS 2 with the photonic cavity, extra reflectance resonances occur in the spectra, in contrast to the original one main resonant peak, indicating the physical phenomena of Rabi splitting and the strong coupling effect.
PMMA, 100 nm. One distinctive finding from Figure 4 is that the DEDGO-predicted results are in extremely good agreement with the RCWA-simulated spectra, indicating the dependability and high accuracy of the DEDGO network using "pseudo data". Furthermore, it can be easily extracted from Figure 4 that, after the integration of ML WS2 with the photonic cavity, extra reflectance resonances occur in the spectra, in contrast to the original one main resonant peak, indicating the physical phenomena of Rabi splitting and the strong coupling effect. There are distinct differences between the forward prediction network and the reward function network, which are mainly reflected in the network structure, input, and There are distinct differences between the forward prediction network and the reward function network, which are mainly reflected in the network structure, input, and output. The reward function network is a 5-hidden-layer model, whose input and output are an 804-number and a 99-number vector, respectively. The input vector consists of three parts, namely, the current angle θ and structural variable g, the current spectrum, and the target spectrum. The current angle θ and structural variable g are inseparable from the current spectrum. These two elements, together with the target spectrum, make up the input vector called "state", which contains two aspects of information. Firstly, it embodies the implicit relationship between the incident angle, structural variable, and reflection spectrum. Secondly, it provides an opportunity for the model to extract the diversity between the two spectra, making greedy optimization possible. Here, "optimization" means reducing the discrepancies between the current spectrum and the target spectrum to obtain more precise used parameters by changing the incident angle and structural variable, whose adjustment process is defined as "action". Moreover, "greedy" indicates that the model selects actions that better reduce the spectral differences, characterized by the usage of mean absolute percentage error (MAPE) [45]: Here, "MAPE" is calculated from the target spectrum and the spectrum corresponding to the "current parameters" in the "state vector". Similarly, it is necessary to quantify the actions in a proper way. Thus, the "reward" for each action can be described as follows: where MAPE ori and MAPE new denote the errors before and after executing the action, respectively. The error between the target spectrum and the spectrum corresponding to new predicted parameters is reduced when "∆MAPE" is greater than 0. Obviously, the better the action that can narrow the spectral gap, the higher the reward that can be obtained. According to the "greedy" principle, the algorithm will select the "action" with the largest reward to execute. On this condition, the reward of the selected action must be greater than or equal to 0. Notably, the reward equals 0 when the action is "changing the parameters by 0 units". Therefore, each iteration gains some improvements and optimization. It is almost impossible to optimize the MAPE to zero due to the potential small errors introduced in each step. In the circumstances, the spectral error is small enough when the MAPE is less than 0.25 for functional consideration, and the current incident angle and structural variable can be regarded as proper target parameters. The algorithm sets up 99 actions to achieve this goal as quickly as possible and to minimize the training difficulty of the model. The adjustment range is −5~5 for angle θ, which is −4~4 for structural variable g. The output vector contains the rewards for these 99 actions, during which the reward function networks need to learn the relationship between actions and rewards in different current spectra and target spectra. The theme of the testing stage is to employ the trained model to calculate the corresponding parameters of the spectra. The current angle and structural variable are set to 0 and 1 to ensure the validity and reliability, respectively. This preset corresponds to "Initialize" in Figure 3II. It is indispensable to compare the initial spectrum with the spectrum for the test and calculate their MAPE. If the MAPE is less than the threshold, the sample can be exempted from optimization; thus, 0 and 1 are taken as the true values of the target parameters. Otherwise, the vector is input into the trained reward function network, and the action with the maximum reward is chosen to optimize the vector. After that, the corresponding spectral response can be compared with the target value to calculate the MAPE. In particular, the action is recorded to verify whether the action itself changes the parameters by zero or whether the optimization is in a loop. Specifically, a "loop" means that the total changes of the two parameters by several actions are both zero. The occurrence of these two cases implies that the process is stuck in a local optimum. If the MAPE is less than the current threshold, the inverse design is still successful. Otherwise, it is requisite to reset the input vector to guarantee that the optimization can continue normally. If the algorithm does not get stuck in the local extrema and the MAPE is less than the threshold, the current parameters can be seen as target parameters. In another case with a lager MAPE, the algorithm checks the number of steps that have been run. If it exceeds the limit, the algorithm declares a failure; otherwise, the optimization process continues. The limit is set to 30 in this research considering the balance of time and effect.
The mentioned neural network in the DEDGO algorithm is coded using an opensource artificial intelligent framework, namely, TensorFlow2, and run on an RTX 3090 graphic card and an i9-10900X CPU. The Adam optimization algorithm is chosen as the optimizer to train the networks.

Inverse Design of the Heterojunctions
In this section, the DEDGO algorithm is utilized to achieve the efficient and ondemand inverse design of TMDC-cavity-based heterojunctions, intending to signify the model's superiority. For comparison, RF and DNN algorithms are also applied in the testing stage, indicating the effectiveness of the DEDGO algorithm in inverse design. More specifically, RF uses the original spectra as its input, since processing the spectra according to Equation (3) has little or even negative effects on this method. DNN employs the same hidden layer structure as the reward function network for a better comparison. In particular, the above two methods attempt to directly predict the parameters of the 401-bit testing spectra, which can, however, be regarded as an acceptable prediction in the case where the MAPE of the testing spectrum and the spectrum corresponding to predicted parameters is less than the threshold. The testing dataset for each heterojunction is 89 randomly selected samples from the original spectra composed of 298 items. The number of samples with different WS 2 thicknesses varies slightly, because sampling is conducted with a total number criterion. The typical results of inverse design with the DEDGO algorithm are shown in Figure 5. To demonstrate the reasonableness of the threshold selection and the effectiveness of the DEDGO algorithm, the spectral comparison of true and predicted values for the S1-S5 heterojunctions (1 nm) and their pure cavity counterparts (0 nm) is shown as Figure 5a-e, revealing that the inverse design results can satisfy the optical functional demands when the error is less than the threshold. The reliability of this conclusion is guaranteed by the randomly selected 10 sets of geometrical parameters. Notably, we present the original reflectance spectra calculated via RCWA and the spectra of inverse-designed low-dimensional structures for a vivid comparison. One significant finding from the left panels of Figure 5 is that the relevant inverse design results are consistent with the input data, indicating the strong and powerful inverse design ability of DEDGO. Furthermore, the quantified design results are shown in Figure 5f-j, which can be divided into four groups according to the thickness of WS 2 and whether data enhancement is used.
Two indicators, namely, the success rate and computing time, are chosen to describe the performances of the three algorithms. Here, "success rate" refers to the proportion of predictions that hit the true value accurately or whose spectral errors are less than the threshold, revealing the effectiveness of the algorithm. "Time" records the computational time needed to complete the inverse design process on the testing dataset. It can be easily found from Figure 5f-j that the DEDGO algorithm owns a much larger success rate than RF and DNN for all S1-S5 heterojunctions and the corresponding pure cavities. The relative difference is at least 15% to 30%, especially for cases without data enhancement. Furthermore, an at least 85% success rate can be reached by the DEDGO algorithm for all types of samples with data enhancement, revealing its powerful capability. Concretely, the success rate for S1, S2, and S5 is almost 100%, while that for S3 and S4 is slightly lower (85~95%). Note that the S3 group obtains the most performance improvement, while the S5 group gets the least, probably due to the difficulty of feature extraction. As the difficulty increases, the advantages of the DEDGO algorithm outperform the other two algorithms. Nevertheless, this performance improvement comes, in part, at the cost of a larger time consumption due to the two characteristics, namely, "iteration" and "reset", which introduce extra operations compared with the other two algorithms. The inverse design consumed times of the RF and DNN algorithms are both less than 0.5 s, while it changes from 1.21 to 5.52 s for the DEDGO algorithm. Even though the DEDGO algorithm is a little inferior in this respect, it is still far faster than traditional iterative methods, which pre-configure a parameter combination, calculate the corresponding spectrum, and constantly adjust the parameters until the spectral error is less than the threshold, usually taking tens of minutes or even hours to design a sample [48][49][50]. Therefore, the DEDGO algorithm is confirmed to be an excellent method for the inverse design of TMDC-cavity heterojunctions. Further insight regarding the average success rate and time consumption of the DEDGO, RF, and DNN algorithms is explicitly presented in Table 1. Two indicators, namely, the success rate and computing time, are chosen to the performances of the three algorithms. Here, "success rate" refers to the prop predictions that hit the true value accurately or whose spectral errors are less threshold, revealing the effectiveness of the algorithm. "Time" records the comp time needed to complete the inverse design process on the testing dataset. It can found from Figure 5f-j that the DEDGO algorithm owns a much larger success RF and DNN for all S1-S5 heterojunctions and the corresponding pure cavities.  It is also worthwhile to investigate the role of data enhancement in these machine learning algorithms. Firstly, it is apparent that the data enhancement groups almost always improve the success rate for all of the S1-S5 heterojunctions and their cavity counterparts. This is more pronounced for RF and DNN, whose success rates can even be improved by more than 50%, since their success rate is usually low without data enhancement. In terms of time consumption, data enhancement mainly works on the DEDGO algorithm and has little influence on the other two algorithms, which is probably determined by the running mechanism. As aforementioned, RF and DNN try to predict the parameters directly, which means that more spectral data can only optimize the internal weights and biases of the network rather than the running process of the algorithm, making it difficult to reduce the testing time. However, since the DEDGO algorithm is an iterative method, with the assistance of data enhancement, it can better learn the relationship between parameters, spectra, and rewards to choose better actions, reducing time by decreasing the running steps and restarts. The loss and running performance metrics revealing the effect of data enhancement on the DEDGO algorithm are shown in Table 2. The train loss and validation loss are significantly reduced in the data enhancement groups, suggesting that the network can more accurately capture features and learn abstract relationships. In addition, the effect of fewer running steps and restarts is more pronounced in groups with a lower success rate, such as S3 and S4, which also have the highest drop in testing time. In contrast, this effect is not apparent for groups with a higher success rate due to the randomness of the neural network training process. For instance, the randomness causes more samples with more running steps or restarts for S1 with no WS2, resulting in larger time consumption, although the average steps and restarts are decreased. To conclude, data enhancement makes a remarkable contribution to the DEDGO algorithm, which is an indispensable part, as well as deep greedy optimization.
Besides its high success rate and low time consumption, the universality of the DEDGO algorithm deserves to be mentioned. In the research, five types of heterojunctions and the corresponding photonic cavities are studied, showing the potential of the algorithm to design more different structures. It has a remarkable superiority over most deep learning methods used before.

DEDGO Assisted the In-Depth Study of Multiple Cavity-TMDC Heterojunctions
By means of the DEDGO algorithm, we can go further to investigate the nonintuitive and complex relationships between the structural parameters of TMDC-cavity-based heterojunctions and their optical properties. As aforementioned, the geometry, material component, and relevant geometrical parameters are important and intriguing parameters for photonic cavities that dramatically change the light-matter interactions of TMDCcavity-based heterojunctions. Therefore, to obtain further insight into the coupling effect in the WS 2 -cavity hybrid device, we employ the DEDGO network to characterize and measure the angle-resolved white-light reflectance spectra of S1-S4 heterojunctions; the results are shown in Figure 6. Considering the geometric similarity between the S3 and S5 heterojunctions, the reflectance spectra of S5 are not included in Figure 6. For comparison, we have also included the wavelength-angle dispersions of pure photonic cavities in the left panels of Figure 6 for the cases of TE polarization. It is worthwhile to mention that all the angle-resolved reflectance calculated in this work is performed with an angular resolution of 1 • and at room temperature.
The most significant and eye-catching finding from Figure 6 is that the optical-guided modes of photonic cavities strongly overlap with the A exciton of ML WS 2 , illustrated as the horizontal line lying around 0.625 µm in Figure 6b,d,f,h, for all S1-S4 hybrid devices. If one looks deeper into the above results, it is not difficult to observe the clear and apparent anticrossing behaviors associated with progressive dispersions of optical-guided modes through the A excitonic transition, revealing that the hybrid WS 2 -cavity device operates in the strong coupling regime. In other words, the dispersion of guided modes in S1-S4 heterojunctions splits into two parts, which are denoted as the lower and upper polariton bands, indicating the formation of half-light-half-matter quasi-particles. Evidently, the optical guide modes of photonic cavities in S1, S2, and S4 are quite similar since they share the F-P cavity structure, which is distinct from the 1D PhC cavity. However, due to the different parameter selections for the photonic cavity, namely, g1 = 85 nm for the S1 cavity, g2 = 80 nm for the S2 cavity, g3 = 220 nm for the S3 cavity, and g4 = 65 nm for the S4 cavity, the corresponding anticrossing points locate at different incident light angles. Another significant conclusion suggested by Figure 6 is that, after the integration of the ML WS 2 film in the photonic cavities, their optical mode dispersion curves exhibit the red-shift phenomena. Take S3 in Figure 6e,f, for instance-the original mode crossing points locate the 1D PhC cavity around 0.6 µm, which shift to around 0.61 µm after placing WS 2 underneath 1D PMMA PhC. One reasonable explanation for this effect is the change in the refractive index of the WS 2 -PhC heterojunction compared to pure PhC, where the dielectric environment dominates the optical mode dispersion of the photonic cavity. The most significant and eye-catching finding from Figure 6 is that the optical-guided modes of photonic cavities strongly overlap with the A exciton of ML WS2, illustrated as the horizontal line lying around 0.625 μm in Figure 6b,d,f,h, for all S1-S4 hybrid devices. If one looks deeper into the above results, it is not difficult to observe the clear and apparent anticrossing behaviors associated with progressive dispersions of optical-guided modes through the A excitonic transition, revealing that the hybrid WS2-cavity device operates in the strong coupling regime. In other words, the dispersion of guided modes in S1-S4 heterojunctions splits into two parts, which are denoted as the lower and upper polariton bands, indicating the formation of half-light-half-matter quasi-particles. Evidently, the optical guide modes of photonic cavities in S1, S2, and S4 are quite similar since they share the F-P cavity structure, which is distinct from the 1D PhC cavity. However, due to the different parameter selections for the photonic cavity, namely, g1 = 85 nm for the S1 cavity, g2 = 80 nm for the S2 cavity, g3 = 220 nm for the S3 cavity, and g4 = 65 nm for the S4 cavity, the corresponding anticrossing points locate at different incident light angles. Another significant conclusion suggested by Figure 6 is that, after the integration of the ML WS2 film in the photonic cavities, their optical mode dispersion curves exhibit the red-shift phenomena. Take S3 in Figure 6e,f, for instance-the original mode crossing

Conclusions
In summary, a DEDGO algorithm based on deep greedy optimization and data enhancement has been established. This algorithm can fulfil the inverse design of multiple TMDC-cavity heterojunctions in a rapid and precise way. With the aid of the RCWA method, a relatively smaller training dataset is generated for the training process of DEDGO, followed by a supplementary dataset created by the integral forward network. Next, the enhanced dataset is employed to train the reward network, which played a major role in achieving the deep greedy optimization strategy in the testing stage. The experiments show that the DEDGO algorithm achieves a higher accuracy than the RF and DNN methods at the cost of a little time. Furthermore, a detailed analysis of the coupling effect between the TMDC excitons and micro-cavity photons in TMDC-based heterojunctions comprising different cavity categories, material components, and geometrical parameters, as well as the on-demand inverse designs, was conducted using the DEDGO algorithm, in which the strong coupling effects and exciton-polaritons were thoroughly revealed. This work shows the outstanding universality and accuracy of the DEDGO algorithm, indicating its excellent potential in the inverse design of multi-shaped heterojunctions and the fabrication of photonic crystals.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.