4.1. Comparative Model Evaluation
To validate the effectiveness of the proposed MDIAN model for the simultaneous retrieval of NO and NO2 concentrations in gas mixtures, a standard DOAS least-squares method (LSM), support vector regression (SVR), convolutional neural networks (CNN), long short-term memory networks (LSTM), and a combined CNN + LSTM model were selected for comparison. LSM was introduced as the conventional spectroscopy baseline, while SVR was retained as a representative machine-learning baseline. All models were evaluated under the same dataset partitioning scheme and evaluation metric framework to ensure a fair comparison.
As shown in Table 4, the retrieval accuracy of NO and NO2 exhibits a consistent improvement trend as the methods evolve from the conventional DOAS least-squares method to machine-learning and deep-learning approaches. As the standard spectroscopy baseline, LSM provides a physically interpretable concentration retrieval method based on least-squares fitting. However, it produces the largest prediction errors and the lowest R² values among all compared methods, indicating that the conventional linear fitting framework has limited capability in handling severe spectral overlap and cross-interference between NO and NO2. By introducing kernel mapping, SVR enhances nonlinear modeling capability and thus outperforms LSM overall. CNN is effective in extracting local spectral features, while LSTM captures contextual dependencies along the wavelength dimension. The combined CNN + LSTM model leverages the strengths of both, leading to further improvements in prediction accuracy. In contrast, the proposed MDIAN model integrates multi-scale dual-branch convolution, cross-attention fusion, and Bi-LSTM-based wavelength sequence modeling in a unified framework. This enables simultaneous extraction of narrow-band absorption details, broad spectral profile information, and inter-species coupling features. Compared with the best-performing deep learning comparison model (CNN + LSTM), MDIAN achieves additional reductions in MAE of 42.9% for NO and 38.6% for NO2, demonstrating superior discriminative capability and higher regression accuracy under complex overlapping spectral conditions.
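The evaluation metrics referred to above (MAE, R², and the relative MAE reduction between two models) can be sketched as follows. This is a minimal illustration using standard textbook definitions; the function names and the sample values are hypothetical and are not taken from Table 4.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the inputs (ppm here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# Hypothetical values for illustration only (not from Table 4).
true_ppm = [1.0, 5.0, 10.0, 15.0, 20.0]
model_a = [1.2, 4.7, 10.4, 14.8, 20.3]   # stand-in for a comparison model
model_b = [1.1, 4.9, 10.1, 14.9, 20.1]   # stand-in for a more accurate model

mae_a = mae(true_ppm, model_a)
mae_b = mae(true_ppm, model_b)
print(f"MAE A = {mae_a:.3f} ppm, MAE B = {mae_b:.3f} ppm")
print(f"R2 B = {r2_score(true_ppm, model_b):.4f}")
print(f"MAE reduction = {relative_reduction(mae_a, mae_b):.1f}%")
```

The percentages quoted in the text, such as the MAE reduction of MDIAN relative to CNN + LSTM, correspond to `relative_reduction` applied to the two models' MAE values.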
A more detailed view is provided by the full-scale error distributions in Figure 8. Although the errors of all comparative models are already maintained at relatively low levels for most samples, a small number of outliers with comparatively large errors can still be observed. By contrast, the error distribution of MDIAN is more concentrated and exhibits smaller overall fluctuations, indicating that it not only achieves higher average prediction accuracy but also provides better stability and robustness.
To further address the possible influence of repeated spectra under the same concentration condition, a concentration-condition-level grouped five-fold cross-validation was conducted. The results are shown in Table 5. In this evaluation, all 500 repeated spectra from the same concentration condition were kept in the same fold. Therefore, the spectra in each test fold came from concentration conditions that were not included in the corresponding training folds. The proposed MDIAN achieved stable performance across the five grouped folds. For NO retrieval, the MAE values ranged from 0.072 to 0.079 ppm, and the R² values ranged from 0.9997 to 0.9998. For NO2 retrieval, the MAE values ranged from 0.061 to 0.067 ppm, and the R² values ranged from 0.9997 to 0.9998.
These results are highly consistent with those obtained using the original dataset partitioning strategy. More importantly, the grouped evaluation avoided the situation in which repeated spectra from the same concentration condition appeared in both the training and test sets. Therefore, the high prediction accuracy of MDIAN was not mainly caused by near-duplicate spectra shared between the training and test sets. Instead, the results indicate that the proposed model learned concentration-related differential absorption features and showed good generalization capability to unseen concentration conditions within the investigated concentration range.
It should be noted that this grouped cross-validation also evaluated the model on concentration combinations that were not included in the corresponding training folds. In particular, each test fold contained unseen equal-concentration and inverse-gradient mixed-gas combinations. Therefore, the additional grouped evaluation further examined the ability of MDIAN to generalize to mixed-gas concentration pairs outside the training subset.
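The grouping rule described above, in which all repeated spectra from one concentration condition stay in the same fold, can be sketched as follows. This is a simplified illustration and not the authors' code: the round-robin fold assignment, the function name `grouped_kfold`, and the small concentration grid are assumptions made for the example.

```python
import random

def grouped_kfold(groups, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) so that every group stays inside one fold."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Deal the shuffled groups round-robin into n_splits folds.
    fold_of = {g: k % n_splits for k, g in enumerate(unique)}
    for fold in range(n_splits):
        test_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train_idx, test_idx

# Hypothetical layout: 10 concentration conditions with 4 repeated spectra each
# (the study uses 500 repeats per condition; 4 keeps the demo small).
conditions = [(no, no2) for no in (1, 5, 10, 15, 20) for no2 in (5, 15)]
groups = [cond for cond in conditions for _ in range(4)]

for train_idx, test_idx in grouped_kfold(groups, n_splits=5):
    train_groups = {groups[i] for i in train_idx}
    test_groups = {groups[i] for i in test_idx}
    # No concentration condition may appear on both sides of the split.
    assert train_groups.isdisjoint(test_groups)
    print(sorted(test_groups), len(train_idx), len(test_idx))
```

Because the split is made over concentration conditions and not over individual spectra, near-duplicate repeats cannot leak from the training folds into the test fold.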
4.2. Ablation Study
To systematically evaluate the contribution of each core component in MDIAN, we designed five groups of progressively cumulative ablation experiments. The configurations of these models are summarized in Table 6, and all variants were trained under the same training strategy and hyperparameter settings to ensure a fair comparison. To avoid ambiguity, the attention module in Improve_2 refers to self-attention applied within the single-branch spectral feature representation. It is used to enhance the weighting of informative wavelength regions under the single-branch setting. In contrast, the interaction attention module in Improve_3 refers to bidirectional cross-attention between the two branches. It enables feature interaction between the NO-related and NO2-related branches and is different from the self-attention used in Improve_2. The ablation results are presented in Figure 9. The Base model, which consists only of a single-branch convolutional module followed by a fully connected layer, exhibits the largest prediction errors, indicating that relying solely on single-branch convolution is insufficient to fully characterize the complex overlapping information in differential spectra of gas mixtures. After introducing Bi-LSTM, the MAE of Improve_1 shows a significant reduction compared with the Base model (with decreases of 48.7% for NO and 49.8% for NO2), demonstrating that contextual dependencies along the wavelength dimension play a critical role in concentration retrieval. With the further incorporation of an attention mechanism, Improve_2 achieves additional error reduction, suggesting that adaptive weighting of key spectral regions enhances the model's ability to extract informative absorption features. Because cross-attention requires two branches, self-attention is used in its place under the single-branch setting, so that the contribution of the attention mechanism itself can be isolated and evaluated.
When the model is extended from a single-branch to a dual-branch architecture with the incorporation of cross-attention, the performance of Improve_3 is further enhanced. This indicates that explicitly separating features from different branches and enabling cross-branch information exchange can more effectively mitigate spectral coupling interference between NO and NO2. Building upon Improve_3, the full MDIAN model further integrates multi-scale convolutional design, where small convolutional kernels are employed in the NO branch and larger kernels in the NO2 branch, enabling differentiated multi-scale feature extraction. Experimental results show that the complete MDIAN achieves the lowest MAE for both NO and NO2 (0.076 ppm for NO and 0.062 ppm for NO2). This demonstrates that small convolutional kernels are more effective in capturing local narrow-band absorption details, while larger kernels are better suited for modeling broad spectral profile characteristics. The complementary combination of the two further enhances the model’s capability to represent complex overlapping spectra.
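The cross-branch exchange described above can be illustrated with a minimal sketch of bidirectional cross-attention. It assumes plain scaled dot-product attention with a residual sum; learned query/key/value projections, multiple heads, and the exact fusion used in MDIAN are not specified in this section and are omitted. All names and toy values are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, context):
    """Scaled dot-product attention in which `queries` attend to `context`.

    Both arguments are lists of feature vectors (one per wavelength position).
    Learned projections are omitted for brevity (an assumption of this sketch).
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    output, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in context]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the context vectors.
        output.append([sum(wi * v[d] for wi, v in zip(w, context))
                       for d in range(dim)])
    return output, weights

def bidirectional_cross_attention(feat_a, feat_b):
    """Each branch queries the other; a residual sum keeps its own features."""
    a_from_b, _ = cross_attention(feat_a, feat_b)
    b_from_a, _ = cross_attention(feat_b, feat_a)
    fused_a = [[x + y for x, y in zip(r, s)] for r, s in zip(feat_a, a_from_b)]
    fused_b = [[x + y for x, y in zip(r, s)] for r, s in zip(feat_b, b_from_a)]
    return fused_a, fused_b

# Toy features: 3 wavelength positions x 2 channels per branch (hypothetical).
no_branch = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
no2_branch = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.4]]
fused_no, fused_no2 = bidirectional_cross_attention(no_branch, no2_branch)
print(fused_no)
print(fused_no2)
```

In contrast to self-attention, where queries and context come from the same branch, each branch here is re-weighted by features of the other branch, which is the mechanism the text credits with mitigating NO/NO2 spectral coupling.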
The ablation study was designed in a progressively cumulative manner. This design allows the contribution of each main component to be evaluated step by step under a consistent experimental framework. Compared with isolated and non-cumulative variants, the progressive setting provides a clearer view of how the model performance changes when key modules are sequentially introduced. It also avoids excessive structural changes between adjacent variants, making the performance differences easier to interpret. The consistent performance improvement observed across the ablation variants demonstrates the effectiveness and necessity of the main modules in the proposed architecture.
4.3. Stability Analysis
In addition to prediction accuracy, the practical applicability of the model in real detection scenarios also depends on the stability and sensitivity of its outputs. To this end, the uncertainty and the detection limit were adopted to further evaluate the MDIAN model. Their definitions are given in Equations (10) and (11), which are expressed in terms of the concentration value predicted in the i-th repeated measurement (ppm), the average value of the repeated measurement results (ppm), and the number of repeated measurements, which was 50 in this study. The dimensionless confidence factor was set to 3. The detection-limit expression is equivalent to the standard 3σ/sensitivity convention, where the sensitivity is estimated with reference to the standard concentration of the target gas (ppm), which was 10 ppm for both NO and NO2 in the repeated measurement experiment. Here σ denotes the standard deviation of the repeated measurement results (ppm) and was calculated from the 50 consecutive predicted concentrations for each gas. Specifically, for each target gas, the 50 predicted concentrations were first used to calculate their average value, and the corresponding standard deviation was then taken as σ. The uncertainty and detection limit were calculated separately for NO and NO2, and the detection limit is expressed in ppm.
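The detection-limit convention described above (confidence factor times standard deviation, divided by a sensitivity) can be sketched as follows. Two points are assumptions of this sketch and are not confirmed by the text: the sensitivity is taken as the mean predicted concentration divided by the standard concentration, and the sample (n − 1) standard deviation is used. The repeated predictions are synthetic values for illustration only.

```python
import statistics

def detection_limit(predictions, standard_conc, k=3.0):
    """Return k * sigma / sensitivity in ppm.

    Assumption of this sketch: sensitivity = mean(predictions) / standard_conc.
    """
    mean_pred = statistics.fmean(predictions)
    sigma = statistics.stdev(predictions)      # sample standard deviation (n - 1)
    sensitivity = mean_pred / standard_conc    # ppm predicted per ppm supplied
    return k * sigma / sensitivity

# Synthetic repeated predictions (ppm) for a 10 ppm standard; not measured data.
preds = [10.0 + 0.05 * ((-1) ** i) for i in range(50)]
sigma = statistics.stdev(preds)
print(f"mean = {statistics.fmean(preds):.3f} ppm, sigma = {sigma:.4f} ppm")
print(f"detection limit = {detection_limit(preds, 10.0):.3f} ppm")
```

When the mean prediction equals the standard concentration, the sensitivity is 1 and the expression reduces to the plain 3σ value.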
A mixed gas containing 10 ppm NO and 10 ppm NO2 was measured in 50 consecutive trials, and the results are presented in Figure 10. For NO detection, the uncertainty and detection limit were 0.69% and 0.15 ppm, respectively; for NO2, the corresponding values were 0.76% and 0.16 ppm. It should be noted that these detection limits were statistically estimated from repeated measurements at 10 ppm and represent the estimated sensitivity of the system under the current experimental conditions. In the present study, the experimental validation was conducted within the concentration range of 1–20 ppm, while sub-ppm measurements were beyond the scope of the current experimental design. The uncertainty for both gases is below 1%, indicating that the model output exhibits minimal fluctuations and thus demonstrates good repeatability and stability. In terms of the detection limit, both NO and NO2 remain at relatively low levels, with a difference of only 0.01 ppm. These results suggest that MDIAN exhibits good repeatability and estimated sensitivity for the simultaneous retrieval of NO and NO2 under the current experimental conditions.
To further evaluate the repeatability and estimated sensitivity of MDIAN across different concentration levels, four equal-concentration NO/NO2 mixtures were tested. The selected concentration levels were 1, 5, 10, and 20 ppm for both NO and NO2, representing near-zero, low, medium and high concentration levels within the investigated 1–20 ppm range. For each concentration level, 50 consecutive repeated measurements were performed.
The results are summarized in Table 7. For the 1 ppm NO/NO2 mixture, the uncertainty and detection limit were slightly higher than those obtained at higher concentrations. Specifically, the uncertainty values were 0.87% for NO and 0.95% for NO2, while the detection limits were 0.24 ppm and 0.27 ppm, respectively. This is reasonable because the absorption signal is weaker at the near-zero/low-concentration level, making the relative influence of prediction fluctuation more pronounced. As the concentration increased to 5, 10, and 20 ppm, both uncertainty and detection-limit values became more stable. For these three concentration levels, the uncertainty values remained below 0.80%, and the detection limits were approximately 0.15–0.16 ppm for both gases.
These results indicate that MDIAN maintains good repeatability across the investigated 1–20 ppm range. The slightly larger uncertainty and detection limit at 1 ppm reflect the expected behavior of low-concentration spectral retrieval, where weaker absorption signals lead to relatively larger fluctuations. Overall, the model shows stable uncertainty and detection-limit performance under low, medium, and high concentration conditions.