Fast NOx Emission Factor Accounting for Hybrid Electric Vehicles with Dictionary Learning-Based Incremental Dimensionality Reduction

Chen, Hao; Chen, Jianan; Zhao, Feiyang; Yu, Wenbin

doi:10.3390/en19030680


School of Energy and Power Engineering, Shandong University, Jinan 250061, China

^*

Author to whom correspondence should be addressed.

Energies 2026, 19(3), 680; https://doi.org/10.3390/en19030680

Submission received: 17 December 2025 / Revised: 21 January 2026 / Accepted: 26 January 2026 / Published: 28 January 2026

(This article belongs to the Collection State of the Art Electric Vehicle Technology in China)


Abstract

Amid the growing global environmental challenges, precise and efficient vehicle emission management plays a critical role in achieving energy-saving and emission reduction goals. At the same time, the rapid development of connected vehicles and autonomous driving technologies has generated a large amount of high-dimensional vehicle operation data. This not only provides a rich data foundation for refined emission accounting but also raises higher demands for the construction of accounting models. Therefore, this study aims to develop an accurate and efficient emission accounting model to contribute to the precise nitrogen oxide (NO_x) emission accounting for hybrid electric vehicles (HEVs). A systematic approach is proposed that combines incremental dimensionality reduction with advanced regression algorithms to achieve refined and efficient emission accounting based on multiple variables. Specifically, the dimensionality of the real driving emission (RDE) data is first reduced using the feature selection and t-distributed stochastic neighbor embedding (t-SNE) feature extraction method to capture key parameter information and reduce subsequent computational complexity. Next, an incremental dimensionality reduction method based on dictionary learning is employed to efficiently embed new data into a low-dimensional space through straightforward matrix operations. Given the computational cost of the dictionary learning training process, this study introduces the FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) for accelerated iterative optimization and enhances the computational efficiency through parameter optimization, while maintaining the accuracy of dictionary learning. Subsequently, an NO_x emission factor correction factor prediction model is trained using the low-dimensional data obtained from t-SNE embeddings, enabling direct computation of the corresponding correction factor when presented with new incremental low-dimensional embeddings. 
Finally, validation on independent HEV datasets shows that parameter K improves to 1 ± 0.05 and R² increases up to 0.990, laying a foundation for constructing an emission accounting model with broad applicability based on multiple variables.

Keywords:

hybrid electric vehicles; real driving emission; emission factor; dimensionality reduction; dictionary learning

1. Introduction

With global urbanization accelerating, growing transportation demand has driven a continuous rise in vehicle populations, elevating mobile sources to a leading role in atmospheric pollution [1,2] and triggering widespread concern over sustainable development problems. Against this backdrop, countries around the world have been continuously updating emission regulations [3,4,5]. China has also proposed its “dual carbon” goals—carbon peaking by 2030 and carbon neutrality by 2060 [6]—in an effort to achieve effective management and control of road emissions. Meanwhile, the rapid development of technologies such as the Internet of Vehicles, autonomous driving, and virtual calibration [7,8,9] has generated massive volumes of vehicle operation data [10]. These data not only provide a foundation for precise emission accounting but also place higher demands on the accuracy and real-time performance of emission accounting models [11].

In response to increasingly stringent emission regulations, vehicle emission reduction technology is constantly innovating. In terms of physical emission reduction technology, Hassan et al. [12] developed a sandwich filter based on nanocomposite, which exhibits excellent adsorption performance, achieving removal rates of 88.5 ± 2.2%, 99.2 ± 0.5%, and 80.0 ± 1.8% for CO, SO₂, and NO_x, respectively. In addition, in order to achieve more intelligent emission management, an environmentally friendly filter combined with an Internet of Things (IoT) monitoring system [13] has also been applied to vehicle emissions reduction. While achieving a 70 ± 3.4% CO₂ removal rate, real-time online monitoring of emission data has been completed through a cloud platform. Complementing these physical emission reduction measures, accurate emission accounting models are equally crucial for vehicle emission control. Current research on road vehicle emission accounting methods mainly focuses on two directions: on one hand, various advanced algorithms, including machine learning, are employed for high-fidelity modeling. Wen et al. [14] proposed an expertise-guided NO_x emissions modeling method for HEVs of peak-valley enhanced Gaussian process regression (PV-GPR) to accurately capture the mapping characteristics. The results indicated a root mean square error (RMSE) of 0.49 for PV-GPR, significantly outperforming both feedforward neural networks (RMSE = 0.84) and cascade neural networks (RMSE = 1.01). Furthermore, PV-GPR maintained an RMSE of 0.96 even with 50% missing data. Wang et al. [15] proposed a learning model based on a BP-Adaboost algorithm combined with a transfer learning strategy. This study constructed and trained a real-world NO_x emission sub-model to address the NO_x emissions modeling issue for hybrid diesel vehicles under real-world driving conditions. 
Validation results showed that the coefficient of determination between the predicted instantaneous NO_x emissions and the measured data was 0.854. On the other hand, correction factors are used to calibrate emission data, thereby enhancing prediction accuracy. Lee et al. [16] proposed a method to derive emission factors across various road conditions by introducing correction factors based on relative differences in fuel consumption, informed by existing laboratory data on atmospheric pollutant emissions. This approach enhanced the credibility of emission factor estimations. Similarly, Wang et al. [17] addressed the significant deviation between calculated and measured highway vehicle emissions. By adjusting emission factors using measured fuel consumption data, the study reduced the bias of calculated CO and NO_x emissions to within 5% of measured values.

During real-world driving, vehicle emissions are influenced by a combination of factors [18], including internal observed variables of the vehicle (such as engine speed, intake manifold absolute pressure, etc.) and external observed variables (such as driving speed, road gradient, ambient temperature, etc.) [19]. Therefore, to construct a more effective emission accounting model, it is necessary to fully consider the impact of these multivariate variables. However, directly processing high-dimensional variables not only incurs high computational costs but may also introduce redundant information, affecting both the accuracy and speed of the model. Thus, how to effectively reduce the dimensionality of high-dimensional data while retaining key information is one of the important issues in constructing a precise and efficient emission model. To extract the essence of vast, high-dimensional datasets, previous studies have explored various feature parameter identification, correlation analysis, and dimensionality reduction methods to capture the most representative information for emission modeling [20]. Wang et al. [21] proposed a novel feature engineering processing approach that utilized gray correlation analysis and principal component analysis (PCA) to process 16 initial feature parameters, quantifying their correlation with NO_x emissions, and analyzing the correlation coefficients to eliminate redundant parameters, thereby facilitating rapid model convergence during training. Balogun et al. [22] addressed the NO₂ pollution prediction problem based on Internet of Things (IoT) emission sensors by proposing a hybrid model that integrated the Boruta algorithm with grid search optimization. The method selected key features from 14 IoT sensors, effectively eliminating low-relevance features and reducing data dimensionality. Mohammad et al. 
[23] introduced 37 variables into a data-driven dimensionality reduction, employing supervised learning, such as lasso regression and linear support vector machines, and unsupervised techniques (including PCA and factor analysis) to select and extract relevant features. These features were then used to model NO_x, CO, HC, and soot emissions in order to assess the impact of dimensionality reduction on model accuracy.

As mentioned above, conventional dimensionality reduction methods typically require recalculation when handling incremental data, leading to high computational costs. Meanwhile, although data-driven predictive models are widely used, they often lack physical interpretability. It is therefore valuable to construct a framework that achieves rapid incremental dimensionality reduction and also builds the emission model on a physically meaningful formula. Hybrid electric vehicles (HEVs), as a transitional product between conventional internal combustion engine vehicles and fully electric vehicles, can deliver substantial fuel savings [24] while alleviating the range anxiety inherent to pure electric vehicles, and have thus become a major focus of current research. This study therefore focuses on the RDE data of HEVs. Specifically, t-distributed stochastic neighbor embedding (t-SNE) first projects the multivariate inputs into a compact representation, preserving the most prominent features while substantially reducing computational overhead. Subsequently, using these low-dimensional embeddings, a dictionary learning-based incremental dimensionality reduction approach constructs both a high-dimensional dictionary and a low-dimensional dictionary, enabling rapid and precise reduction of incoming data. With straightforward matrix operations on these dictionaries, each incoming data point is embedded quickly. Considering the long computation time of the dictionary learning training process, this study introduces the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), along with parameter optimization, improving computational efficiency while maintaining the accuracy of dictionary learning. Meanwhile, the t-SNE embeddings are used to train a SuperLearner regression model so that, when new data arrive, the corresponding correction factor is generated directly from the embedding obtained by simple matrix calculation, simplifying the path to accurate, real-time NO_x emission factor (EF_NOx) accounting.

However, a critical gap remains in the current literature: existing methods often fail to balance computational efficiency with prediction accuracy when processing the high-dimensional and nonlinear operating data of HEVs. To address this gap, the main contributions of this study can be summarized as follows: 1. By integrating dictionary learning with the SuperLearner model, a high-precision correction method for the NO_x emission factor was established. 2. An incremental dimensionality reduction strategy was developed, which allowed for the fast processing of large-scale driving data through simple matrix operations, avoiding time-consuming retraining. 3. The method was tested using independent RDE datasets, demonstrating superior accuracy and generalization capability.

This study is structured as follows: Section 2 presents the methodologies employed in this research, including t-SNE, dictionary learning, and SuperLearner. Section 3 describes the analysis process and results for constructing the refined emission accounting model. Ultimately, some conclusions are provided in Section 4.

2. Materials and Methods

Figure 1 summarizes the main steps of the proposed framework for NO_x emission accounting of HEVs, which couples dictionary learning-based incremental dimensionality reduction with a SuperLearner regression model. The raw HEV data obtained from real driving emission (RDE) tests were first subjected to data quality control. Based on the RDE test data, a formula for calculating the NO_x emission rate was constructed from the engine operational and emission data recorded by a portable emission measurement system (PEMS) from HORIBA (Japan). Because there was always a certain bias between the calculated and measured values of the NO_x emission rate, a correction factor optimization method was proposed in this study to calibrate the calculated values against the measurements. Because the RDE tests yielded a large number of feature variables, dimensionality reduction was needed to extract the information most relevant to emissions. To enable fast, accurate NO_x emission factor prediction, this study first applied t-SNE to embed the processed high-dimensional feature matrix into a low-dimensional space. Dictionary learning then established a mapping between the high- and low-dimensional dictionaries, allowing each new data point to be reduced in dimension via straightforward matrix operations. Additionally, a SuperLearner model was developed to link the low-dimensional parameters to the correction factors. For incoming incremental data, the low-dimensional embeddings were derived through the dictionary learning-based incremental reduction method and then substituted into the correction factor model to calculate the correction factors. These correction factors, which depend on the parameters of interest, were used to improve the accuracy of the NO_x emission rate calculation while keeping the computation fast.

2.1. Description of Dataset with Quality Control

The dataset for this study was collected during RDE tests of ten HEVs by the China Automotive Engineering Research Institute Co., Ltd. in the city of Chongqing, China. These 10 HEVs are all hybrid passenger cars based on gasoline engines. The rated power of the selected vehicles is presented in Table 1. These vehicles cover representative power ranges of low, medium, and high. By ensuring the balance of the dataset, the model can effectively learn and capture the differentiated features of vehicles with varying power performance.

The sample data included vehicle information, location information, vehicle speed, collection time, and corresponding engine operational and emission data. The tests were conducted under three distinct driving conditions: urban (v ≤ 60 km/h), suburban (60 km/h < v ≤ 90 km/h), and highway (v > 90 km/h), for example, as shown in Figure 2.

Initial processing revealed that certain key parameters contained invalid values. Quality control was therefore carried out first to check the validity of the real-time measured data, ensuring that all values were nonnegative and that the dataset was reliable for practical use. The data quality control process involved two steps: handling invalid values (for example, treating negative vehicle speeds as missing and removing them) and conducting validity checks (for example, correcting negative NO_x emission values to zero to align with physical reality).
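
The two quality control steps and the speed-based driving conditions described above can be sketched as follows. This is a minimal illustration only: the record layout and the key names `speed_kmh` and `nox_ppm` are hypothetical and are not taken from the study's data files.

```python
def quality_control(records):
    """Return cleaned copies of per-second records.

    Each record is a dict with the hypothetical keys 'speed_kmh' and 'nox_ppm'.
    """
    cleaned = []
    for rec in records:
        speed = rec.get("speed_kmh")
        # Step 1: a missing or negative speed is invalid, so the row is removed.
        if speed is None or speed < 0:
            continue
        row = dict(rec)
        # Step 2: a negative NOx reading is not physical, so it is set to zero.
        row["nox_ppm"] = max(0.0, row["nox_ppm"])
        cleaned.append(row)
    return cleaned


def driving_condition(speed_kmh):
    """Classify a sample with the speed thresholds given in the text."""
    if speed_kmh <= 60:
        return "urban"
    if speed_kmh <= 90:
        return "suburban"
    return "highway"
```

Rows are copied rather than modified in place so that the raw measurements stay available for later inspection.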

2.2. NO_x Emission Factor Formulation

A method was proposed to calculate the instantaneous NO_x emission factor from the volumetric NO_x concentration in the exhaust, the engine intake air rate, and the fuel flow rate, all collected in real time [25,26]. The basic formula is given in Formula (1). Because the engine intake air rate or the fuel flow rate was not always available, the averaged air–fuel ratio was used instead in Formulas (2) and (3) when one of them was invalid:

(1): When the engine intake air rate Q_aM and engine fuel flow rate Q_fv were both valid,

$EF_{NOx} \approx C_{NOx(down)} \times [Q_{aM} + Q_{fv} \times \rho_{fuel}] \times \frac{M \times (1 - losses)}{\rho_{exhaust} \times 3.6} \times 10^{-6},$

(1)

where EF_NOx was the instantaneous NO_x emission factor of the vehicle (g/s); C_NOx(down) was the SCR downstream NO_x concentration (ppm); Q_aM was the engine intake air rate (kg/h); Q_fv was the engine fuel flow rate (L/h); ρ_fuel was the gasoline fuel density, set at 0.73 kg/L; M was the molecular weight of NO_x, taken as that of NO₂ (46 g/mol) because a significant proportion of NO is rapidly converted into NO₂ under ambient conditions; (1 − losses), where losses denoted the loss of air at the valve, was an instantaneous coefficient obtained by verifying the calculation results against PEMS data; ρ_exhaust was the exhaust density, assumed to be roughly equivalent to that of air and entered as 29 g/mol, so that the ratio M/ρ_exhaust converts the volumetric concentration into a mass fraction; and the factor 3.6 converts kg/h into g/s.
(2): When the engine intake air rate Q_aM was valid but the engine fuel flow rate Q_fv was invalid,

$EF_{NOx} \approx C_{NOx(down)} \times Q_{aM} \times (1 + \frac{1}{\alpha}) \times \frac{M \times (1 - losses)}{\rho_{exhaust} \times 3.6} \times 10^{-6},$

(2)

where α was the air–fuel ratio for gasoline engines.
(3): When the engine fuel flow rate Q_fv was valid but the engine intake air rate Q_aM was invalid,

$EF_{NOx} \approx C_{NOx(down)} \times Q_{fv} \times \rho_{fuel} \times (1 + \alpha) \times \frac{M \times (1 - losses)}{\rho_{exhaust} \times 3.6} \times 10^{-6}.$

(3)

However, analysis of the available RDE test data revealed limitations in the above formulas. First, using a fixed α in Formulas (2) and (3) may bias the calculation results, because it neglects the instantaneous variation in α with engine output power. Second, calculating an instantaneous α requires valid dynamic intake air flow data. Therefore, based on the engine operating principle, the present study computed EF_NOx by replacing the intake air rate with the difference between the engine exhaust rate and the engine fuel rate, as shown in Equation (4),

$EF_{NOx} \approx C_{NOx(down)} \times [(Q_{exhaust} - Q_{fuel}) + Q_{fv} \times \rho_{fuel}] \times \frac{M \times (1 - losses)}{\rho_{exhaust} \times 3.6} \times 10^{-6},$

(4)

where Q_exhaust was the engine exhaust rate (kg/h); Q_fuel was the engine fuel rate (kg/h).
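
As a worked illustration of Formulas (1) to (3) and Equation (4), the sketch below evaluates each case with the constants stated in the text. The function names are hypothetical, and the default air–fuel ratio of 14.7 is an assumption made here for illustration; the text does not state the value of α that was used.

```python
RHO_FUEL = 0.73     # gasoline density, kg/L (from the text)
M_NOX = 46.0        # NOx taken as NO2, g/mol (from the text)
RHO_EXHAUST = 29.0  # exhaust treated as air, 29 g/mol (from the text)


def _scale(one_minus_losses):
    # Shared factor M * (1 - losses) / (rho_exhaust * 3.6) * 1e-6.
    return M_NOX * one_minus_losses / (RHO_EXHAUST * 3.6) * 1e-6


def ef_nox(c_nox_ppm, q_am=None, q_fv=None, alpha=14.7, one_minus_losses=1.0):
    """Instantaneous EF_NOx (g/s) from Formulas (1)-(3).

    q_am: intake air rate (kg/h); q_fv: fuel flow rate (L/h).
    alpha=14.7 is an assumed air-fuel ratio, not a value from the study.
    """
    if q_am is not None and q_fv is not None:
        mass_flow = q_am + q_fv * RHO_FUEL            # Formula (1)
    elif q_am is not None:
        mass_flow = q_am * (1.0 + 1.0 / alpha)        # Formula (2)
    elif q_fv is not None:
        mass_flow = q_fv * RHO_FUEL * (1.0 + alpha)   # Formula (3)
    else:
        raise ValueError("at least one of q_am and q_fv must be valid")
    return c_nox_ppm * mass_flow * _scale(one_minus_losses)


def ef_nox_eq4(c_nox_ppm, q_exhaust, q_fuel, q_fv, one_minus_losses=1.0):
    """Equation (4): intake air replaced by (exhaust rate - fuel rate), kg/h."""
    mass_flow = (q_exhaust - q_fuel) + q_fv * RHO_FUEL
    return c_nox_ppm * mass_flow * _scale(one_minus_losses)
```

When the fuel flow happens to match the assumed α exactly, the three formulas return the same value, which is a convenient consistency check on the implementation.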

2.3. t-Distributed Stochastic Neighbor Embedding (t-SNE)

Because this study was based on RDE tests encompassing numerous parameters of interest, it was essential to employ suitable dimensionality reduction techniques to extract representative features and streamline subsequent computations. Given the nonlinear nature of the data, t-SNE was selected as the foundational dimensionality reduction method for the offline training phase, laying the groundwork for subsequent dictionary learning. t-SNE is a typical nonlinear dimensionality reduction method that maps high-dimensional data into a low-dimensional space while preserving local structures and revealing the global structure of the data. In this study, the low-dimensional representations generated by t-SNE were used as training targets to construct both the high-dimensional and low-dimensional dictionaries. This strategy leverages t-SNE's advantages in feature extraction while the dictionary learning framework overcomes its inherent limitation in processing unseen data.

t-SNE works by transforming the Euclidean distances between data points in the high-dimensional space into probability distributions that assign higher probabilities to similar points, and then constructing another set of probability distributions in the low-dimensional space. The core idea is to minimize the Kullback–Leibler divergence between the high-dimensional and low-dimensional probability distributions, thereby achieving dimensionality reduction. To improve robustness to outliers, t-SNE employs a symmetrized joint probability distribution, which not only simplifies the gradient computation but also accelerates the optimization process. Meanwhile, instead of using the conventional Gaussian distribution in the low-dimensional space, t-SNE uses a Student-t distribution with one degree of freedom. This heavy-tailed distribution effectively alleviates the “crowding problem” that often occurs when mapping high-dimensional data to a low-dimensional space, thereby better reflecting the global structure [27].
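
To make the two probability distributions and the Kullback–Leibler objective concrete, the following NumPy sketch computes them for a given pair of high- and low-dimensional point sets. It is a simplified illustration rather than a full t-SNE implementation: a single fixed Gaussian bandwidth `sigma` is assumed for all points, whereas t-SNE normally tunes a per-point bandwidth from the perplexity, and no gradient descent is performed.

```python
import numpy as np


def pairwise_sq_dists(Z):
    """Squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    np.fill_diagonal(D, 0.0)
    return np.maximum(D, 0.0)


def high_dim_affinities(X, sigma=1.0):
    """Symmetrized joint probabilities p_ij (fixed sigma is a simplification)."""
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P_cond = W / W.sum(axis=1, keepdims=True)   # conditional p_{j|i}
    n = X.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)      # symmetrization


def low_dim_affinities(Y):
    """Joint probabilities q_ij from a Student-t kernel with one degree of freedom."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the quantity that t-SNE minimizes over the embedding Y."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))
```

The heavy tail of the Student-t kernel means that moderately distant points in the embedding still receive non-negligible probability, which is the mechanism behind the relief of the crowding problem mentioned above.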

In summary, t-SNE effectively balances local and global data structures, making it one of the most widely used dimensionality reduction methods.

2.4. Dictionary Learning-Based Incremental Dimensionality Reduction

The goal of dictionary learning is to reconstruct the original data as a linear combination of a small number of dictionary atoms, effectively eliminating redundant information in the dataset [28]. The general model of dictionary learning can be expressed as follows:

$X = D_{H} C + \varepsilon,$

(5)

where X represents the multivariate dataset, D_H is the high-dimensional data dictionary, C denotes the sparse representation coefficients, and ε refers to the reconstruction error.

Suppose Y₁ was the low-dimensional dataset obtained from X through dimensionality reduction, and D_L was the dictionary for the low-dimensional data. For a given data point x_i, let c_i denote its encoding in the dictionary D_H, which corresponded to the i-th column of the encoding matrix. First, t-SNE was applied to X to obtain the corresponding low-dimensional dataset Y₁. Next, K items were randomly selected from X to initialize the high-dimensional dictionary D_H [29]. After constructing the initial D_H and encoding matrix C, an iterative optimization process was employed to refine these parameters. This iterative procedure continued until the reconstruction error fell below a predefined threshold or the maximum number of iterations was reached. The detailed computational process was illustrated in Figure 3.

Through dictionary learning and the iterative optimization process, the solution for low-dimensional data dictionary D_L can be given as

$D_{L} = Y_{1} C^{T} (C C^{T})^{-1}.$

(6)

When new data were encountered, their encoding matrix on the high-dimensional dictionary was obtained first. Then, by coupling this encoding with the low-dimensional dictionary, the new data were rapidly reduced in dimension, as given in Equation (7).

$Y_{x} = D_{L} C_{x}.$

(7)

Dictionary learning encapsulates data characteristics by extracting a set of representative “atoms” from the original database, enabling a compact representation of the entire data. For new incremental data, only a sparse code computed using the pre-learned dictionary is required, without the need to rebuild the full dictionary each time. This approach significantly reduces iteration time and overall computational cost, offering an efficient solution for incremental data dimensionality reduction.
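
The training and incremental steps of Equations (5) to (7) can be sketched in NumPy as below. This is a hedged illustration under several assumptions that are not taken from the study: samples are stored as columns, sparse codes are computed with plain ISTA for brevity, the dictionary update is an unconstrained least-squares fit followed by atom normalization, and the pseudo-inverse replaces the explicit inverse in Equation (6) for numerical safety (the two agree when C has full row rank). All function names are hypothetical.

```python
import numpy as np


def soft_threshold(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)


def sparse_code(D, X, lam=0.1, n_iter=200):
    """ISTA for min_C 0.5 * ||X - D C||_F^2 + lam * ||C||_1."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    C = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ C - X)
        C = soft_threshold(C - grad / L, lam / L)
    return C


def normalize_atoms(D):
    norms = np.linalg.norm(D, axis=0, keepdims=True)
    return D / np.maximum(norms, 1e-12)


def learn_dictionaries(X, Y1, n_atoms, lam=0.1, n_outer=20, seed=0):
    """X: (d, n) high-dimensional data; Y1: (2, n) t-SNE embedding of X."""
    rng = np.random.default_rng(seed)
    # Initialize D_H from randomly selected samples, as described in the text.
    idx = rng.choice(X.shape[1], size=n_atoms, replace=False)
    D_H = normalize_atoms(X[:, idx].copy())
    for _ in range(n_outer):
        C = sparse_code(D_H, X, lam)                      # update the codes
        D_H = normalize_atoms(X @ np.linalg.pinv(C))      # update the dictionary
    C = sparse_code(D_H, X, lam)
    D_L = Y1 @ np.linalg.pinv(C)                          # Equation (6)
    return D_H, D_L


def embed_new(X_new, D_H, D_L, lam=0.1):
    """Incremental reduction: code the new data on D_H, then apply Equation (7)."""
    C_x = sparse_code(D_H, X_new, lam)
    return D_L @ C_x
```

Only `embed_new` runs when new data arrive, so the cost per sample is one sparse coding problem and one small matrix product; neither t-SNE nor the dictionaries are recomputed.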

2.5. SuperLearner Regression Method

In this study, t-SNE was firstly applied to reduce the dimensionality of the original high-dimensional dataset. The resulting low-dimensional embeddings were then used both to obtain a low-dimensional dictionary by dictionary learning and as inputs to a regression model for predicting the NO_x emission factor correction factor.

In predictive modeling problems, selecting appropriate machine learning algorithms and hyperparameters for a specific dataset is very important. The SuperLearner algorithm mitigates the problem of selecting a single optimal learner by allowing multiple learners to be considered together [30]. First, a k-fold split of the data is defined, and the candidate algorithms and configurations are all evaluated on the same split. The out-of-fold predictions from all models are then retained and used to train a meta-learner, which learns how to combine the base predictions into the final prediction [31].

2.6. Data Splitting and Validation Protocol

In order to effectively evaluate the performance of the proposed method framework and prevent data leakage issues, this study adopted a vehicle-based dataset split strategy. Specifically, during the training phase, RDE test data from 10 HEVs were used for dictionary learning and model regression parameter training. During the validation phase, independent datasets from 3 additional HEVs were used to validate the performance of the model. This data split strategy ensured that the test vehicle data is completely independent of the training phase, thereby ensuring that the evaluation results can truly reflect the model’s ability to predict unknown vehicle emissions under real conditions.
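
A vehicle-based split of this kind can be expressed in a few lines. The sketch below is illustrative only; the key `vehicle_id` and the identifiers used in the usage note are hypothetical.

```python
def split_by_vehicle(records, validation_vehicles):
    """Split per-second records so that no vehicle appears on both sides."""
    held_out = set(validation_vehicles)
    train = [r for r in records if r["vehicle_id"] not in held_out]
    valid = [r for r in records if r["vehicle_id"] in held_out]
    return train, valid
```

For example, with thirteen vehicles labeled `HEV01` to `HEV13`, calling `split_by_vehicle(records, ["HEV11", "HEV12", "HEV13"])` keeps every sample of the last three vehicles out of training. Splitting at the vehicle level rather than the sample level matters because consecutive one-second samples from the same trip are strongly correlated.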

3. Results and Discussion

3.1. Comparison of Calculated and Measured NO_x Emission Factor

To verify the accuracy and effectiveness of the EF_NOx formulation, this study compared the calculated EF_NOx (g/s) derived from Equation (4) with the instantaneous measured EF_NOx (g/s) collected during the RDE tests of the ten HEVs. At this stage, all (1 − losses) factors in Equation (4) were pre-set to 1, meaning that no correction was applied. The overall correlation between calculated and measured EF_NOx was then evaluated by least squares fitting, using the goodness of fit R² and the equation of the fitted line.

Table 2 lists the correlation between the calculated and measured values without the correction factor (1 − losses); the parameters K, R², and RMSE quantify the agreement between the formulated and measured EF_NOx (g/s). The parameter K was defined as the slope of the line fitted between the calculated and measured values of EF_NOx for each individual HEV over the RDE test, as follows:

$EF_{calculated} = K \times EF_{measured}.$

(8)

Table 2 shows obvious biases in the EF_NOx of the ten HEVs; the minimum R² was only 0.879. A correction through (1 − losses) is therefore needed to bring K and R² closer to 1. Because the correction factor (1 − losses) appears in Equation (4), its value has to be determined from the corresponding variables in the RDE test data. A correction factor predicted accurately from these variables is expected to drive K and R² very close to 1.
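
The slope K of Equation (8) and the accompanying metrics can be computed as in the sketch below. The text does not spell out the exact definitions of R² and RMSE, so two assumptions are made here: R² is taken as one minus the ratio of the residual sum of squares of the through-origin fit to the total sum of squares about the mean, and RMSE is taken between the calculated and measured values directly.

```python
import numpy as np


def fit_through_origin(measured, calculated):
    """Least squares fit of calculated = K * measured, as in Equation (8)."""
    m = np.asarray(measured, dtype=float)
    c = np.asarray(calculated, dtype=float)
    K = float(m @ c / (m @ m))                   # closed-form slope
    residual = c - K * m
    ss_res = float(residual @ residual)
    ss_tot = float(np.sum((c - c.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot                   # assumed R^2 definition
    rmse = float(np.sqrt(np.mean((c - m) ** 2))) # assumed RMSE definition
    return K, r2, rmse
```

A vehicle whose calculated values are uniformly 20% too high would give K = 1.2 with R² = 1, which shows why K and R² have to be read together: R² alone does not reveal a systematic scale error.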

3.2. Calibrated Method of the NO_x Emission Factor of HEVs

3.2.1. Correction Factor Model Constructed by SuperLearner

As mentioned above, the EF_NOx formulation needs correction factors to agree with the measured values. To this end, this study first established a database of parameters related to NO_x emissions. The t-SNE method was then used to reduce the dimensionality of these variables. Using the resulting low-dimensional data and dictionary learning, a low-dimensional dictionary and an incremental dimensionality reduction method were constructed. Finally, the reduced data were used with the SuperLearner machine learning algorithm to construct a model for the correction factor (1 − losses) in Equation (4).

During the RDE tests, a substantial amount of variable information was collected. Based on engine characteristics, driving characteristics and external characteristics, the parameter selection ranges were determined, and the relevant information for each parameter is tabulated in Table 3. These parameters formed the initial database related to NO_x emissions. Specifically, the engine characteristics contained four variables, including engine coolant temperature, exhaust temperature, engine speed, and intake air temperature. Meanwhile, VSP (Vehicle Specific Power), speed, and acceleration were included in the driving characteristics. The external characteristics took ambient temperature, ambient pressure, and slope gradient. These variables were collected every second through RDE tests.

To reduce dimensionality while retaining the essential information from the parameters of interest, this study employed the t-SNE algorithm to process the data in the initial parameter database, projecting them into a two-dimensional space. As a typical nonlinear dimensionality reduction technique, t-SNE is well suited to the complex nonlinear relationships between the selected parameters and NO_x emissions. Moreover, the t-distribution used in t-SNE mitigates the influence of noise in the RDE test data. The t-SNE hyperparameters used in this study are listed in Table 4.

Subsequently, the two-dimensional embeddings extracted from the operating parameters through dimensionality reduction were used as inputs for the SuperLearner algorithm to develop a predictive model for (1 − losses). The calculation formula for (1 − losses) was as follows:

$1 - losses = \frac{EF_{NOx}}{EF_{measured}},$

(9)

where EF_NOx represented the value from Equation (4) with (1 − losses) set to 1, which corresponded to the uncorrected calculated value. Both the calculated and measured values were collected on a second-by-second basis. To preserve the transient characteristics of the emissions, no smoothing filters or constraints were applied to the calculated (1 − losses).

The SuperLearner used nine diverse base models: Linear Regression, ElasticNet, SVR, K-Neighbors Regressor (KNN), Decision Tree, AdaBoost, Bagging, Random Forest, and Extra Trees. For the meta-model, a Linear Regression model was employed to optimize the linear weighted combination of the base models’ outputs. Regarding the validation strategy, a 10-fold cross-validation procedure was implemented on the low-dimensional embedding dataset to limit overfitting and improve the robustness of the model. This 10-fold cross-validation was performed exclusively within the training dataset, with folds generated at the sample level. The performance comparison between the SuperLearner and the individual base regression models is presented in Table 5. SuperLearner achieved the highest R², demonstrating its superior capability in capturing the variance and trends of the correction factors. Although KNN achieved a slightly lower RMSE (0.147) than the SuperLearner (0.160), the difference is small. Given that R² served as the primary metric for this study, and that ensemble methods can offer enhanced robustness by mitigating the bias and overfitting risks of single models, SuperLearner was selected as the modeling method.
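
One way to approximate this configuration is scikit-learn's `StackingRegressor`, which trains the meta-model on cross-validated (out-of-fold) predictions of the base models. The sketch below is an assumption about tooling rather than the study's implementation: the text does not name the software used, and every base model is left at library default hyperparameters because the tuned values are not given here.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              ExtraTreesRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def build_super_learner(n_folds=10, seed=0):
    """Nine base models plus a linear meta-model, all with default settings."""
    base_models = [
        ("linear", LinearRegression()),
        ("elasticnet", ElasticNet()),
        ("svr", SVR()),
        ("knn", KNeighborsRegressor()),
        ("tree", DecisionTreeRegressor(random_state=seed)),
        ("adaboost", AdaBoostRegressor(random_state=seed)),
        ("bagging", BaggingRegressor(random_state=seed)),
        ("forest", RandomForestRegressor(random_state=seed)),
        ("extratrees", ExtraTreesRegressor(random_state=seed)),
    ]
    # cv=n_folds: the meta-model is fitted on out-of-fold base predictions.
    return StackingRegressor(estimators=base_models,
                             final_estimator=LinearRegression(),
                             cv=n_folds)
```

In use, the model would be fitted on the two-dimensional embeddings and the per-second correction factors, for example `build_super_learner().fit(Y_train, factors_train)`, and then queried with `predict` on embeddings of new data.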

Moreover, the parameters of interest reduced by t-SNE were fed into the dictionary learning-based dimensionality reduction method described in Section 3.2.2, enabling the construction of the low-dimensional dictionary and subsequent incremental computations.

3.2.2. Dimensional Reduction Based on Dictionary Learning

A learning-based dictionary learns each column of dictionary atoms from the sample data being processed. These learned dictionary atoms make full use of the feature information within the data, enabling high accuracy in matching the target data. Therefore, algorithms that use learning-based dictionaries are capable of generating dictionaries with strong adaptability and accurate representations of the data characteristics.

The dictionary learning-based dimensionality reduction method identifies a mapping between high-dimensional data and the corresponding low-dimensional data, using paired samples of both as training data, and then applies this mapping to compute the low-dimensional representation of newly arriving data points. The dictionary learning implemented in this study adopted L1 regularization to impose sparsity constraints, with the regularization parameter λ set to 0.1 and the dictionary size K set to 100. The stopping criterion was a maximum of 1000 iterations or a convergence tolerance below 10⁻⁶.
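A minimal sketch of this training step is given below, under stated assumptions. It uses scikit-learn's `DictionaryLearning` for the high-dimensional dictionary and sparse codes, and then assumes that the low-dimensional dictionary is obtained by a least-squares fit of the embedding onto the shared codes; the paper's actual construction of D_L may differ. The helper name `learn_paired_dictionaries` and the synthetic data are hypothetical. The function defaults follow the stated settings (λ = 0.1, K = 100, 1000 iterations, tolerance 10⁻⁶), while the demonstration call uses smaller values so that it runs quickly.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning


def learn_paired_dictionaries(X_high, Y_low, n_atoms=100, lam=0.1,
                              max_iter=1000, tol=1e-6, random_state=0):
    """Learn D_H and sparse codes C from X_high, then fit D_L to Y_low.

    X_high: (n_samples, n_features), Y_low: (n_samples, n_low).
    Returns D_H (n_features, n_atoms), D_L (n_low, n_atoms), C (n_atoms, n_samples).
    """
    learner = DictionaryLearning(n_components=n_atoms, alpha=lam,
                                 max_iter=max_iter, tol=tol,
                                 fit_algorithm="lars",
                                 transform_algorithm="lasso_lars",
                                 transform_alpha=lam,
                                 random_state=random_state)
    codes = learner.fit_transform(X_high)   # (n_samples, n_atoms), L1-sparse
    D_H = learner.components_.T             # atoms stored as columns
    # Assumption: D_L is the least-squares solution of codes @ D_L.T ~ Y_low.
    D_L = np.linalg.lstsq(codes, Y_low, rcond=None)[0].T
    return D_H, D_L, codes.T


# Synthetic stand-ins: 10 operating parameters and a 2-D embedding.
rng = np.random.default_rng(0)
X_high = rng.normal(size=(120, 10))
Y_low = rng.normal(size=(120, 2))

# Smaller settings than the function defaults, purely to keep the demo fast.
D_H, D_L, C = learn_paired_dictionaries(X_high, Y_low, n_atoms=20, max_iter=30)
```

The point of sharing one code matrix between both dictionaries is that a new sample only needs to be sparse-coded against D_H to obtain its low-dimensional coordinates through D_L.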

To verify the effectiveness of the dictionary learning-based method, the objective function value and the reconstruction accuracy were plotted against the number of iterations, as shown in Figure 4. As the number of iterations increased, the objective function value gradually decreased and eventually stabilized, while the reconstruction accuracy rose to approximately 90%. This indicates that the dictionary learning method converged and represented the data effectively.

3.2.3. Dimensionality Reduction and Calibration of Incremental Data

Based on the discussions above, this study has essentially established a system that combines an incremental dimensionality reduction method based on dictionary learning with a predictive model for the correction factor (1 − losses) using the SuperLearner regression approach. When new data arrived, it was first subjected to quality control. Next, based on dictionary learning theory, the encoding matrix C_x for the incremental data was computed. According to Equation (7), coupled with the pre-constructed low-dimensional dictionary D_L of feature variables, the low-dimensional embedding Y_x was quickly obtained. Finally, this embedding Y_x was fed into the correction factor prediction model developed in Section 3.2.1, resulting in a more accurate EF_NOx forecast.
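The incremental step can be sketched as follows. This is an illustration only: it assumes that Equation (7) takes the form Y_x = D_L C_x, uses scikit-learn's `sparse_encode` (LASSO via LARS) for the encoding matrix, and substitutes random dictionaries for the trained ones. The function name `embed_incremental` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import sparse_encode


def embed_incremental(X_new, D_H, D_L, lam=0.1):
    """Map new samples (rows of X_new) into the low-dimensional space.

    D_H: (n_features, n_atoms) high-dimensional dictionary.
    D_L: (n_low, n_atoms) low-dimensional dictionary.
    """
    # sparse_encode expects atoms as rows, hence the transpose of D_H.
    C_x = sparse_encode(X_new, D_H.T, algorithm="lasso_lars", alpha=lam)
    # Row-wise form of the assumed relation Y_x = D_L C_x.
    return C_x @ D_L.T


# Random stand-ins for dictionaries that would normally come from training.
rng = np.random.default_rng(0)
D_H = rng.normal(size=(10, 20))
D_H /= np.linalg.norm(D_H, axis=0)      # unit-norm atoms
D_L = rng.normal(size=(2, 20))

X_new = rng.normal(size=(5, 10))        # five newly arrived samples
Y_x = embed_incremental(X_new, D_H, D_L)
# Y_x would then be passed to the fitted correction factor model,
# for example model.predict(Y_x).
```

No t-SNE run is needed for the new samples; the cost is one sparse coding pass plus a matrix product.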

The advantages of this dimensionality reduction and correction process are: first, it avoids the complex processes typical of traditional dimensionality reduction methods, and can obtain the corresponding low-dimensional data through straightforward matrix calculations, reducing computation time. Second, the excellent modeling capability of the SuperLearner regression method enables the construction of a more precise predictive model for the correction factor (1 − losses), effectively calibrating the EF_NOx corresponding to the incremental data.

3.3. Acceleration of the Dictionary Learning Training Process

Due to the use of large volumes of high-dimensional data during dictionary learning and model training, the iterative optimization process was often time-consuming. To reduce training time costs, this study introduced an acceleration method to enhance the efficiency of the dictionary learning iterations.

Among existing acceleration methods, the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) is an approach used for solving composite convex optimization problems. Its core idea is to incorporate Nesterov’s acceleration technique into the iteration process, thereby significantly improving convergence speed. Compared to non-accelerated methods, FISTA improves the convergence rate from O(1/k) to O(1/k²), where k is the number of internal iterations. Owing to its simplicity and computational efficiency, FISTA has become a commonly used tool for problems such as sparse optimization and low-rank matrix recovery [32]. Therefore, to improve training speed while maintaining model accuracy, FISTA was introduced into the sparse coding stage of the dictionary learning process in this study. The core iterative update formula at this stage was expressed as follows:

Y^{k} = C^{k} + \frac{t_{k - 1} - 1}{t_{k}} (C^{k} - C^{k - 1});

(10)

C^{k + 1} = S_{η λ} (Y^{k} - η {D_{H}}^{T} (D_{H} Y^{k} - X));

(11)

t_{k} = \frac{1 + \sqrt{1 + 4 {t_{k - 1}}^{2}}}{2}, t_{0} = 1,

(12)

where η is the step size and

S_{η λ} (v) = \operatorname{sign}(v) \max (|v| - η λ, 0)

was the soft-thresholding operator, applied element-wise.
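The updates in Equations (10)–(12) can be sketched in NumPy as below. This is a minimal illustration under assumptions not stated in the text: the step size η is taken as the reciprocal of the largest eigenvalue of D_H^T D_H, the codes start from zero, and the data are synthetic. The names `fista_sparse_code` and `lasso_objective` are hypothetical.

```python
import numpy as np


def soft_threshold(v, threshold):
    """Element-wise soft-thresholding operator S_threshold(v)."""
    return np.sign(v) * np.maximum(np.abs(v) - threshold, 0.0)


def lasso_objective(X, D_H, C, lam):
    """0.5 * ||X - D_H C||_F^2 + lam * ||C||_1."""
    return 0.5 * np.sum((X - D_H @ C) ** 2) + lam * np.sum(np.abs(C))


def fista_sparse_code(X, D_H, lam=0.1, n_iter=50):
    """Sparse-code the columns of X against the dictionary D_H with FISTA."""
    # Assumed step size: eta = 1 / L, with L the squared spectral norm of D_H.
    eta = 1.0 / np.linalg.norm(D_H, 2) ** 2
    C = np.zeros((D_H.shape[1], X.shape[1]))   # C^{k-1}, starting from zero
    Y = C.copy()                               # extrapolated point Y^{k}
    t = 1.0                                    # t_0 = 1
    for _ in range(n_iter):
        # Eq. (11): gradient step at Y followed by soft-thresholding.
        C_next = soft_threshold(Y - eta * D_H.T @ (D_H @ Y - X), eta * lam)
        # Eq. (12): momentum sequence.
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        # Eq. (10): extrapolate using the two latest code matrices.
        Y = C_next + ((t - 1.0) / t_next) * (C_next - C)
        C, t = C_next, t_next
    return C


# Synthetic check: each column of X is exactly one dictionary atom.
rng = np.random.default_rng(0)
D_H = rng.normal(size=(10, 20))
D_H /= np.linalg.norm(D_H, axis=0)
C_true = np.zeros((20, 5))
C_true[rng.integers(0, 20, size=5), np.arange(5)] = 1.0
X = D_H @ C_true

C_hat = fista_sparse_code(X, D_H, lam=0.1, n_iter=50)
```

Setting the extrapolation weight to zero in the loop recovers plain ISTA, which makes it easy to compare the two convergence behaviours on the same data.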

To determine suitable parameters for FISTA, this study tested the regularization parameter λ over the range [0.001, 1]. Considering both reconstruction accuracy and convergence speed, the final regularization parameter was set to 0.1 and the number of internal iterations to 50. All model computations were performed on a system with an Intel Core i5-12400 processor and 16 GB RAM. Introducing FISTA into the dictionary learning training process reduced the time cost to approximately 1/100 of that required without acceleration: the baseline training required approximately 1020 min, whereas the accelerated process was completed in about 10 min. Because FISTA modified the iteration formula of the sparse coding stage, it introduced some loss in reconstruction accuracy. This study therefore further optimized the key FISTA parameters, namely the number of internal iterations and the regularization parameter. As shown in Figure 5, although the reconstruction accuracy was slightly lower than the pre-acceleration value of 0.89, it consistently remained above 0.8, indicating that the optimized dictionary can still represent the features of the high-dimensional data effectively and stably.

3.4. Validation of the Method Effectiveness and Result Analysis

3.4.1. Sensitivity Analysis

Considering the randomness of the t-SNE algorithm, this study performed a sensitivity analysis using three different random states to evaluate its potential impact on the model. The evaluation metrics for these different settings are listed in Table 6. The results showed that changing the random state had a minimal effect on the final metrics. Specifically, the fluctuation of parameter K was within ±0.01, and the variation of R² was less than 0.002. These results confirmed that the emission accounting model was robust to the randomness of the dimensionality reduction process, ensuring stable and reliable predictions.
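A sensitivity run of this kind can be sketched with scikit-learn's `TSNE`, using the perplexity, learning rate and PCA initialization from Table 4 and the three random states from Table 6. The two-dimensional target space and the synthetic input are assumptions for illustration; the iteration count is left at the library default of 1000, which matches Table 4.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for the selected operating parameters (120 samples, 10 features).
rng = np.random.default_rng(0)
features = rng.normal(size=(120, 10))

embeddings = {}
for seed in (1, 42, 100):
    tsne = TSNE(n_components=2,        # assumed embedding dimension
                perplexity=30,
                learning_rate=200.0,
                init="pca",
                random_state=seed)
    embeddings[seed] = tsne.fit_transform(features)

# Each embedding would then go through dictionary learning and the regression
# model so that K, R² and RMSE can be compared across random states.
```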

3.4.2. Performance Evaluation on Independent Datasets

To validate and evaluate the performance of the proposed system, which integrates a dictionary learning-based dimensionality reduction method with a SuperLearner-based predictive model for the correction factor of the NO_x emission factor, this study employed three independent RDE test datasets to assess its correction performance. The evaluation indicators included parameters K, R², and RMSE, which collectively provide a comprehensive evaluation of the system’s performance.
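The three indicators can be computed as in the sketch below. The definitions used here are assumptions, since the formal definitions appear earlier in the paper: K is taken as the slope of a zero-intercept least-squares line of calculated against measured values, and R² as the squared Pearson correlation. RMSE is the usual root mean square error. The function name `evaluate` and the sample values are hypothetical.

```python
import numpy as np


def evaluate(measured, calculated):
    """Return (K, R2, RMSE) for calculated versus measured values."""
    measured = np.asarray(measured, dtype=float)
    calculated = np.asarray(calculated, dtype=float)
    # Assumed definition: slope of the least-squares line through the origin.
    k = float(measured @ calculated / (measured @ measured))
    # Assumed definition: squared Pearson correlation coefficient.
    r2 = float(np.corrcoef(measured, calculated)[0, 1] ** 2)
    rmse = float(np.sqrt(np.mean((calculated - measured) ** 2)))
    return k, r2, rmse


# Toy example: a model that systematically underestimates by 10%.
measured = np.array([1.0, 2.0, 3.0, 4.0])
calculated = 0.9 * measured
k, r2, rmse = evaluate(measured, calculated)
```

Under these definitions a systematic 10% underestimate gives a slope of 0.9 with a perfect correlation, which illustrates why K and R² are reported together: R² alone would not reveal a proportional bias.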

Table 7 presents the correlation between the calculated and measured EF_NOx values for the three HEV test datasets without any calibration of (1 − losses), and Table 8 shows the accuracy of the calibrated EF_NOx values obtained with the proposed system. Applying the system clearly improved the agreement between predicted and measured EF_NOx values. Figure 6 illustrates the changes in the evaluation indicators before and after correction. Parameter K moved from approximately 0.8–0.9 to around 1 ± 0.05, and R², which was as low as 0.956 before calibration, reached at least 0.990 for all three datasets. The RMSE also decreased for every dataset, for example from 93.89 × 10⁻⁶ to 30.40 × 10⁻⁶ for Test 1.

To visualize the performance of the proposed method, Figure 7 presents a time-series comparison between the measured and predicted EF_NOx values. The predicted trajectory closely matches the measured data, capturing both the peaks during acceleration events and the trends during stable driving. These results indicate that the proposed system, which combines dictionary learning-based incremental dimensionality reduction with a SuperLearner model for the (1 − losses) correction factor, not only simplifies the computation by enabling fast dimensionality reduction of incremental data via simple matrix operations but also delivers precise correction factor predictions, resulting in more accurate EF_NOx estimates than the uncorrected approach.

3.4.3. Comparative Analysis Against Baseline Methods

To quantify the specific contribution of each component within the proposed framework, a comparative analysis was conducted in this section. The evaluation metrics are listed in Table 9. First, regarding dimensionality reduction, the strategy based on t-SNE and dictionary learning outperformed the PCA-based alternative. Specifically, the PCA-based method produced a higher RMSE (51.12) and a lower R² (0.980), suggesting that t-SNE better captures the nonlinear structure of the data. Furthermore, compared to using raw high-dimensional data, the proposed framework reduced the RMSE by about 30%, which suggests that t-SNE effectively removed redundancy from the high-dimensional data. Second, regarding the regression algorithm, the SuperLearner was compared with the Random Forest model to evaluate the benefits of the ensemble approach. The SuperLearner reduced the RMSE from 35.40 (Random Forest) to 16.35, while increasing the R² from 0.985 to 0.992. This indicates that the ensemble algorithm can effectively reduce prediction bias and improve the generalization ability of the model.
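As a quick check, the relative RMSE reductions of the proposed framework can be recomputed from the average values in Table 9 (rounded to one decimal place):

```latex
\begin{aligned}
\text{vs. raw features:} \quad & \frac{23.35 - 16.35}{23.35} \approx 30.0\% \\
\text{vs. Random Forest:} \quad & \frac{35.40 - 16.35}{35.40} \approx 53.8\% \\
\text{vs. PCA:} \quad & \frac{51.12 - 16.35}{51.12} \approx 68.0\%
\end{aligned}
```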

Moreover, computational efficiency was evaluated by comparing the proposed incremental dimensionality reduction method against the re-computation strategy. All computations were executed on a system equipped with an Intel Core i5-12400 processor and 16 GB RAM. The results, summarized in Table 10, show a significant speed advantage for the proposed approach. For the independent validation dataset, the re-computation baseline required 25.21 s, whereas the proposed method took only 0.39 s. The accuracy loss caused by dictionary learning was small: the average RMSE rose from 16.10 to 16.35 and the average R² fell from 0.994 to 0.992. The proposed framework therefore achieved a balance between computational speed and prediction precision.
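Using the averages reported in Table 10, the speed-up and the accompanying change in RMSE work out as follows (rounded):

```latex
\begin{aligned}
\text{speed-up:} \quad & \frac{25.21\ \text{s}}{0.39\ \text{s}} \approx 64.6 \\
\text{relative RMSE increase:} \quad & \frac{16.35 - 16.10}{16.10} \approx 1.6\%
\end{aligned}
```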

4. Conclusions

This study proposed a systematic approach that coupled an incremental dimensionality reduction method based on dictionary learning with a SuperLearner-based predictive model for the correction factor, and used it to investigate refined emission accounting for road vehicles based on the EF_NOx of HEVs. The system obtained low-dimensional representations of incremental data through straightforward matrix calculations, reducing computational complexity, and leveraged the modeling capability of SuperLearner to construct a more accurate predictive model, improving the accuracy of EF_NOx accounting by considering multiple variables. The main conclusions are summarized below:

(1): To accurately calculate the NO_x emission factor for HEVs, this study applied a dictionary learning-based incremental dimensionality reduction method to the multiple variables influencing NO_x emissions. By integrating t-SNE embeddings with dictionary learning, this approach captured key information within the feature parameters and reduced the dimensionality of incremental data using only simple matrix operations. To address the lengthy computation time of the dictionary learning training process, this study introduced FISTA and optimized its parameter values, improving computational efficiency while maintaining the accuracy of dictionary learning.
(2): After the dimensionality reduction process, the low-dimensional data obtained were used to train an accurate NO_x emission correction factor model through the SuperLearner regression method. For newly arriving incremental data, the low-dimensional embeddings Y_x, derived through the dictionary learning-based incremental reduction method, were fed into the SuperLearner model to directly produce the corresponding correction factor, enhancing the accuracy of EF_NOx accounting.
(3): This study integrated the dictionary learning-based incremental dimensionality reduction method with the SuperLearner-based model, obtaining a precise EF_NOx correction factor for incremental data. Validation on three independent RDE test datasets showed that the correction factor obtained by this system significantly improved the accuracy of EF_NOx. Specifically, parameter K improved from approximately 0.8–0.9 to around 1 ± 0.05, R² increased to at least 0.990, and RMSE decreased markedly. The proposed system therefore not only achieved fast dimensionality reduction of incremental data but also provided precise correction factor prediction, yielding more accurate EF_NOx estimates than uncorrected methods.

In summary, based on the RDE tests of HEVs in the city of Chongqing, China, this study presented a systematic approach that coupled a dictionary learning-based incremental dimensionality reduction method with a SuperLearner-based model for the correction factor (1 − losses). This approach accelerated the dimensionality reduction of incremental data while maintaining the accuracy of the correction factor prediction model. Admittedly, the universality of the EF_NOx accounting results obtained in this study is to some extent limited by the dataset used. Despite this limitation, the analytical process and methodology provide a systematic, data-driven approach for constructing a broadly applicable emission accounting model based on multiple variables. Because the model relies on universal vehicle operating condition parameters, it can be transferred to other vehicle types, regions, or regulatory contexts by retraining the dictionary and regression weights using relevant local datasets.

Author Contributions

Conceptualization, J.C., F.Z. and W.Y.; methodology, H.C., J.C. and F.Z.; validation, H.C.; formal analysis, H.C.; investigation, H.C. and J.C.; writing—original draft preparation, H.C.; writing—review and editing, F.Z.; supervision, F.Z. and W.Y.; funding acquisition, F.Z. and W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Open Funds of Chongqing Key Laboratory of Vehicle Emission and Economizing Energy (grant number PFJN-09), the Key R&D plan of Shandong Province (No. 2024TSGC0515), and the Science and Technology Planning Project (No. 2024ZDCX004).

Data Availability Statement

The datasets presented in this article are not readily available because they were obtained from China Automotive Engineering Research Institute Co., Ltd. and are available only with their permission. Requests to access the datasets should be directed to Feiyang Zhao.

Conflicts of Interest

The authors declare that this study received funding from Open Funds of Chongqing Key Laboratory of Vehicle Emission and Economizing Energy, Key R&D plan of Shandong Province and Science and Technology Planning Project. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

References

1. Ministry of Ecology and Environment of the People’s Republic of China. China Mobile Source Environmental Management Annual Report 2024. Available online: https://www.mee.gov.cn/hjzl/sthjzk/ydyhjgl/202503/W020250326518388591055.pdf (accessed on 26 March 2025).
2. Ministry of Ecology and Environment of the People’s Republic of China. China Mobile Source Environmental Management Annual Report in 2023. Environ. Prot. 2024, 52, 48–62. (In Chinese)
3. Liang, X.; Wang, Y.; Chen, Y.; Deng, S. Advances in emission regulations and emission control technologies for internal combustion engines. SAE Int. J. Sustain. Transp. Energy Environ. Policy 2021, 2, 101–119.
4. Ravi, S.S.; Osipov, S.; Turner, J.W.G. Impact of modern vehicular technologies and emission regulations on improving global air quality. Atmosphere 2023, 14, 1164.
5. Key Takeaways from Stringent Standards Set by EPA’s New Emissions Rule. 2024. Available online: https://www.cummins.com/news/2024/04/15/key-takeaways-stringent-standards-set-epas-new-emissions-rule (accessed on 25 June 2025).
6. Xi, J. Continuing to Develop, Starting a New Journey in the Global Fight Against Climate Change-Speech at the Climate Ambition Summit; Bull State Council PR China: Beijing, China, 2020; Volume 7.
7. Cui, W.; Liu, H.; Li, H.; Zhang, S.; Wang, Q.; Cui, W.; Zhao, F.; Yu, W. Development of a Modular Virtual Calibration Platform for Power Machinery. Designs 2025, 9, 13.
8. Liu, H.; Cui, W.; Li, H.; Shuai, X.; Wang, Q.; Zhang, J.; Zhao, F.; Yu, W. Design of High-Speed Signal Simulation and Acquisition System for Power Machinery Virtual Testing. Designs 2025, 9, 5.
9. Shuai, X.; Liu, H.; Li, H.; Cui, W.; Wang, Q.; Yu, W.; Zhao, F. Model-in-the-Loop Simulation for Model Predictive Controlled High-Pressure Direct Injection Dual-Fuel Engine Combustion Control. Designs 2025, 9, 24.
10. Ahmad Jan, M.; Adil, M.; Brik, B.; Harous, S.; Abbas, S. Making Sense of Big Data in Intelligent Transportation Systems: Current Trends, Challenges and Future Directions. ACM Comput. Surv. 2025, 57, 1–43.
11. Addressing Data Processing Challenges in Autonomous Vehicles. 2024. Available online: https://www.iotforall.com/addressing-data-processing-challenges-in-autonomous-vehicles (accessed on 25 June 2025).
12. Hassan, S.S.M.; Mohamed, N.R.G.; Saad, M.M.A.; Salem, A.M.; Ibrahim, Y.H.; Elshakour, A.A.; Fathy, M.A. A novel non-woven fabric sandwich filter with activated carbon/polypyrrole nanocomposite for the removal of CO, SO₂ and NO_x emitted from gasoline engines. Fuel 2025, 401, 135838.
13. Hassan, S.S.M.; Mohamed, N.R.G.; Saad, M.M.A.; Ibrahim, Y.H.; Elshakour, A.A.; Fathy, M.A. Eco-Friendly Removal and IoT-Based Monitoring of CO₂ Emissions Released from Gasoline Engines Using a Novel Compact Nomex/Activated Carbon Sandwich Filter. Polymers 2025, 17, 1447.
14. Wen, C.; Li, J.; Wang, B.; Lu, G.; Xu, H. Expertise-guided NO_x emission modeling of hybrid vehicle engines via peak-valley-enhanced Gaussian process regression. Energy 2025, 322, 135166.
15. Wang, Y.; Zhang, L.; Chen, Y.; Li, C.; Du, B.; Han, J. Energy management strategies for hybrid diesel vehicles by dynamic planning embedded in real-world driving emission model. Case Stud. Therm. Eng. 2025, 65, 105643.
16. Lee, K.J.; Choi, K. Analysis on the correction factor of emission factors and verification for fuel consumption differences by road types and time using real driving data. J. Korean Soc. Transp. 2015, 33, 449–460.
17. Wang, X.; Li, M.; Peng, B. A study on vehicle emission factor correction based on fuel consumption measurement. In IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2018; Volume 108, p. 042049.
18. Chen, J.; Yu, H.; Xu, H.; Lv, Q.; Zhu, Z.; Chen, H.; Zhao, F.; Yu, W. Investigation on Traffic Carbon Emission Factor Based on Sensitivity and Uncertainty Analysis. Energies 2024, 17, 1774.
19. Fernandes, P.; Macedo, E.; Bahmankhah, B.; Tomas, R.F.; Bandeira, J.M.; Coelho, M.C. Are internally observable vehicle data good predictors of vehicle emissions? Transp. Res. Part D Transp. Environ. 2019, 77, 252–270.
20. Polymeni, S.; Spanos, G.; Pitsiavas, V.; Lalas, A.; Votis, K.; Tzovaras, D. Analyzing Energy Consumption in Battery Electric Vehicles: A Statistical-based Approach. TechRxiv 2025.
21. Wang, Z.; Feng, K. NO_x emission prediction for heavy-duty diesel vehicles based on improved GWO-BP neural network. Energies 2024, 17, 336.
22. Balogun, H.; Alaka, H.; Egwim, C.N. Boruta-grid-search least square support vector machine for NO₂ pollution prediction using big data analytics and IoT emission sensors. Appl. Comput. Inform. 2025, 21, 101–113.
23. Mohammad, A.; Rezaei, R.; Hayduk, C.; Delebinski, T.; Shahpouri, S.; Shahbakhti, M. Physical-oriented and machine learning-based emission modeling in a diesel compression ignition engine: Dimensionality reduction and regression. Int. J. Engine Res. 2023, 24, 904–918.
24. Pan, M.; Cao, S.; Zhang, Z.; Ye, N.; Qin, H.; Li, L.; Guan, W. Recent progress on energy management strategies for hybrid electric vehicles. J. Energy Storage 2025, 116, 115936.
25. Lv, Z.; Zhang, Y.; Ji, Z.; Deng, F.; Shi, M.; Li, Q.; He, M.; Xiao, L.; Huang, Y.; Liu, H.; et al. A real-time NO_x emission inventory from heavy-duty vehicles based on on-board diagnostics big data with acceptable quality in China. J. Clean. Prod. 2023, 422, 138592.
26. Wang, J.; Wang, R.; Yin, H.; Wang, Y.; Wang, H.; He, C.; Liang, J.; He, D.; Yin, H.; He, K. Assessing heavy-duty vehicles (HDVs) on-road NO_x emission in China from on-board diagnostics (OBD) remote report data. Sci. Total Environ. 2022, 846, 157209.
27. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
28. Garcia-Cardona, C.; Wohlberg, B. Convolutional dictionary learning: A comparative review and new algorithms. IEEE Trans. Comput. Imaging 2018, 4, 366–381.
29. Li, Y.; Chai, Y.; Zhou, H.; Yin, H. A novel dimension reduction and dictionary learning framework for high-dimensional data classification. Pattern Recognit. 2021, 11, 107793.
30. Phillips, R.V.; Van Der Laan, M.J.; Lee, H.; Gruber, S. Practical considerations for specifying a super learner. Int. J. Epidemiol. 2023, 52, 1276–1285.
31. Brownlee, J. Ensemble Learning Algorithms with Python: Make Better Predictions with Bagging, Boosting, and Stacking; Machine Learning Mastery: Vermont, VIC, Australia, 2021.
32. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202.

Figure 1. Methodological framework.

Figure 2. The driving conditions of RDE test.

Figure 3. Flowchart of dimensionality reduction method for dictionary learning.

Figure 4. Iterative diagram of objective function and reconstruction accuracy.

Figure 5. Iterative diagram of objective function and reconstruction accuracy after acceleration.

Figure 6. Comparison of validation metrics (K, R², RMSE) before and after calibration for three independent test HEVs.

Figure 7. Comparison between measured and predicted instantaneous NO_x emissions.

Table 1. Rated power of the vehicle.

HEV	Rated Power (kW)
1	67.5
2	113
3	115
4	138
5	74
6	140
7	110
8	113
9	140
10	70

Table 2. R², normalized K, and the RMSE for ten HEVs.

HEV	K	R²	RMSE (×10⁻⁶)
1	1.314	0.87	17.45
2	1.31	0.897	98.02
3	1.22	0.953	75.52
4	1.07	0.995	7.765
5	1.06	0.996	4.810
6	0.87	0.984	122.3
7	1.12	0.983	41.23
8	0.83	0.969	135.8
9	0.81	0.962	222.4
10	1.02	0.999	19.79

Table 3. Interest parameters descriptions.

Parameter Type	Parameter	Unit
Engine Characteristics	Engine Coolant Temperature	°C
	Exhaust Temperature	°C
	Engine Speed	rpm
	Intake Air Temperature	°C
Driving Characteristics	VSP	kW/t
	Vehicle Speed	m/s
	Acceleration	m/s²
External Characteristics	Ambient Temperature	°C
	Ambient Pressure	kPa
	Slope Gradient	%

Table 4. The specific hyperparameters of t-SNE.

Hyperparameter	Value
Perplexity	30
Learning rate	200
Iterations	1000
Random state	42
Initialization strategy	PCA

Table 5. Performance comparison between SuperLearner and individual base regression models.

Model	R²	RMSE
Linear Regression	0.604 ± 0.029	0.202 ± 0.001
Elastic Net	0.609 ± 0.018	0.203 ± 0.001
SVR	0.781 ± 0.012	0.187 ± 0.002
Decision Tree Regressor	0.796 ± 0.021	0.185 ± 0.002
K Neighbors Regressor	0.896 ± 0.008	0.147 ± 0.001
Ada Boost Regressor	0.738 ± 0.013	0.192 ± 0.002
Bagging Regressor	0.836 ± 0.014	0.155 ± 0.002
Random Forest Regressor	0.834 ± 0.010	0.155 ± 0.002
Extra Trees Regressor	0.806 ± 0.013	0.159 ± 0.002
SuperLearner	0.900 ± 0.013	0.160 ± 0.002

Table 6. Sensitivity analysis of model performance under different t-SNE random state.

Random State	K	R²	RMSE (×10⁻⁶)
1	1.015	0.992	30.32
42	1.020	0.993	30.40
100	1.023	0.993	30.55

Table 7. Parameter K, R², and RMSE for the test HEVs.

HEV	K	R²	RMSE (×10⁻⁶)	Average Measured Value (×10⁻⁶ g/s)	Average Calculated Value (×10⁻⁶ g/s)
Test 1	0.895	0.989	93.89	860.7	818.2
Test 2	0.796	0.956	68.5	588.9	511.6
Test 3	0.899	0.989	11.95	176.5	235.0

Table 8. R², parameter K, and RMSE for the calibrated test HEVs data.

HEV	K	R²	RMSE (×10⁻⁶)	Average Calibrated Value (×10⁻⁶ g/s)
Test 1	1.020	0.993	30.40	862.3
Test 2	1.052	0.994	13.03	579.6
Test 3	0.949	0.990	5.578	246.3

Table 9. Comparison of predictive performance between the proposed framework and various baseline methods.

Method	Average K	Average R²	Average RMSE (×10⁻⁶)
Proposed framework (t-SNE + Dictionary learning + SuperLearner)	1.007	0.992	16.35
PCA + Dictionary learning + SuperLearner	0.923	0.980	51.12
Raw features + SuperLearner	0.975	0.981	23.35
t-SNE + Dictionary learning + Random forest	0.950	0.985	35.40

Table 10. Comparison of computational time and prediction accuracy between the proposed method and the re-computation baseline.

Method	Average Runtime (s)	Average K	Average R²	Average RMSE (×10⁻⁶)
Proposed framework	0.39	1.007	0.992	16.35
Re-computation baseline	25.21	1.046	0.994	16.10

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Chen, H.; Chen, J.; Zhao, F.; Yu, W. Fast NO_x Emission Factor Accounting for Hybrid Electric Vehicles with Dictionary Learning-Based Incremental Dimensionality Reduction. Energies 2026, 19, 680. https://doi.org/10.3390/en19030680
