Article

Methane Concentration Inversion Based on Multi-Feature Fusion and Stacking Integration

Shanghai Marine Intelligent Information and Navigation Remote Sensing Engineering Technology Research Center, Key Laboratory of Fisheries Information, Ministry of Agriculture, College of Information, Shanghai Ocean University, Shanghai 201306, China
*
Author to whom correspondence should be addressed.
Sensors 2025, 25(7), 1974; https://doi.org/10.3390/s25071974
Submission received: 15 February 2025 / Revised: 19 March 2025 / Accepted: 21 March 2025 / Published: 21 March 2025
(This article belongs to the Section Remote Sensors)

Abstract

Methane concentration inversion often relies on relatively simple features and methods, which limits overall accuracy. To address this, this study proposes a methane concentration inversion method based on multi-feature fusion and Stacking ensemble learning. The method uses a series-parallel cascade of multiple base models and a meta-model to learn different feature representations and patterns in the original data, exploring the intrinsic relationships between the feature factors and methane concentration and thereby improving inversion accuracy and generalization capability. The method is validated experimentally in the eastern region of Xinjiang. The results show that, compared with other typical methods, the proposed Stacking ensemble model achieves the best inversion performance, with R2, RMSE, and MAE values of 0.9747, 2.8294, and 1.5299, respectively. Seasonally, methane concentration in eastern Xinjiang typically shows lower average values in spring and autumn and higher average values in summer and winter.

1. Introduction

Methane (CH4) is the second most significant greenhouse gas after carbon dioxide (CO2). Existing research indicates that the increase in atmospheric methane concentration is largely driven by continuously rising anthropogenic emissions [1,2]. Currently, the primary anthropogenic sources of methane include fossil fuel extraction and use, ruminant emissions, biomass burning, and rice cultivation [3,4]. Since 1750, methane emissions have contributed nearly one-quarter of the cumulative radiative forcing caused by greenhouse gases, playing a significant role in global warming [5,6]. Additionally, due to its efficient absorption of infrared radiation, methane has a higher global warming potential (GWP) than CO2, with a GWP 28 times that of CO2 over a 100-year period [7,8]. Methane is also an important precursor of ozone (O3) and can affect air quality through photochemical reactions [9]. In the stratosphere, methane is oxidized to form water vapor (H2O) and is eventually converted into CO2, further exacerbating global warming [10,11]. Therefore, methane concentration inversion is of great significance for assessing global warming, identifying major emission sources, and addressing climate change.
Currently, atmospheric methane concentration observation methods primarily include ground-based and satellite-based approaches. Ground-based observations mainly consist of the Global Atmosphere Watch (GAW) and the Global Greenhouse Gas Reference Network (GGGRN) [12,13,14,15,16]. Since the 1980s, China has established atmospheric background stations, including the Waliguan station in Qinghai, to monitor greenhouse gases such as methane [17]. Ground-based observations, through long-term and high-precision monitoring of atmospheric methane (CH4) concentrations, can reveal its spatial patterns, seasonal fluctuations, and interannual variations. However, due to the limited number of stations, small monitoring coverage, and uneven distribution, ground-based monitoring is constrained in its spatiotemporal coverage. In contrast, satellite remote sensing, with its advantages of rapid observation, stable cycles, and large-scale synchronous monitoring, has become an important tool for continuous global methane concentration monitoring [18,19,20,21].
In recent years, many scholars have used satellite remote sensing data and statistical methods to analyze the spatial and temporal characteristics of global atmospheric methane concentration. Zhang Shaohui et al. [22] used AIRS satellite data and statistical methods to analyze the spatiotemporal variations and seasonal changes in methane concentration in the global and East Asian regions from December 2002 to November 2016. The study found that the global average methane concentration increased year by year from 2003 to 2016; the methane concentration in the East Asian region showed seasonal changes. He Qian et al. [23] used SCIAMACHY satellite remote sensing data and statistical methods to analyze the changes in global methane concentration from 2003 to 2009. The study found that southern China, India, and the Southeast Asian peninsula are high-value areas for global methane concentration. At the same time, latitude has an impact on methane concentration, which gradually decreases from north to south. Li Shengwei et al. [24] used methane observation data from the TROPOMI satellite and wind data from the ECMWF global reanalysis to estimate surface methane emissions in China using an efficient divergence method. The results showed that the high-concentration areas of methane were in central China, East China, the Beijing–Tianjin–Hebei region, the Sichuan Basin, and some northern parts of the Xinjiang Uygur Autonomous Region.
Existing research on methane concentrations mainly relies on statistical models and mechanistic models. Statistical models, while capable of providing certain trend analyses, often lack an in-depth understanding of physical and chemical mechanisms. Mechanistic models, on the other hand, require comprehensive consideration of complex physical and chemical processes, increasing model complexity [25,26]. In addition, dependence on specific environments and assumptions about boundary conditions also affect the applicability of these models in practical use. Therefore, these traditional models often show limitations when dealing with complex nonlinear relationships and multi-source data fusion. In contrast, machine learning methods can learn complex nonlinear relationships from large amounts of data, exhibiting significant advantages such as high accuracy and strong adaptability. They provide an efficient and flexible solution for monitoring atmospheric methane concentrations and have proven effective in fields such as simulating atmospheric pollution [27,28].
In recent years, machine learning has demonstrated exceptional performance in methane concentration monitoring. Xinyue Ai [4] utilized TROPOMI satellite methane concentration data and anthropogenic emission inventories to analyze the methane concentration enhancement trends in central and eastern China from 2001 to 2018 using the random forest method. The study found that the random forest model could accurately establish the relationship between emission sources and methane concentration enhancement, with a coefficient of determination (R2) of 0.89 and a root mean square error (RMSE) of 11.98. Guo Haohao et al. [29] employed GOSAT satellite methane concentration data and applied the random forest method to establish the relationship between various influencing factors and methane concentrations. The results indicated that the spatial distribution of near-surface methane concentrations in China generally exhibited higher levels in the east and lower levels in the west. Seasonally, methane concentrations were generally higher in summer and autumn and lower in spring and winter.
The above research demonstrates the potential of machine learning methods for monitoring atmospheric methane concentrations, but some limitations remain. Most existing studies rely on single features or limited data sources, such as using only meteorological data or specific satellite data. This restricts a model's ability to represent the complexity of methane concentration changes and makes it difficult to capture the inherent relationships between methane concentrations and various factors. Single-feature models also struggle to adapt to complex spatiotemporal variations, making it hard to account for changing patterns of methane concentration across different regions and seasons.
To address these issues, this study proposes a methane concentration inversion method based on multi-feature fusion and the Stacking ensemble model (MFF-SEM). It uses Level 2 methane column concentration data from the TROPOMI instrument on Sentinel-5P as the primary data source, combined with the ERA5 dataset provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) and auxiliary data such as latitude and longitude. By jointly considering meteorological factors and other characteristic parameters that affect methane concentration, the study constructs a stacked ensemble learning methane inversion model based on multi-feature fusion. The model integrates multiple base models and a meta-model to capture different feature representations and patterns in the original data through cascaded learning, achieving complementary advantages and multi-feature fusion. This approach explores the hidden relationships between different features and methane concentration, enabling accurate methane inversion and spatiotemporal variation analysis. The method is validated through experiments in the eastern Xinjiang region.

2. Materials and Methods

2.1. Study Area Description

Xinjiang is located in the hinterland of the Eurasian continent and is characterized by a temperate continental climate. The annual average temperature is 32 °C, and the annual average precipitation is approximately 150 mm, indicating a dry climate [30]. The study area covers the eastern part of Xinjiang, with a longitude range of 84°29′4″ E to 94°9′18″ E and a latitude range of 37°59′42″ N to 46°11′6″ N. The study period spanned from May 2018 to May 2020.

2.2. Data Collection and Processing

This study combines meteorological factors, satellite auxiliary data, and latitude–longitude information to invert methane concentrations.
Research has shown that incorporating meteorological data into modeling can enhance the accuracy and reliability of methane concentration inversion. The meteorological factors used in this study were derived from the ERA5 dataset provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). This dataset assimilates conventional observations and satellite remote sensing data from the surface and upper atmosphere across different regions globally [31,32,33]. As shown in Table 1, the meteorological factors employed in this study include 10 m zonal wind (u10), 10 m meridional wind (v10), 2 m temperature (t2m), 2 m dewpoint temperature (d2m), surface direct solar radiation under clear sky (cdir), near-infrared albedo for diffuse radiation (alnid), surface pressure (sp), surface solar radiation (ssr), total column ozone content (tco3), and boundary layer height (blh). The data have a spatial resolution of 0.25° × 0.25° and an hourly temporal resolution.
The auxiliary data used in this study come from the accompanying data of the Tropospheric Monitoring Instrument (TROPOMI) L2 product output, which can be used for a posteriori filtering and total inversion error estimation, and can also be used for model analysis. The auxiliary data selected in this study include total column water vapor (water_total_column), aerosol optical thickness (aerosol_optical_thickness) and surface albedo (surface_albedo). The spatial resolution of the auxiliary data is 7 km × 7 km, and the temporal resolution is daily. In addition to the above auxiliary data, the features involved in the modeling also include longitude and latitude to characterize spatial characteristics.
The methane concentration data come from the methane (CH4) column concentration product of the Tropospheric Monitoring Instrument (TROPOMI). TROPOMI, an atmospheric monitoring spectrometer onboard the Sentinel-5P satellite, is one of the most technologically advanced and highest-spatial-resolution atmospheric spectrometers globally, providing daily global coverage [34,35]. The global spatial resolution is 7 km × 7 km (improved to 5.5 km × 7 km in August 2019). The CH4 column concentration product is a Level 2 (L2) offline (OFFL) data product. For this study, high-quality CH4 column concentration inversion data, after bias correction (methane_mixing_ratio_bias_corrected) and with a quality descriptor (qa_value > 0.5), are selected.
The data are processed in both the spatial and temporal dimensions, as shown in Figure 1. Spatially, to eliminate scale effects between data of different resolutions, the meteorological factors are resampled to a 7 km × 7 km resolution using bilinear interpolation. Temporally, to minimize errors, the meteorological values closest in time to each methane observation are used. Missing values are removed directly, and anomalies are identified and handled using box plots. To ensure the quality of the methane concentration data, pixels with a qa_value below 0.5 are filtered out.
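The filtering and matching steps described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the record layout, the function names, and the toy values are hypothetical, and the 1.5 × IQR whisker rule is an assumption about how the box-plot screening is applied (the text does not state the exact rule). Bilinear resampling is not shown.

```python
from datetime import datetime

def keep_high_quality(pixels, qa_threshold=0.5):
    """Keep pixels whose qa_value exceeds the threshold (qa_value > 0.5)."""
    return [p for p in pixels if p["qa_value"] > qa_threshold]

def nearest_in_time(target, candidates):
    """Return the candidate timestamp closest to the target observation time."""
    return min(candidates, key=lambda t: abs((t - target).total_seconds()))

def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def iqr_bounds(values, k=1.5):
    """Box-plot whisker limits: [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is assumed)."""
    s = sorted(values)
    q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy pixels (invented values, in ppb) to exercise the functions.
pixels = [
    {"ch4": 1850.0, "qa_value": 0.9},
    {"ch4": 1862.0, "qa_value": 0.4},  # dropped: low quality flag
    {"ch4": 1871.0, "qa_value": 1.0},
    {"ch4": 1858.0, "qa_value": 0.8},
    {"ch4": 1866.0, "qa_value": 0.7},
    {"ch4": 2400.0, "qa_value": 0.9},  # dropped: outside the whiskers
]
good = keep_high_quality(pixels)
low, high = iqr_bounds([p["ch4"] for p in good])
clean = [p for p in good if low <= p["ch4"] <= high]

# Match an observation time to the closest hourly meteorological field.
overpass = datetime(2018, 7, 1, 7, 40)
hourly_fields = [datetime(2018, 7, 1, h) for h in range(24)]
matched = nearest_in_time(overpass, hourly_fields)
```

Applying the quality filter before the outlier screen keeps low-quality retrievals from distorting the quartiles.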

2.3. Stacking Ensemble Learning Model for Methane Concentration Inversion

The core idea of Stacking ensemble learning is to combine multiple base learning models, leveraging the strengths of different models to capture patterns and relationships that might be overlooked by individual models, thereby reducing model bias and variance. A meta-model is then used to integrate the predictions of the base models, enhancing the overall generalization capability of the model. In this study, GBDT and LightGBM are employed to fit residuals through gradient boosting algorithms, effectively capturing the nonlinear feature mapping between different features and methane concentrations. Additionally, XGBoost and RF are utilized for their exceptional performance in preventing model overfitting. This approach achieves complementary advantages among the base models, fully exploring the intrinsic relationships between features and methane concentrations. Finally, the meta-learner Lasso is used to achieve precise inversion of methane concentrations.
Figure 2 shows the framework of methane concentration inversion based on Stacking ensemble learning. In this study, five-fold cross-validation is used to train the base learners. The training set is divided into five parts, with four parts used for training and the remaining one part, along with the test set, used for prediction. This process is repeated five times, and the prediction results for each sample are concatenated in their original order to generate a new feature matrix for the meta-learner training set. Simultaneously, the prediction results from the five test sets are averaged to create a new feature matrix for the meta-learner test set.
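The five-fold procedure above can be sketched with NumPy alone. The base learners here (ordinary least squares and a k-nearest-neighbour average) and the linear meta-learner are simple stand-ins chosen to keep the example dependency-free; they are not the GBDT, LightGBM, XGBoost, RF, and Lasso models used in the study. The data are synthetic, and all function names are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a prediction function."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ w

def fit_knn(X, y, k=5):
    """k-nearest-neighbour regressor; returns a prediction function."""
    def predict(Z):
        dist = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def stack_features(X_train, y_train, X_test, learners, n_folds=5, seed=0):
    """Build meta-learner inputs from out-of-fold and fold-averaged predictions."""
    n = len(X_train)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    meta_train = np.zeros((n, len(learners)))
    meta_test = np.zeros((len(X_test), len(learners)))
    for j, fit in enumerate(learners):
        for held_out in folds:
            train_idx = np.setdiff1d(np.arange(n), held_out)
            predict = fit(X_train[train_idx], y_train[train_idx])
            # Out-of-fold predictions go back to their original row positions.
            meta_train[held_out, j] = predict(X_train[held_out])
            # Test predictions of the five fold-models are averaged.
            meta_test[:, j] += predict(X_test) / n_folds
    return meta_train, meta_test

# Synthetic data: 300 samples, 4 features, 80/20 split.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 1850.0 + 10.0 * X[:, 0] - 5.0 * X[:, 1] ** 2 + rng.normal(scale=1.0, size=300)
X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]

meta_train, meta_test = stack_features(X_train, y_train, X_test, [fit_linear, fit_knn])
meta_model = fit_linear(meta_train, y_train)  # stand-in for the Lasso meta-learner
stacked_pred = meta_model(meta_test)
```

Because every row of `meta_train` is predicted by a model that never saw that row, the meta-learner is trained on predictions that resemble what it will receive at test time.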

2.4. Base Learner

GBDT (Gradient Boosting Decision Tree) is an iterative decision tree algorithm that is part of the boosting strategy and comprises multiple decision trees. As a supervised machine learning algorithm, GBDT can accurately capture the nonlinear characteristics of various predictive variables and establish the relationship between predictors and methane concentrations. Its core idea is to use the gradient boosting method, where each iteration adds a new decision tree based on the previous one to fit the residuals between the predicted values and the true values from the last iteration [36]. In this way, each iteration brings the predicted results closer to the true values, and the optimal prediction is ultimately obtained by accumulating the results of multiple decision trees. The calculation formulas are shown in Equations (1)–(4):
$$f_0(x) = \arg\min \sum_{i=1}^{n} L\big(y_i, f(x)\big) \tag{1}$$
$$f_t(x) = f_{t-1}(x) + \Delta f_t(x) \tag{2}$$
$$\Delta f_t(x) = \rho_t h_t \tag{3}$$
$$F(x) = \sum_{t=0}^{T} f_t(x) \tag{4}$$
In these formulas, $f_t(x)$ represents the function after the $t$-th iteration, $L(y_i, f(x))$ denotes the loss function, $\Delta f_t(x)$ is the boost value after each iteration, $\rho_t$ is the optimal gradient descent step length, $h_t$ is the base-learner function, and $F(x)$ is the total boost value after iteration.
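The residual-fitting idea behind Equations (1)–(3) can be illustrated with a small sketch for squared-error loss. It assumes single-split regression stumps as base learners and a fixed shrinkage step in place of the optimal step length; both are simplifications for illustration, and the function names are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split stump on a 1-D feature under squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:  # excluding the max keeps the right side non-empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return lambda z: np.where(z <= threshold, left_value, right_value)

def gradient_boost(x, y, n_rounds=50, step=0.1):
    """Boosting for squared loss: each stump fits the current residuals."""
    f0 = y.mean()  # the constant minimizing squared loss, as in Eq. (1)
    prediction = np.full_like(y, f0, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - prediction              # negative gradient of 1/2 (y - f)^2
        h = fit_stump(x, residual)             # base learner h_t
        prediction = prediction + step * h(x)  # update of Eq. (2) with increment step * h_t
        stumps.append(h)

    def predict(z):
        return f0 + step * sum(h(z) for h in stumps)
    return predict

# Toy one-dimensional regression problem.
x = np.linspace(0.0, 6.0, 80)
y = np.sin(x)
model = gradient_boost(x, y)
mse_before = np.mean((y - y.mean()) ** 2)
mse_after = np.mean((y - model(x)) ** 2)
```

A small fixed step trades more boosting rounds for smoother convergence, which is the usual role of the learning rate in gradient boosting libraries.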
XGBoost, a machine learning algorithm, is characterized by its extensive use, flexibility, and efficiency. It is based on the gradient boosting framework and improves prediction accuracy by integrating multiple decision tree models. The core idea is that during the training process, each new decision tree is constructed to improve upon the results of the previous tree, aiming to gradually reduce the loss function value and thereby enhance the predictive capability of the entire model [37]. The loss function is defined as follows:
$$L = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i\big) \tag{5}$$
In this equation, $y_i$ represents the true value of methane concentration, $\hat{y}_i$ denotes the predicted value of methane concentration, and $n$ is the number of samples. To prevent overfitting, XGBoost introduces a regularization strategy into the objective function. It controls the complexity of the decision trees to limit the model's learning capacity. The objective function is defined in Equation (6):
$$Obj = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i\big) + \sum_{i=1}^{t} \Omega(f_i) \tag{6}$$
The regularization term, which sums the complexity of all trees, is incorporated into the objective function to prevent overfitting. Commonly used regularization strategies include limiting the maximum depth of decision trees, the minimum number of samples in leaf nodes, and the weight decay of leaf nodes.
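The text does not spell out the complexity term $\Omega$. In the original XGBoost formulation it is commonly written as a penalty on the number of leaves and on the leaf weights. The symbols $J$, $w_j$, $\gamma$, and $\lambda$ below are introduced here for illustration only, and this $\lambda$ is unrelated to the Lasso penalty in Equation (7):

$$\Omega(f) = \gamma J + \frac{1}{2}\lambda \sum_{j=1}^{J} w_j^{2}$$

Here $J$ is the number of leaves of tree $f$ and $w_j$ is the output weight of leaf $j$. A larger $\gamma$ discourages additional leaves, and a larger $\lambda$ shrinks the leaf weights.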
LightGBM is a gradient boosting algorithm based on Gradient Boosting Decision Trees (GBDTs). Compared to traditional gradient boosting methods, LightGBM significantly improves training speed and efficiency by introducing Exclusive Feature Bundling (EFB) and a histogram-based algorithm. It adopts a leaf-wise optimization strategy, which effectively reduces memory usage and demonstrates excellent performance on large-scale datasets and high-dimensional features.
Random forest (RF) is a machine learning algorithm based on the Bagging strategy, which can effectively handle nonlinear problems and excels at processing large numbers of samples and features. It constructs multiple decision trees by randomly sampling a subset of the original data with replacement and then aggregates their results through voting or averaging to obtain the final prediction [38]. By combining different models, this technique effectively prevents overfitting while strengthening the model’s generalization ability.

2.5. Meta-Learner

The Lasso regression model is a linear regression method that achieves feature selection and model simplification by introducing an L1 regularization term. Its core idea is to constrain the model coefficients through the L1 regularization term $\lambda \|\beta\|_1$, which forces the coefficients of less important features to become zero, thereby enabling feature selection. This approach not only reduces model complexity and improves generalization capability but also enhances model interpretability. It is shown in Equation (7):
$$\min_{\beta} \; \| y_t - \beta x_t \|_2^2 + \lambda \|\beta\|_1 \tag{7}$$
where $y_t$ and $x_t$ are the $t$-th response variable and independent variable, respectively, and $\beta$ is the parameter vector.
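To make the zeroing effect of the L1 term concrete, the following sketch minimizes the objective of Equation (7) by cyclic coordinate descent with soft-thresholding. It is an illustrative solver on synthetic data, not the implementation used in the study. It assumes no intercept (the data are generated around zero), and the function names and the value of `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize ||y - X beta||_2^2 + lam * ||beta||_1 one coordinate at a time."""
    n_features = X.shape[1]
    beta = np.zeros(n_features)
    col_norm_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(n_features):
            # Residual with feature j's current contribution added back.
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual
            # For this unscaled objective the threshold is lam / 2.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_norm_sq[j]
    return beta

# Synthetic problem: only features 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_beta = np.array([3.0, 0.0, -2.0, 0.0])
y = X @ true_beta + 0.1 * rng.normal(size=200)

beta = lasso_coordinate_descent(X, y, lam=50.0)
```

The coefficients of the two uninformative features end at exactly zero, while the informative ones are retained with a small shrinkage toward zero.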

2.6. Model Evaluation Metrics

To evaluate model performance, this study uses the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE) as performance metrics. The calculation formulas are shown in Equations (8)–(10):
$$R^2 = 1 - \frac{\sum_{i=1}^{m} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{m} (\bar{y} - y_i)^2} \tag{8}$$
$$RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2} \tag{9}$$
$$MAE = \frac{1}{m} \sum_{i=1}^{m} \left| \hat{y}_i - y_i \right| \tag{10}$$
where $y_i$ represents the actual methane concentration value of the $i$-th sample; $\hat{y}_i$ represents the predicted methane concentration value of the $i$-th sample; $\bar{y}$ represents the mean value of actual methane concentrations; $m$ represents the number of measurements. R2 measures the correlation between model predictions and actual values, with values close to 1 indicating strong correlation. RMSE emphasizes prediction error magnitude and model stability, where lower values indicate higher model stability. MAE reflects the overall degree of prediction error, where lower values indicate more accurate model predictions.
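The three metrics translate directly into code. The sketch below follows Equations (8)–(10) term by term using only the standard library; the sample values are invented for illustration.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination, Eq. (8)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((mean_true - t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (9)."""
    m = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / m)

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (10)."""
    m = len(y_true)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / m

# Invented example values.
y_true = [1850.0, 1860.0, 1870.0, 1880.0]
y_pred = [1852.0, 1858.0, 1871.0, 1877.0]

scores = (r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

RMSE and MAE carry the units of the target variable, whereas R2 is dimensionless, so RMSE and MAE values are only comparable between experiments that share the same target scale.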

3. Results and Discussion

3.1. Experimental Setup

In this study, a multi-feature fusion Stacking ensemble model (MFF-SEM) is established using four machine learning models (GBDT, LightGBM, XGBoost, and RF) as base learners and the Lasso model as the meta-learner. Selected model parameters are shown in Table 2.

3.2. Factor Selection for MFF-SEM Model

To demonstrate the rationality of feature factor selection in this study, comparative experiments with different feature combinations are conducted. As shown in Table 3, F1, F2, and F3 represent three different feature combinations, where F1 represents meteorological factor features only; F2 represents meteorological factors and auxiliary data; and F3 represents all features including meteorological factors, auxiliary data, and geographical coordinates.
The data from May 2018 to April 2019 are used, with 80% as the training set and 20% as the test set, and the final results are shown in Figure 3. Figure 3a–c represent the results using F1 meteorological factors only, F2 meteorological features with auxiliary data, and F3 all feature combinations, respectively. The results indicate that when using only meteorological factors as features, the MFF-SEM model shows the lowest accuracy, with a coefficient of determination (R2) of 0.9489. The highest accuracy is achieved using all feature combinations, with R2 of 0.9747, RMSE of 2.8294, and MAE of 1.5299. The results also demonstrate that the model using all feature combinations performs better than the other two models in terms of fitting effect. These results confirm that the complete feature combination selected in this study has significant advantages.
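The three feature combinations and the 80/20 split can be written down as a small configuration sketch. The variable short names are those given in Section 2.2. The random shuffle with a fixed seed is an assumption, since the text does not state how the split is drawn, and the constant and function names are hypothetical.

```python
import random

METEOROLOGICAL = ["u10", "v10", "t2m", "d2m", "cdir", "alnid", "sp", "ssr", "tco3", "blh"]
AUXILIARY = ["water_total_column", "aerosol_optical_thickness", "surface_albedo"]
COORDINATES = ["longitude", "latitude"]

FEATURE_SETS = {
    "F1": METEOROLOGICAL,                            # meteorological factors only
    "F2": METEOROLOGICAL + AUXILIARY,                # plus TROPOMI auxiliary data
    "F3": METEOROLOGICAL + AUXILIARY + COORDINATES,  # plus geographical coordinates
}

def train_test_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

train_idx, test_idx = train_test_indices(1000)
```

Using the same index split for F1, F2, and F3 keeps the comparison between feature combinations on identical samples.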
To investigate factors influencing methane concentration more accurately, SHAP (SHapley Additive exPlanations) values are used to analyze the magnitude and direction of feature impacts on model predictions. Figure 4 presents the SHAP summary plot, which ranks the importance of selected factors affecting methane concentrations. Each point in the plot represents a sample, and each row represents a feature. The SHAP values are centered at zero, where samples on the left side have negative effects on predictions, and those on the right side have positive effects. The colors indicate the magnitude of corresponding feature values, transitioning from red to blue as feature values decrease. The wider the color region, the greater the feature’s influence.
As shown in Figure 4, temperature (t2m) has a negative impact on methane concentration, where higher temperatures correspond to lower SHAP values and lower methane concentrations in the corresponding regions. Boundary layer height (blh) and 10 m zonal wind (u10) also exhibit negative effects on methane concentration. Higher boundary layer height leads to a larger methane dispersion range, resulting in lower concentrations. Additionally, increased wind speed enhances long-distance transport and diffusion of airflow, which also reduces methane concentration.
Figure 5 shows the SHAP feature importance plot, where the global importance of each feature is calculated by averaging the absolute values of its SHAP values. As shown in the figure, the total column water vapor (water_total_column) from the auxiliary data, latitude (latitude) from the geographical coordinates, and total column ozone (tco3) from the meteorological factors have a significant impact on the methane concentration inversion model. This also indicates the importance of combining these three types of features for model construction.
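The global importance measure described above is a per-feature mean of absolute SHAP values. The sketch below applies it to a small, invented SHAP matrix (rows are samples, columns are features). The numbers are illustrative only and are not taken from Figures 4 or 5.

```python
def mean_abs_shap(shap_matrix, feature_names):
    """Average |SHAP| per feature and rank features from most to least important."""
    n_samples = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j, name in enumerate(feature_names)
    }
    ranking = sorted(importance, key=importance.get, reverse=True)
    return importance, ranking

# Invented SHAP values for four samples and three features.
feature_names = ["t2m", "blh", "water_total_column"]
shap_matrix = [
    [-1.0, -0.5, 2.0],
    [0.5, -0.2, -3.0],
    [-1.5, 0.3, 1.0],
    [1.0, -0.4, -2.0],
]

importance, ranking = mean_abs_shap(shap_matrix, feature_names)
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.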

3.3. Accuracy Analysis and Comparison Between Stacking Model and Other Single Models

The data from May 2018 to April 2019 are used, with 80% as the training set and 20% as the test set. The data are input into the MFF-SEM model and other models for experimentation, and the final model results are shown in Table 4.
The comparison of evaluation metrics in Table 4 shows that deep learning models generally exhibit lower accuracy. Due to the limited sample size, the LSTM model struggles to effectively capture temporal dependencies, resulting in lower inversion accuracy, with R2, RMSE, and MAE values of 0.6718, 11.9198, and 9.0477, respectively. The 1DCNN model requires a larger dataset, and the limited data volume in methane concentration inversion affects its performance, making it only slightly better than the GBDT model among machine learning models (R2 of 0.8643). Machine learning models such as LightGBM, RF, and XGBoost generally outperform deep learning models in terms of accuracy. Among them, LightGBM and RF show similar performance, with R2, RMSE, and MAE values of 0.9435, 4.9206, 3.7154 and 0.9479, 4.0637, 2.1072, respectively. The XGBoost model, which incorporates a regularization term to effectively prevent overfitting, achieves higher inversion performance, with R2, RMSE, and MAE values of 0.9673, 3.2221, and 1.7284, respectively. The MFF-SEM model used in this study leverages multiple base learners to explore the relationships between different feature factors and methane concentrations, achieving deep integration of feature combinations and complementary advantages among base learners. It attains the highest inversion accuracy, with R2, RMSE, and MAE values of 0.9747, 2.8294, and 1.5299, respectively, consistent with the results reported in Section 3.2. Figure 6 shows density scatter plots of actual values versus predicted values for different models.
Figure 6a–g represent the LSTM, 1DCNN, GBDT, LightGBM, RF, XGBoost, and MFF-SEM models, respectively. The results indicate that the MFF-SEM (Stacking), XGBoost, and RF models show similar data distributions, with concentration values primarily distributed in the 1875–1900 ppb range. Among these, the MFF-SEM model demonstrates the best fitting performance, while the XGBoost and RF models underestimate methane concentrations. The LightGBM and GBDT models predict concentration values concentrated between 1810 and 1910 ppb, with LightGBM showing better fitting performance than GBDT. GBDT exhibits more outliers, mainly due to its sensitivity to anomalous values, which leads to decreased performance. Among deep learning models, LSTM shows the poorest fitting performance, with predicted concentration values concentrated between 1820 and 1880 ppb. For methane concentrations outside this range, the model predictions remain constant, primarily due to the imbalanced distribution of low and high concentration values in the samples. The limited sample size results in weak temporal relationships, preventing the model from learning the characteristics of these samples.

3.4. Seasonal Analysis of Methane Concentrations

Figure 7 shows the seasonal average methane concentrations from June 2018 to May 2020, where Figure 7a represents the period from June 2018 to May 2019 and Figure 7b represents the period from June 2019 to May 2020. Figure 8 presents the monthly average methane concentrations. Both figures indicate that methane concentrations exhibit distinct seasonal variation characteristics and an upward fluctuating trend, with generally higher concentrations in summer and winter seasons and lower concentrations in spring and autumn seasons.
Among the four seasons, spring (March–May) shows generally lower methane concentrations due to lower temperatures and reduced natural and anthropogenic methane emissions. March exhibits the lowest concentrations, after which levels gradually begin to rise. In summer (June–August), methane concentrations start to increase rapidly from June, reaching their peak in August. This summer surge in methane concentrations is primarily attributed to accelerated plant growth promoted by high temperatures, increased emissions from vegetation such as rice paddies, and substantial methane contributions from natural water bodies like rivers and lakes. Additionally, high-temperature decomposition of urban waste and industrial emissions significantly increase methane sources. In autumn (September–November), as temperatures gradually decrease, microbial activity weakens, slowing methane generation, though mean concentrations maintain relatively high levels. In winter (December–February), methane concentrations show relatively high trends due to heating demands, which involve extensive fossil fuel extraction, transportation, and combustion emissions containing high methane levels. Furthermore, winter climate conditions favor methane accumulation.
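The seasonal grouping used in this analysis (spring March–May, summer June–August, autumn September–November, winter December–February) can be expressed as a small helper. The monthly values below are invented to exercise the function and are not the concentrations shown in Figures 7 and 8.

```python
def season_of(month):
    """Map a calendar month (1-12) to the season definition used in the text."""
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    if month in (9, 10, 11):
        return "autumn"
    return "winter"  # December, January, February

def seasonal_means(records):
    """Average (month, value) records by season."""
    totals, counts = {}, {}
    for month, value in records:
        season = season_of(month)
        totals[season] = totals.get(season, 0.0) + value
        counts[season] = counts.get(season, 0) + 1
    return {season: totals[season] / counts[season] for season in totals}

# Invented monthly means in ppb.
records = [
    (3, 1840.0), (4, 1844.0),
    (7, 1870.0), (8, 1880.0),
    (10, 1850.0),
    (12, 1872.0), (1, 1868.0),
]
means = seasonal_means(records)
```

Note that winter spans two calendar years, so December must be grouped with the following January and February when seasons are analysed per year.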

3.5. Generalization Experiment

Section 3.3 confirms the superior performance of the MFF-SEM model in methane concentration inversion. To validate the model’s generalization ability, data from May 2018 to April 2019 are used as the training set, while data from May 2019 to April 2020 are used as the test set for experimentation.
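Unlike the random 80/20 split of Section 3.3, this experiment separates training and test data by time. A minimal sketch of such a split is shown below. It assumes whole calendar months with inclusive boundaries, and the record layout and names are hypothetical.

```python
from datetime import date

TRAIN_PERIOD = (date(2018, 5, 1), date(2019, 4, 30))  # May 2018 to April 2019
TEST_PERIOD = (date(2019, 5, 1), date(2020, 4, 30))   # May 2019 to April 2020

def in_period(day, period):
    """True if day falls inside the inclusive (start, end) period."""
    start, end = period
    return start <= day <= end

def temporal_split(samples):
    """Split samples by observation date so that no test date precedes training."""
    train = [s for s in samples if in_period(s["date"], TRAIN_PERIOD)]
    test = [s for s in samples if in_period(s["date"], TEST_PERIOD)]
    return train, test

# Invented samples.
samples = [
    {"date": date(2018, 5, 1), "ch4": 1845.0},
    {"date": date(2019, 4, 30), "ch4": 1862.0},
    {"date": date(2019, 5, 1), "ch4": 1864.0},
    {"date": date(2020, 4, 30), "ch4": 1880.0},
    {"date": date(2020, 5, 15), "ch4": 1883.0},  # outside both periods
]
train, test = temporal_split(samples)
```

A split by time tests extrapolation to a later year, which is a harder setting than interpolation within the same year and helps explain the lower scores reported for this experiment.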
From the comparative analysis results shown in Table 5, it can be observed that the performance of deep learning models is relatively poor, with R2 values of 0.4300 and 0.4059 for LSTM and 1DCNN, respectively. This is primarily due to the limited data volume of one year and the increasing trend of methane concentrations over time. Deep learning models require large amounts of data to effectively learn sufficient features and patterns, which leads to their poor generalization performance. Compared to other machine learning models, LightGBM, which incorporates techniques such as histogram-based algorithms and leaf-wise growth strategies, significantly improves training speed and prediction efficiency while reducing memory usage. It also demonstrates better generalization performance while enhancing model accuracy, with R2, RMSE, and MAE values of 0.5715, 12.1671, and 9.9433, respectively. The MFF-SEM model used in this study integrates the advantages of multiple base models, enabling it to fully learn the hidden relationships between different features and methane concentrations, achieving the best generalization performance. Specifically, its R2 is 0.5838, RMSE is 11.9903, and MAE is 9.8294. Compared to LightGBM, the R2 is improved by 2.15%, while RMSE and MAE are reduced by 1.45% and 1.14%, respectively.
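The percentage comparisons with LightGBM are relative changes, not differences in percentage points. The sketch below recomputes them from the values quoted above. The function name is hypothetical, and the results agree with the reported figures to within rounding.

```python
def relative_change(new, baseline):
    """Relative change of new with respect to baseline, in percent."""
    return (new - baseline) / baseline * 100.0

# Values quoted in the text: MFF-SEM versus LightGBM.
r2_gain = relative_change(0.5838, 0.5715)       # improvement in R2
rmse_drop = -relative_change(11.9903, 12.1671)  # reduction in RMSE
mae_drop = -relative_change(9.8294, 9.9433)     # reduction in MAE

summary = {name: round(value, 2) for name, value in
           [("R2 gain %", r2_gain), ("RMSE drop %", rmse_drop), ("MAE drop %", mae_drop)]}
```

In absolute terms the R2 difference is 0.0123, which gives a sense of how close the two models are in this generalization setting.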
Figure 9 displays the methane concentration inversion distribution maps of the different models. Figure 9a shows the true values of methane concentrations from May 2019 to April 2020. Figure 9b–h present the extrapolation results of the MFF-SEM, XGBoost, RF, LightGBM, GBDT, 1DCNN, and LSTM models, respectively. The proposed MFF-SEM model achieves the best inversion results compared to other models, especially in regions with high methane concentrations, where its performance is significantly better. The XGBoost, RF, LightGBM, and GBDT models show similar results, while the LSTM and 1DCNN models perform poorly, generally underestimating methane concentrations.
Additionally, due to the seasonal characteristics of methane concentration variations, seasonal generalization experiments are also conducted in this study. Table 6 shows the extrapolation results for different seasons, where the training set consists of one full year of data from June 2018 to May 2019, and the test set comprises seasonal data from June 2019 to May 2020.
As shown in Table 6, the MFF-SEM ensemble learning model demonstrates the best overall inversion performance across all four seasons, with R2 values of 0.4733, 0.4755, 0.2534, and 0.6401 respectively. This superior performance is primarily attributed to Stacking’s effective integration of multiple base model predictions, which achieves complementary advantages and effectively reduces the bias and variance of single models, resulting in optimal inversion results. Furthermore, due to the limited sample size, which is unfavorable for deep learning models to fully learn the hidden patterns between features and methane concentrations, machine learning models show better overall inversion performance than deep learning models.
Across seasons, the MFF-SEM model exhibits stable generalization performance in spring, summer, and winter. However, all models perform poorly in autumn, which may be related to uncertain factors affecting methane emissions during this season. For example, autumn brings changes in agricultural activities, such as harvesting and straw burning, which increase methane emissions. These sources of uncertainty reduce model performance.
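To make the seasonal scores easier to interpret, the sketch below computes R2, RMSE, and MAE under their standard definitions (assumed here to be the ones used in this study) on toy numbers. It also shows why an R2 below zero, such as the autumn value for 1DCNN in Table 6, is possible: R2 turns negative whenever the predictions fit worse than a constant prediction equal to the mean of the true values.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
close_fit = [1.1, 1.9, 3.2, 3.8]
biased_fit = [4.0, 4.0, 4.0, 4.0]  # constant prediction far from the true mean

close_r2 = r2_score(y_true, close_fit)
biased_r2 = r2_score(y_true, biased_fit)
print(close_r2, rmse(y_true, close_fit), mae(y_true, close_fit))
print(biased_r2)
```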
Figure 10 shows the comparison between predicted and actual values of the MFF-SEM model for all four seasons, where Figure 10a represents actual values and Figure 10b represents predicted values. As shown in Figure 10, the MFF-SEM model achieves satisfactory overall inversion results across spring, summer, autumn, and winter. In spring, due to the influence of anomalous values, the model fails to fully capture the actual changes in methane concentrations, leading to overestimation. For the other seasons, predicted methane concentrations are generally lower than actual values, primarily due to imbalanced sample distribution: samples with high concentrations are relatively few, which prevents the model from fully learning the data characteristics and hidden associations.

4. Conclusions

To address the issues of single-feature selection and low accuracy in methane concentration inversion, this study proposes a methane concentration inversion method based on multi-feature fusion and the Stacking ensemble model (MFF-SEM). The method combines TROPOMI observation data with meteorological data, auxiliary data, and geographical coordinates. The research conducts experiments in the eastern region of Xinjiang. The experimental findings show that the proposed approach performs better overall compared to other common methane concentration inversion techniques. The following is a summary of the specific conclusions:
(1) The proposed MFF-SEM ensemble learning model effectively utilizes four base models (XGBoost, RF, LightGBM, and GBDT) and a Lasso meta-model in series-parallel cascade learning to capture different feature representations and pattern expressions from the original data. Through the complementary advantages of multiple models, it thoroughly explores the intrinsic associations between multiple features and methane concentrations, achieving the best inversion performance with R2 of 0.9747, RMSE of 2.8294, and MAE of 1.5299.
(2) SHAP plot analysis reveals that total column water vapor (water_total_column), latitude, and total column ozone (tco3) make significant contributions to the model in methane concentration inversion. Features such as surface pressure (sp) show positive correlations with methane concentration variations, while features like 2 m temperature (t2m) and boundary layer height (blh) exhibit negative correlations with methane concentration changes.
(3) The mean methane concentrations from June 2019 to May 2020 are higher than those of the previous year, indicating an increasing trend in methane concentrations over the years. Summer methane concentrations are typically higher, primarily due to increased temperatures promoting methane generation and release. The decrease in methane concentrations in early 2020 is mainly attributed to reduced human activities during the pandemic. Overall, methane concentrations exhibit a pattern of higher levels in summer and winter, and lower levels in spring and autumn.
(4) In terms of extrapolation performance, the proposed MFF-SEM model also outperforms other models. The model achieves its best performance in winter, with an R2 of 0.6401. However, due to the influence of other complex factors, the inversion performance is relatively low in autumn. The overall lower extrapolation performance is primarily related to the limited sample size and short time span. Future research will expand the study area and time span and incorporate physical models for in-depth investigation.

Author Contributions

Conceptualization, Y.H. and W.L.; methodology, Y.H. and W.L.; software, W.L.; validation, Y.H. and W.L.; formal analysis, C.Y. and Y.Z.; investigation, Y.Z., C.Y. and W.L.; resources, G.S. and Y.Z.; data curation, G.S., W.L. and Y.Z.; writing—original draft preparation, W.L.; writing—review and editing, Y.H.; visualization, W.L.; supervision, Y.H.; project administration, Y.H.; funding acquisition, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (grant numbers 42176175 and 42271335).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors gratefully acknowledge the European Centre for Medium-Range Weather Forecasts (ECMWF) for providing the ERA5 data through the Copernicus Climate Change Service (C3S).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Qin, D.H. Climate change science and sustainable development. Prog. Geogr. 2014, 33, 874–883. [Google Scholar]
  2. Yang, Q.; Guan, L.; Tao, F.; Liang, M.; Sun, W.Q. Changes of CH4 concentrations obtained by ground-based observations at five atmospheric background stations in China. Environ. Sci. Technol. 2018, 41, 1–7. [Google Scholar]
  3. Chen, J.D.; Zuo, Q.W.; Sun, M.S.; Huang, L.Z.; Shi, J.S. Research progress of satellite remote sensing detection of methane. Remote Sens. Inf. 2024, 39, 1–11. [Google Scholar]
  4. Ai, X.; Hu, C.; Yang, Y.; Zhang, L.; Liu, H.; Zhang, J.; Chen, X.; Bai, G.; Xiao, W. Quantification of Central and Eastern China’s atmospheric CH4 enhancement changes and its contributions based on machine learning approach. J. Environ. Sci. 2024, 138, 236–248. [Google Scholar] [CrossRef] [PubMed]
  5. Wójcik-Gront, E.; Wnuk, A. Evaluating Methane Emission Estimates from Intergovernmental Panel on Climate Change Compared to Sentinel-Derived Air–Methane Data. Sustainability 2025, 17, 850. [Google Scholar] [CrossRef]
  6. Yan, S.; Xie, Y.; Han, G.; Meng, X.; Li, Z. Methane Dynamics in Inner Mongolia: Unveiling Spatial and Temporal Variations and Driving Factors. Proceedings 2024, 110, 29. [Google Scholar] [CrossRef]
  7. Zhang, J.B. Chemical behavior of methane in atmosphere. Res. Environ. Sci. 1996, 9, 10–15. [Google Scholar]
  8. Xie, B.; Zhang, H.; Yang, D.D. A modeling study of effective radiative forcing and climate response due to the change in methane concentration. Adv. Clim. Change Res. 2017, 13, 83–88. [Google Scholar]
  9. Zhang, D.Y.; Liao, H. Advances in the research on sources and sinks of CH4 and observations and simulations of CH4 concentrations. Adv. Meteorol. Sci. Technol. 2015, 5, 40–47. [Google Scholar]
  10. Etminan, M.; Myhre, G.; Highwood, E.J.; Shine, K.P. Radiative forcing of carbon dioxide, methane, and nitrous oxide: A significant revision of the methane radiative forcing. Geophys. Res. Lett. 2016, 43, 12614–12623. [Google Scholar] [CrossRef]
  11. Bi, Y.; Chen, Y.J.; Xu, L.; Deng, S.M.; Zhou, R.J. Analysis of H2O and CH4 distribution characteristics in the middle atmosphere using HALOE data. Chin. J. Atmos. Sci. 2007, 31, 440–448. [Google Scholar]
  12. Jia, Y.Z.; Tao, M.H.; Ding, S.J.; Liu, H.Y.; Zeng, M.Y.; Chen, L.F. Spatial and temporal distribution of XCO2 and XCH4 in China based on satellite remote sensing. J. Atmos. Environ. Opt. 2022, 17, 679–692. [Google Scholar]
  13. Zhou, M.; Langerock, B.; Vigouroux, C.; Sha, M.K.; Ramonet, M.; Delmotte, M.; Mahieu, E.; Bader, W.; Hermans, C.; Kumps, N.; et al. Atmospheric CO and CH4 time series and seasonal variations on Reunion Island from ground-based in situ and FTIR (NDACC and TCCON) measurements. Atmos. Chem. Phys. 2018, 18, 13881–13901. [Google Scholar] [CrossRef]
  14. He, Z.; Li, Z.; Fan, C.; Zhang, Y.; Shi, Z.; Zheng, Y.; Han, Y. Satellite sensors and retrieval algorithms of atmospheric methane. Acta Opt. Sin. 2023, 43, 55–71. [Google Scholar]
  15. Yao, L.; Yang, D.X.; Cai, Z.N.; Zhu, S.H.; Liu, Y.; Deng, J.B.; Lu, N.M. Status and trend analysis of atmospheric methane satellite measurement for carbon neutrality and carbon peaking in China. Chin. J. Atmos. Sci. 2022, 46, 1469–1483. [Google Scholar]
  16. Zou, M.; Xiong, X.; Wu, Z.; Li, S.; Zhang, Y.; Chen, L. Increase of Atmospheric Methane Observed from Space-Borne and Ground-Based Measurements. Remote Sens. 2019, 11, 964. [Google Scholar] [CrossRef]
  17. Zhou, L.; Tang, J.; Wen, Y.; Worthy, D.; Trivet, N.; Zhang, X.; Ji, J.; Zheng, M.; Tans, P.; Conway, T. Characteristics of atmospheric methane concentration variation at Mt. Waliguan. J. Appl. Meteorol. Sci. 1998, 9, 2–8. [Google Scholar]
  18. Cao, S.; Zhang, S.; Gao, C.; Yan, Y.; Bao, J.; Su, L.; Liu, M.; Peng, N.; Liu, M. A long-term analysis of atmospheric black carbon MERRA-2 concentration over China during 1980–2019. Atmos. Environ. 2021, 264, 118687. [Google Scholar] [CrossRef]
  19. Xiao, Z.Y.; Lin, X.F.; Gao, X.; Chen, Y.F.; Wang, C.P.; Shi, Y.Q.; Chen, J.F.; Liu, S.H.; Xie, J.H. Study on the temporal and spatial variation of CH4 and its driving factors over China from 2010 to 2019. Environ. Sci. Technol. 2023, 46, 147–155. [Google Scholar]
  20. Zhang, X.; Bai, W.; Zhang, P.; Wang, W. Spatiotemporal variations in mid-upper tropospheric methane over China from satellite observations. Chin. Sci. Bull. 2011, 56, 2804–2811. [Google Scholar] [CrossRef]
  21. Schepers, D.; Guerlet, S.; Butz, A.; Landgraf, J.; Frankenberg, C.; Hasekamp, O.; Blavier, J.; Deutscher, N.M.; Griffith, D.W.T.; Hase, F.; et al. Methane retrievals from Greenhouse Gases Observing Satellite (GOSAT) shortwave infrared measurements: Performance comparison of proxy and physics retrieval algorithms. J. Geophys. Res. Atmos. 2012, 117, D10302. [Google Scholar] [CrossRef]
  22. Zhang, S.H.; Xie, B.; Zhang, H.; Zhou, X.X.; Wang, Q.Y.; Yang, D.D. The spatial-temporal distribution of CH4 over globe and East Asia. China Environ. Sci. 2018, 38, 4401–4408. [Google Scholar]
  23. He, Q.; Yu, T.; Gu, X.F.; Cheng, T.H.; Zhang, Y.; Xie, D.H. Global atmospheric methane variation and temporal-spatial distribution analysis based on ground-based and satellite data. Remote Sens. Inf. 2012, 27, 35. [Google Scholar]
  24. Li, S.W. Ground Methane Emission Monitoring in China Based on TROPOMI Satellite Observations. Master’s Thesis, China University of Mining and Technology, Beijing, China, 2023. [Google Scholar]
  25. Zhang, X.; Zhang, Y.; Meng, F.; Tao, J.; Wang, H.; Wang, Y.; Chen, L. Methane Retrieval from Hyperspectral Infrared Atmospheric Sounder on FY3D. Remote Sens. 2024, 16, 1414. [Google Scholar] [CrossRef]
  26. Jiang, Y.; Zhang, L.; Zhang, X.; Cao, X. Methane Retrieval Algorithms Based on Satellite: A Review. Atmosphere 2024, 15, 449. [Google Scholar] [CrossRef]
  27. Wang, J.P.; Wu, X.D.; Ma, D.J.; Wen, J.G.; Xiao, Q. Remote sensing retrieval based on machine learning algorithm: Uncertainty analysis. Natl. Remote Sens. Bull. 2023, 27, 790–801. [Google Scholar]
  28. Sha, T.; Li, L.Q.; Yan, S.Q.; Yang, S.Y.; Li, Y.; Dong, Z.P.; Chen, Q.C. Review of Machine Learning in Air Pollution Research. Environ. Sci. 2025, 1–18. [Google Scholar] [CrossRef]
  29. Guo, H.H.; Zhu, W.X.; Zhang, X.Y.; Zhang, H.; Wei, Y.X.; Hou, X.; Xun, N.N. Temporal and spatial distribution characteristics and influencing factors of near-surface methane concentration in China. China Environ. Sci. 2024, 44, 593–601. [Google Scholar]
  30. Dong, H.L.; Wang, W.T.; Xie, Y.; Aydana, Y.; Jiang, Y.T.; Xu, J.Q. Climate dry-wet conditions, changes, and their driving factors in Xinjiang. Arid Zone Res. 2023, 40, 1875–1884. [Google Scholar]
  31. Liu, M.X.; Sun, R.D.; Song, J.Y.; Zhang, Y.Y.; Li, B.W.; Yu, R.X.; Li, L. Research on ozone column concentration in Xinjiang based on OMI data. China Environ. Sci. 2021, 41, 1498–1510. [Google Scholar]
  32. Lyu, R.Q.; Li, X. Comparison between the applicability of ERA-Interim and ERA5 reanalysis in Jiangsu Province. Mar. Forecast. 2021, 38, 27–37. [Google Scholar]
  33. Muñoz-Sabater, J.; Dutra, E.; Agustí-Panareda, A.; Albergel, C.; Arduini, G.; Balsamo, G.; Boussetta, S.; Choulga, M.; Harrigan, S.; Hersbach, H.; et al. ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth Syst. Sci. Data 2021, 13, 4349–4383. [Google Scholar]
  34. Schneising, O.; Buchwitz, M.; Hachmeister, J.; Vanselow, S.; Reuter, M.; Buschmann, M.; Bovensmann, H.; Burrows, J.P. Advances in retrieving XCH4 and XCO from Sentinel-5 Precursor: Improvements in the scientific TROPOMI/WFMD algorithm. Atmos. Meas. Tech. 2023, 16, 669–694. [Google Scholar]
  35. Lindqvist, H.; Kivimäki, E.; Häkkilä, T.; Tsuruta, A.; Schneising, O.; Buchwitz, M.; Lorente, A.; Martinez Velarte, M.; Borsdorff, T.; Alberti, C.; et al. Evaluation of Sentinel-5P TROPOMI Methane Observations at Northern High Latitudes. Remote Sens. 2024, 16, 2979. [Google Scholar] [CrossRef]
  36. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobotics 2013, 7, 21. [Google Scholar]
  37. Wan, Y.; Chen, F.; Fan, L.; Sun, D.; He, H.; Dai, Y.; Li, L.; Chen, Y. Conversion of surface CH4 concentrations from GOSAT satellite observations using XGBoost algorithm. Atmos. Environ. 2023, 301, 119697. [Google Scholar]
  38. Ding, C.L.; Zheng, H.B. Study of PM2.5 concentration prediction model based on improved machine learning. J. Dalian Univ. Technol. 2024, 64, 353–360. [Google Scholar]
Figure 1. Data processing procedure.
Figure 2. Framework diagram of Stacking ensemble learning.
Figure 3. Experimental results of different feature combinations.
Figure 4. SHAP summary plot.
Figure 5. SHAP feature importance plot.
Figure 6. Density scatter plots of actual values versus predicted values for different models.
Figure 7. Seasonal average methane concentrations.
Figure 8. Monthly average methane concentration.
Figure 9. Different model predictions of methane concentration distribution.
Figure 10. Comparison of true values and predicted values for the MFF-SEM.
Table 1. Data information and sources.
Name | Spatial Resolution | Temporal Resolution | Data Source
Meteorological features (u10, v10, t2m, d2m, cdir, alnid, sp, ssr, tco3, blh) | 0.25° × 0.25° | 1 h | ERA5 dataset provided by the European Centre for Medium-Range Weather Forecasts (ECMWF)
Auxiliary features | 7 km × 7 km | 1 d | Tropospheric Monitoring Instrument (TROPOMI)
CH4 | 7 km × 7 km | 1 d | Tropospheric Monitoring Instrument (TROPOMI)
Table 2. Partial model parameters.
Model | Parameter Setting
XGBoost | n_estimators = 200, learning_rate = 0.01, max_depth = 10
GradientBoosting | n_estimators = 200, max_depth = 5, learning_rate = 0.1
Random Forest | n_estimators = 200, max_depth = 15
LightGBM | num_leaves = 200, max_depth = 30, learning_rate = 0.05
Lasso | alpha = 1.0, max_iter = 1500
Table 3. Three different feature combinations.
Feature Combination | Feature Description
F1 | meteorological factors
F2 | meteorological factors and auxiliary data
F3 | meteorological factors, auxiliary data, and latitude and longitude
Table 4. Comparison of evaluation metrics for different models.
Model | R2 | RMSE | MAE
LSTM | 0.6718 | 11.9198 | 9.0477
1DCNN | 0.8885 | 6.9111 | 5.2564
GBDT | 0.8643 | 7.6244 | 5.7510
LightGBM | 0.9435 | 4.9206 | 3.7154
RF | 0.9479 | 4.0637 | 2.1072
XGBoost | 0.9673 | 3.2221 | 1.7284
MFF-SEM | 0.9747 | 2.8294 | 1.5299
Table 5. Comparison of model extrapolation evaluation metrics.
Model | R2 | RMSE | MAE
LSTM | 0.4300 | 14.0324 | 11.2621
1DCNN | 0.4059 | 14.3266 | 11.5291
GBDT | 0.5365 | 12.6542 | 10.2950
LightGBM | 0.5715 | 12.1671 | 9.9433
RF | 0.4538 | 13.7361 | 11.1318
XGBoost | 0.5295 | 12.7493 | 10.3821
MFF-SEM | 0.5838 | 11.9903 | 9.8294
Table 6. Prediction performance metrics for different seasons (R2).
Model | Spring | Summer | Autumn | Winter
1DCNN | 0.2386 | 0.1224 | −0.0794 | 0.5078
LSTM | 0.3886 | 0.1234 | 0.1459 | 0.4343
RF | 0.3302 | 0.3300 | 0.0146 | 0.5174
LightGBM | 0.4150 | 0.4875 | 0.2262 | 0.6075
XGBoost | 0.4301 | 0.3881 | 0.1518 | 0.6066
GBDT | 0.4559 | 0.4065 | 0.1947 | 0.5650
MFF-SEM | 0.4733 | 0.4755 | 0.2534 | 0.6401
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Han, Y.; Li, W.; Yi, C.; Song, G.; Zhang, Y. Methane Concentration Inversion Based on Multi-Feature Fusion and Stacking Integration. Sensors 2025, 25, 1974. https://doi.org/10.3390/s25071974
