Article

Machine Learning-Enhanced NDIR Methane Sensing Solution for Robust Outdoor Continuous Monitoring Applications

School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
* Author to whom correspondence should be addressed.
Sensors 2025, 25(24), 7691; https://doi.org/10.3390/s25247691
Submission received: 19 November 2025 / Revised: 6 December 2025 / Accepted: 12 December 2025 / Published: 18 December 2025

Abstract

This work presents the development of a low-cost, high-performance multi-sensor gas detection instrument, named the AIMNet Sensor, that integrates a machine learning-based data processing method. The compact, low-power instrument (8.5 × 11.5 cm, 1.4 W) houses the core sensing hardware module, the Senseair K96, which integrates both a non-dispersive infrared (NDIR) gas sensing unit and a BME280 environmental sensing unit. To address the challenges that fluctuating temperature, humidity, and pressure pose for outdoor operation, multiple machine learning-based regression models were trained on 13,125 calibration data points collected under controlled laboratory conditions. Among ten tested algorithms, the Multilayer Perceptron (MLP) and Elastic Net models achieved the highest accuracy, with coefficients of determination R² > 0.8 in both indoor and outdoor scenarios and an inter-sensor root mean square error (RMSE) within 1.5 ppm across four identical instruments. Moreover, mobile field validation was performed near a wastewater management facility, confirming a strong correlation with LI-COR reference measurements and reliable detection of CH4 leaks with concentrations up to 18 ppm at the test site. Overall, this machine learning-integrated NDIR sensing solution (AIMNet) offers a practical and scalable path towards a more robust distributed CH4 monitoring network for real-world field-deployable applications.

1. Introduction

Methane (CH4) is the second most important anthropogenic greenhouse gas after carbon dioxide (CO2), yet it contributes disproportionately to near-term climate forcing because of its strong infrared absorption bands and high radiative efficiency [1,2,3]. Although its atmospheric abundance (∼1.9 ppm) is far lower than that of CO2, its warming effect per unit mass emitted is more than twenty times stronger on a 100-year timescale and over eighty times stronger on a 20-year timescale [3]. Because of its relatively short atmospheric lifetime of about 9–12 years, reducing CH4 emissions can yield substantial and rapid climate benefits [4]. Globally, CH4 emissions mainly originate from leaks and intentional releases associated with anthropogenic activities such as oil and gas operations, industrial production, and wastewater treatment [5,6]. Given its potent warming potential and short atmospheric lifetime, accurate detection and quantification of CH4 emissions are essential for effective climate mitigation, environmental monitoring, and regulatory compliance [7,8].
Various CH4 monitoring platforms have been explored and successfully deployed, particularly over the past decade. Among these, satellite, airborne, and ground-based observations have proven to be the most widely used and reliable for large-scale applications. Satellite-based instruments such as the Tropospheric Monitoring Instrument (TROPOMI) and GHGSat provide unparalleled global coverage and consistent long-term datasets for atmospheric CH4 retrieval [9,10]. They are effective for identifying large-scale regional emission patterns and supporting global CH4 inventories [11]. However, their coarse spatial resolution (e.g., kilometer scale) and long revisit intervals (e.g., days to weeks) limit their ability to resolve emissions from individual facilities or transient emission events [10,12]. Airborne CH4 monitoring bridges the gap between satellite and ground-based systems by offering higher spatial resolution and flexible deployment. Instruments mounted on aircraft or UAVs, such as AVIRIS-NG, have detected CH4 plumes at meter-scale spatial resolution over industrial sites and landfills [13,14]. These platforms can operate below cloud cover and provide better signal-to-noise performance than satellite sensors, enabling targeted surveys of specific facilities [14]. However, their operational cost, dependence on favorable weather, and logistical complexity restrict them to episodic campaigns rather than continuous deployment [13].
Compared with satellite or airborne platforms, ground-based systems offer superior temporal coverage and detection sensitivity. Fixed-site analyzers, tower-based instruments, and distributed sensor networks have been widely deployed across production fields, transmission and storage facilities, and urban areas to characterize emission variability [15,16,17,18]. Such near-field observations complement top-down remote-sensing approaches and are essential for validating regional CH4 inventories and evaluating mitigation effectiveness [10,18]. However, most of these systems require frequent calibration and maintenance, as well as reliable power and communication infrastructure, for long-term operation [16,19]. This is particularly true for distributed point sensors used for on-site continuous monitoring.
For distributed continuous-monitoring networks, selecting an appropriate sensing technology is also essential to achieving reliable results. Common CH4 detection principles include catalytic combustion (pellistor), metal oxide semiconductor (MOS), thermal conductivity (TCD), and optical methods such as non-dispersive infrared (NDIR) and laser absorption spectroscopy (e.g., TDLAS, CRDS) [20,21,22,23,24]. Catalytic and MOS sensors are inexpensive and respond quickly to high CH4 concentrations, but they suffer from limited selectivity, humidity- and temperature-dependent drift, and poisoning effects that reduce their long-term stability outdoors [20]. TCD sensors measure the thermal conductivity of the gas mixture and are robust, but they generally lack the sensitivity required for ambient CH4 monitoring and are strongly affected by environmental variations [21]. Laser-based systems, including TDLAS and CRDS, provide sub-ppm precision and excellent stability, yet their cost, size, and power consumption make them unsuitable for distributed, low-power field networks [22]. In contrast, NDIR sensors, which operate on the Beer–Lambert absorption law near the 3.3 µm CH4 band, offer a good compromise among precision, durability, and cost for continuous outdoor operation [23,24,25]. NDIR CH4 sensors are therefore regarded here as the preferred option for building ground-based distributed sensing networks for outdoor infrastructure emission monitoring.
Nevertheless, outdoor CH4 measurements are still unavoidably affected by environmental factors: each sensing component, including the light source, sampling chamber, and photodetectors, is sensitive to temperature and humidity, and the gas absorption coefficient itself depends on the environment in a complex way. These influences and challenges are discussed in detail in the following section. Developing an advanced data-processing algorithm that can disentangle the device's sampled signal from these environmental dependencies is therefore critical for mitigating environmental interference and improving measurement accuracy. With this in mind, this work focuses on developing such a robust software algorithm based on machine learning models and integrating it with low-cost NDIR CH4 sensing hardware, to achieve continuous and reliable outdoor measurements that support ground-based CH4 plume observation under diverse environmental conditions.

2. Methodology of the Integrated Sensing Instrument

This section describes the development of the machine learning-based calibration framework for a low-cost NDIR CH4 sensor intended for field-deployable outdoor monitoring. The methodology has two parts. The first describes the hardware system, including the AIMNet gas sensing hardware design and the controlled calibration setup used to collect machine learning training data. The second focuses on model selection and the validation method.

2.1. AIMNet NDIR Gas Sensing Hardware

For the hardware, we selected the K96 NDIR gas sensor manufactured by Senseair AB (Delsbo, Sweden). The K96 integrates a BME280 environmental sensor that records ambient pressure, humidity, and temperature, providing the environmental context needed for CH4 concentration analysis; the physical detection limit of the K96 module is on the order of 0.5 ppm for CH4. The K96 serves as the core sensing module of our fully operational gas monitoring instrument. As illustrated in Figure 1, the complete device was designed and assembled around this module together with a custom embedded PCB control board. The embedded controller, based on an STM32U083RCT6 microcontroller mounted directly on the PCB, omits the unnecessary I/O ports and unused circuitry that commercial microcontroller development boards usually carry, reducing both cost and power consumption while maintaining overall system integrity. The instrument measures 8.5 × 11.5 cm and houses all components within an environmentally sealed enclosure for extended field deployment. Its total power consumption is approximately 1.4 W, which enables long-term autonomous operation when powered by a solar panel system in remote outdoor environments. In addition, an air-inlet filter was installed at the sampling port to prevent dust and insect ingress, ensuring reliable performance and sensor longevity outdoors. To ensure a stable and sufficient gas intake, we integrated an air pump (ZR370-02PM) that can provide a flow rate of up to 2.5 L min⁻¹ when needed. The pump flow rate is controlled via pulse-width modulation (PWM), allowing the sampling flow to be adjusted for different operating modes, such as long-term and short-term monitoring, power-saving mode, and driving mode.
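To make the 1.4 W figure concrete for solar deployment, the arithmetic below turns it into a daily energy budget. Only the 1.4 W draw comes from the text; the battery capacity, usable depth of discharge, peak-sun hours, and charging efficiency are hypothetical placeholders, so this is a rough sizing sketch rather than the authors' power design.

```python
# Back-of-the-envelope energy budget for a continuously running unit.
# AVERAGE_POWER_W is the reported draw; every other number is hypothetical.

AVERAGE_POWER_W = 1.4
HOURS_PER_DAY = 24.0


def daily_energy_wh(power_w: float, hours: float = HOURS_PER_DAY) -> float:
    """Energy consumed per day in watt-hours."""
    return power_w * hours


def autonomy_days(battery_wh: float, depth_of_discharge: float, power_w: float) -> float:
    """Days of operation on battery alone (no sun), ignoring self-discharge."""
    usable_wh = battery_wh * depth_of_discharge
    return usable_wh / daily_energy_wh(power_w)


def min_panel_watts(power_w: float, peak_sun_hours: float, system_efficiency: float) -> float:
    """Smallest panel rating that replaces one day's consumption."""
    return daily_energy_wh(power_w) / (peak_sun_hours * system_efficiency)


if __name__ == "__main__":
    print(f"daily energy: {daily_energy_wh(AVERAGE_POWER_W):.1f} Wh")
    # Hypothetical 240 Wh battery (e.g. 12 V, 20 Ah) with 80 % usable capacity.
    print(f"autonomy: {autonomy_days(240.0, 0.8, AVERAGE_POWER_W):.1f} days")
    # Hypothetical 4 peak-sun hours and 70 % end-to-end charging efficiency.
    print(f"panel: >= {min_panel_watts(AVERAGE_POWER_W, 4.0, 0.7):.1f} W")
```

At 1.4 W the unit needs about 33.6 Wh per day; the other two outputs scale directly with whatever battery and site assumptions are substituted.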
For data communication and remote control, the device incorporates a SIM7070G cellular modem and an on-board LTE antenna (model A1004795), enabling near real-time transmission of CH4 measurements and device status to a central server. The sensor is sampled internally at 1 Hz, while the data-upload interval is configurable, from once per second to much longer intervals (e.g., every ten minutes or every hour), depending on power and bandwidth constraints. This architecture allows near real-time signal processing (within 1 ms per sample), data management, and GIS-based visualization to be performed in the cloud rather than being limited by the on-board hardware. A microSD card slot inside the device provides local data backup in case of temporary network or data-transmission failures, and all electronics are implemented using surface-mount technology (SMT) to improve robustness and manufacturability [26,27,28].
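The split between 1 Hz sampling, a configurable upload interval, and local backup can be pictured as a simple store-and-forward loop. The sketch below is illustrative only and is not the AIMNet firmware: `UploadScheduler` is a hypothetical name, `send` is a hypothetical stand-in for the cellular uplink, and an in-memory list stands in for the microSD log.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Sample = Tuple[int, float]  # (timestamp in s, CH4 estimate in ppm)


@dataclass
class UploadScheduler:
    """Buffer 1 Hz samples and try to upload them every `upload_interval_s` seconds.

    `send` is a hypothetical stand-in for the cellular uplink and returns True on
    success; batches that fail to upload are copied to `backup`, which stands in
    for the microSD log.
    """
    upload_interval_s: int
    send: Callable[[List[Sample]], bool]
    buffer: List[Sample] = field(default_factory=list)
    backup: List[Sample] = field(default_factory=list)
    last_upload_t: int = 0

    def on_sample(self, t: int, ppm: float) -> None:
        self.buffer.append((t, ppm))
        if t - self.last_upload_t >= self.upload_interval_s:
            batch, self.buffer = self.buffer, []
            if not self.send(batch):
                self.backup.extend(batch)  # keep the data locally if the link is down
            self.last_upload_t = t
```

With `upload_interval_s=1` every sample is sent immediately; with `upload_interval_s=600` samples are batched into ten-minute uploads, trading latency for fewer modem wake-ups.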
The integrated system acquires data from the K96 CH4 sensor at 1 Hz. Figure 2 presents a representative dataset collected during a field test. The left panel shows the BME280 measurements of temperature, humidity, and pressure, while the right panel shows the raw infrared absorption signals of the H2O, CH4, and CO2 channels. The observed signal variation is considerably larger than would be expected from actual changes in gas concentration, indicating that the raw sensor output is likely influenced by environmental factors. Figure 2 shows that the H2O channel closely follows the humidity and temperature changes, and the CH4 channel partially follows the variations of the H2O signal and ambient pressure. Clearly, even though the device is housed in a sealed, waterproof enclosure, its optical measurements are still affected by surrounding environmental fluctuations.
To understand this environmental impact, we can start from the operating principle of the NDIR CH4 sensor: the Beer–Lambert law, which describes the exponential attenuation of infrared radiation as it passes through an absorbing medium. According to the Beer–Lambert law, the relationship between gas concentration and transmitted light intensity can be expressed as
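$$I = I_0 \exp(-\alpha c L)$$
where $I_0$ and $I$ are the incident and transmitted infrared intensities, $\alpha$ is the absorption coefficient, $c$ is the gas concentration, and $L$ is the optical path length.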
In practice, the absorption coefficient α of CH4 is not constant: it varies with gas temperature and pressure and with the presence of interfering absorbers such as water vapor (H2O) [22,29,30,31]. Temperature changes modify the molecular population distribution and consequently alter the absorption line strength, while humidity introduces overlapping absorption bands near the CH4 fundamental at 3.3 µm, leading to cross-sensitivity and baseline drift. As a result, uncorrected variations in ambient temperature and humidity can distort the measured transmittance $I/I_0$ and bias the CH4 readings. Furthermore, the performance of each component within the NDIR module, including the light source, sampling chamber, and photodetectors, depends on temperature and humidity. Taken together, these effects mean that the fluctuations of the CH4 signal observed in the field measurements are strongly tied to ambient temperature and humidity fluctuations. Implementing an effective temperature and humidity compensation algorithm is therefore essential to minimize these environmental influences and ensure accurate CH4 quantification outdoors.
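A short numerical sketch shows why an uncompensated change in α matters: inverting $I = I_0 \exp(-\alpha c L)$ with the calibration-time α converts any relative drift in the effective α directly into the same relative error in c. The α, path length, concentration, and 2 % drift below are hypothetical illustration values, not K96 specifications, and the idealized single-absorber model ignores H2O cross-absorption and component drift.

```python
import math


def concentration_from_transmittance(transmittance: float, alpha: float, path_length: float) -> float:
    """Invert I = I0 * exp(-alpha * c * L) for c, given T = I / I0."""
    return -math.log(transmittance) / (alpha * path_length)


# Hypothetical values chosen only for illustration.
ALPHA_REF = 1.0e-3   # absorption coefficient at calibration conditions, 1/(ppm * m)
PATH_M = 0.10        # optical path length in metres
TRUE_PPM = 10.0

# Suppose the effective alpha at the field temperature is 2 % lower than at calibration.
alpha_field = ALPHA_REF * 0.98
t_field = math.exp(-alpha_field * TRUE_PPM * PATH_M)

# Inverting with the calibration-time alpha under-reads by the same 2 %.
c_est = concentration_from_transmittance(t_field, ALPHA_REF, PATH_M)
print(f"estimated {c_est:.2f} ppm vs true {TRUE_PPM:.2f} ppm")
```

The printed estimate is 9.80 ppm against a true 10.00 ppm; in the real sensor the drift is not a fixed percentage but a nonlinear function of temperature and humidity, which is what motivates a learned compensation model.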
Notably, as shown in Figure 2, the CH4 signal of this sensor exhibits a complex nonlinear relationship with temperature and humidity. Moreover, the magnitude of the environmental noise is significantly larger than the variation in the CH4 signal itself, making it difficult to isolate and remove this interference with conventional signal-processing techniques. Several recent studies have successfully applied machine learning methods to gas sensing tasks, achieving improved prediction accuracy under varying environmental conditions [32,33]. However, little research has focused specifically on applying such approaches to NDIR-based CH4 sensing. Motivated by these findings, this work implements and evaluates multiple machine learning models commonly used in gas sensing applications to determine whether they can effectively compensate for environmental interference and improve the accuracy of CH4 concentration estimation.

2.2. Data Calibration

To correctly and comprehensively analyze the sensor data, it is essential to first obtain a sufficiently large dataset that captures a wide range of outdoor environmental conditions. Several similar experimental setups for NDIR-type gas sensors have been reported in previous studies [32,33]. The most important requirement is to establish a controlled experimental environment in which the gas concentration, humidity, and temperature can be systematically varied.
Figure 3 illustrates the laboratory setup used for data collection to evaluate the performance of the CH4 sensor under different environmental conditions. To obtain absolute CH4 reference values, CH4 and N2 calibration gases were mixed for data collection. As shown in the diagram, two mass flow controllers (MFCs) from Alicat Scientific (Tucson, AZ, USA) regulate the flow rates of the CH4 and N2 calibration gases, thereby controlling the CH4 concentration in the test mixture. The calibration gases provide a dry, high-purity baseline. One of the gas lines is passed through a water bubbler to increase the humidity, and the ratio between the humidified and dry gas streams determines the relative humidity of the mixture entering the sensor chamber. The AIMNet CH4 sensor is connected downstream and placed inside a temperature-controlled oven to precisely adjust the test temperature. Furthermore, as illustrated in the figure, the gas outlet of the AIMNet sensing unit was connected directly to a high-precision LI-COR 7810 analyzer (LI-COR Biosciences, Lincoln, NE, USA), which served as the reference instrument for CH4 concentration measurements, with a nominal precision of approximately 0.25 ppb (1σ) at ambient CH4 levels.
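The MFC-based mixing reduces to flow-weighted averages, which the helpers below spell out. This is a sketch under ideal-gas assumptions: the cylinder concentration and total flow are hypothetical, the bubbler output is assumed saturated (100 % RH), and both streams are assumed to be at the same temperature and pressure at the mixing point. The RH actually seen by the sensor also changes once the mixture is heated in the oven.

```python
def ch4_mixture_ppm(cylinder_ppm: float, q_ch4_sccm: float, q_n2_sccm: float) -> float:
    """CH4 mole fraction after diluting the CH4 cylinder with N2 (ideal mixing)."""
    return cylinder_ppm * q_ch4_sccm / (q_ch4_sccm + q_n2_sccm)


def mixture_rh_percent(q_wet_sccm: float, q_dry_sccm: float, wet_rh_percent: float = 100.0) -> float:
    """RH of the combined stream, assuming the humidified stream is at `wet_rh_percent`
    and both streams share the same temperature and pressure (an idealization)."""
    return wet_rh_percent * q_wet_sccm / (q_wet_sccm + q_dry_sccm)


def ch4_flow_for_target(cylinder_ppm: float, target_ppm: float, total_sccm: float) -> float:
    """CH4-line flow needed to reach `target_ppm` at a fixed total flow."""
    if not 0.0 <= target_ppm <= cylinder_ppm:
        raise ValueError("target must lie between 0 and the cylinder concentration")
    return total_sccm * target_ppm / cylinder_ppm


if __name__ == "__main__":
    # Hypothetical 100 ppm CH4 cylinder and 500 sccm total flow.
    q_ch4 = ch4_flow_for_target(100.0, 15.0, 500.0)
    print(q_ch4, ch4_mixture_ppm(100.0, q_ch4, 500.0 - q_ch4))
    print(mixture_rh_percent(400.0, 100.0))
```

Under these assumptions, a 15 ppm target needs 75 sccm on the CH4 line, and an 80 % wet fraction gives roughly 80 % RH at the mixing point.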
To maximize the effectiveness of the data analysis, a comprehensive dataset was constructed to cover a wide range of environmental conditions. As shown in Figure 4, the dataset includes measurements collected under five distinct humidity levels, with relative humidity ranging from 0% to 80%. For each humidity condition, tests were conducted at five temperature settings between 25 °C and 55 °C, with an increment of 5 °C. Within each temperature and humidity combination, the CH4 concentration was gradually varied from 0 to 15 ppm in 3 ppm increments, allowing several minutes at each step to ensure signal stabilization. Subsequently, the concentration was increased from 15 to 65 ppm in 10 ppm steps. In total, over 15,000 data points were recorded during the calibration process.
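The concentration staircase used within each temperature and humidity combination can be written out explicitly. The setpoints follow the text (0 to 15 ppm in 3 ppm steps, then 15 to 65 ppm in 10 ppm steps); the exact humidity and temperature levels are left as caller-supplied inputs, and `build_run_matrix` is a hypothetical helper name.

```python
from itertools import product


def ch4_setpoints() -> list[float]:
    """CH4 setpoints (ppm) for one temperature/humidity combination."""
    low = [float(c) for c in range(0, 16, 3)]     # 0, 3, 6, 9, 12, 15
    high = [float(c) for c in range(25, 66, 10)]  # 25, 35, 45, 55, 65
    return low + high


def build_run_matrix(rh_levels, temps_c):
    """All (RH %, temperature degC, CH4 ppm) runs; RH and temperature lists are inputs."""
    return [(rh, t, c) for rh, t in product(rh_levels, temps_c) for c in ch4_setpoints()]
```

Each environmental combination therefore contributes 11 concentration steps, so the number of runs is 11 times the number of humidity levels times the number of temperature settings.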

2.3. Data Preprocessing and Split

Before model training, several preprocessing steps were applied to ensure data quality and consistency. First, all records containing invalid or corrupted sensor outputs were removed, including communication errors, negative readings, and physically impossible values. Data collected during the sensor warm-up period were also discarded, as the NDIR module exhibits unstable absorption signals immediately after powering on. In addition, consecutively repeated data points were filtered out to prevent oversampling of identical values. These steps discard only a small fraction of the data, and inspection of the CH4 and environmental-variable histograms before and after filtering indicates that they do not materially alter the underlying data distributions.
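The filtering steps can be sketched with pandas as follows. The column names, the 300 s warm-up window, and the choice of columns used to detect repeats are all hypothetical; the text does not state the actual thresholds or log format.

```python
import numpy as np
import pandas as pd

WARMUP_S = 300  # hypothetical warm-up window after power-on, in seconds
SIGNAL_COLS = ["ch4_raw", "h2o_raw", "temp_c", "rh_pct", "pressure_hpa"]  # hypothetical names


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop invalid, warm-up, and consecutively repeated records."""
    # Corrupted / missing values (e.g. communication errors logged as NaN or inf).
    out = df.replace([np.inf, -np.inf], np.nan).dropna(subset=SIGNAL_COLS + ["ref_ppm"])
    out = out[(out["ch4_raw"] >= 0) & (out["ref_ppm"] >= 0)]  # negative readings
    out = out[out["rh_pct"].between(0, 100)]                  # physically impossible RH
    out = out[out["uptime_s"] >= WARMUP_S]                    # warm-up period
    # A row is a repeat if every signal column equals the previous kept row.
    repeated = out[SIGNAL_COLS].eq(out[SIGNAL_COLS].shift()).all(axis=1)
    return out[~repeated].reset_index(drop=True)
```

Comparing histograms of `df` and `clean(df)` column by column is one simple way to check that the filtering does not distort the distributions, as described above.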
After filtering out invalid signals and duplicate entries, a total of 13,125 valid data points were retained for model development. To ensure fairness and avoid temporal bias, the complete dataset was randomly shuffled before splitting, and all normalization parameters were computed strictly from the training subset to prevent information leakage. All models were trained with fixed random seeds for reproducibility, and their performance was evaluated using multiple datasets described below.
The shuffled dataset was then randomly divided into two subsets: 80% for training (Data-Train) and 20% for validation (Data-Val). Because the calibration dataset was collected using pure N2 as the carrier gas, additional tests were conducted to assess model robustness under more realistic conditions. Two separate test datasets were created for this purpose. The first (Data-Test1) consists of indoor measurements in which a pump delivered CH4 calibration gas, while the second (Data-Test2) was collected outdoors under stable environmental conditions using portable CH4 calibration gas. In all experiments, the reference analyzer provided the ground-truth CH4 concentrations.
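The shuffle, 80/20 split, and leakage-free normalization described in the two paragraphs above can be sketched in NumPy. The seed value and the z-score scaler are assumptions for illustration; the key point is that the mean and standard deviation come from the training subset only and are then reused unchanged for Data-Val, Data-Test1, and Data-Test2.

```python
import numpy as np


def shuffle_split(X, y, val_fraction=0.2, seed=42):
    """Random shuffle followed by a train/validation split (seed is hypothetical)."""
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility
    idx = rng.permutation(len(X))
    n_val = int(round(val_fraction * len(X)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]


class TrainOnlyScaler:
    """Z-score scaler whose mean and std are computed from the training subset only."""

    def fit(self, X_train):
        self.mean_ = X_train.mean(axis=0)
        self.std_ = X_train.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # guard against constant columns
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_
```

With 13,125 points this yields 10,500 training and 2,625 validation samples; fitting the scaler on all data before splitting would leak validation statistics into training.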

2.4. Performance Evaluation

According to the U.S. Environmental Protection Agency (EPA) recommended performance evaluation framework for low-cost air sensors [34], the performance of the developed CH4 sensor was assessed using two widely adopted quantitative metrics: the root-mean-square error (RMSE) and the coefficient of determination (R²). These metrics are defined as
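$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$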
where $y_i$ represents the reference measurements, $\hat{y}_i$ denotes the sensor-estimated concentrations, $\bar{y}$ is the mean of the reference data, and $n$ is the number of samples. To ensure external validity beyond controlled laboratory conditions, collocated field measurements were conducted in two distinct outdoor environments. Multiple identical sensing units were deployed in parallel to quantify unit-to-unit variability, while a high-precision reference analyzer provided traceable ground-truth CH4 concentrations.
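Both metrics translate directly into NumPy. The sketch below follows the definitions above term by term; the short reference and estimate arrays in the usage lines are made-up numbers for illustration.

```python
import numpy as np


def rmse(y_ref, y_est):
    """Root-mean-square error between sensor estimates and reference values."""
    y_ref, y_est = np.asarray(y_ref, float), np.asarray(y_est, float)
    return float(np.sqrt(np.mean((y_est - y_ref) ** 2)))


def r_squared(y_ref, y_est):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_ref, y_est = np.asarray(y_ref, float), np.asarray(y_est, float)
    ss_res = np.sum((y_ref - y_est) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    ref = [2.0, 4.0, 6.0, 8.0]   # illustrative reference concentrations (ppm)
    est = [3.0, 4.0, 5.0, 8.0]   # illustrative sensor estimates (ppm)
    print(rmse(ref, est), r_squared(ref, est))
```

Note that R² is not bounded below by zero: a model that does worse than always predicting the reference mean yields a negative value.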

2.5. Machine Learning Algorithm

To comprehensively evaluate the performance of various machine learning models on the regression task for this CH4 sensor, we tested more than ten widely used algorithms that have shown potential in gas-sensing and environmental-prediction applications. The models exhibiting the best overall performance, as well as those employing specialized learning mechanisms, are summarized below.

2.5.1. Multiple Linear Regression

Multiple Linear Regression (MLR) is a fundamental statistical approach used to describe the relationship between one dependent variable and several independent variables. It assumes that the target gas concentration can be represented as a linear combination of multiple predictors, such as sensor signals and environmental factors (temperature, humidity, and pressure). The model can be written as
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon,$$
where $y$ is the predicted concentration, $x_i$ are the input features, $\beta_i$ are the regression coefficients, and $\epsilon$ is the residual error. MLR offers a transparent and computationally efficient framework for gas-sensor calibration, allowing environmental parameters to be directly integrated for compensation. Previous studies have demonstrated that applying MLR to low-cost or NDIR-based gas sensors can effectively improve accuracy and stability in variable outdoor environments [35,36,37].
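Fitting the MLR model amounts to ordinary least squares on a design matrix with a leading column of ones for $\beta_0$. In this NumPy sketch the three features merely play the role of inputs such as the raw CH4 signal, temperature, and humidity; the data are synthetic and noise-free so the recovered coefficients can be checked exactly.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares; returns [beta_0, beta_1, ..., beta_n]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_mlr(beta, X):
    return beta[0] + X @ beta[1:]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))  # synthetic stand-ins for sensor/environment features
    y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * X[:, 2]
    print(np.round(fit_mlr(X, y), 6))
```

Because the model is linear in every input, it cannot represent the interaction and curvature effects seen in Figure 2 unless such terms are engineered into the feature set by hand.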

2.5.2. Elastic Net Regression

Elastic Net Regression (ENR) is a regularized linear modeling approach that integrates the strengths of both Ridge (L2) and Lasso (L1) regression [38]. It overcomes the limitations of traditional linear models in handling correlated predictors by combining variable selection and coefficient shrinkage within a unified framework. The objective function of the Elastic Net can be formulated as
$$L(\beta) = \lVert y - X\beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2,$$
where $y$ represents the vector of observed values, $X$ is the predictor matrix, $\beta$ is the coefficient vector, and $\lambda_1$ and $\lambda_2$ are regularization parameters controlling the $L_1$ and $L_2$ penalties, respectively. This hybrid penalty encourages sparse feature selection while maintaining stability in the presence of multicollinearity. Due to its balance between interpretability and predictive power, Elastic Net has become one of the most commonly used linear regularization techniques in statistical learning and data-driven modeling.
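Libraries usually parameterize the same penalty differently. scikit-learn, for example, documents its Elastic Net objective as $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\rho\lVert w\rVert_1 + \frac{1}{2}\alpha(1-\rho)\lVert w\rVert_2^2$ with $\rho$ = `l1_ratio`, which is the objective above divided by $2n$. The sketch below evaluates the objective as written here and converts between the two parameterizations; it omits the intercept, which scikit-learn fits separately by default.

```python
import numpy as np


def elastic_net_objective(beta, X, y, lam1, lam2):
    """L(beta) = ||y - X beta||_2^2 + lam1 ||beta||_1 + lam2 ||beta||_2^2 (no intercept)."""
    r = y - X @ beta
    return float(r @ r + lam1 * np.abs(beta).sum() + lam2 * beta @ beta)


def lambdas_from_sklearn(alpha, l1_ratio, n_samples):
    """Map scikit-learn's (alpha, l1_ratio) to (lam1, lam2) in the form above.

    Multiplying scikit-learn's objective by 2n gives
    ||y - Xw||^2 + 2n*alpha*l1_ratio*||w||_1 + n*alpha*(1 - l1_ratio)*||w||^2,
    which has the same minimizer.
    """
    lam1 = 2.0 * n_samples * alpha * l1_ratio
    lam2 = n_samples * alpha * (1.0 - l1_ratio)
    return lam1, lam2
```

For example, `alpha=0.01` and `l1_ratio=0.5` on 1,000 samples correspond to λ1 = 10 and λ2 = 5 in the notation used here.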

2.5.3. Support Vector Regression

Support Vector Regression (SVR) is an extension of the Support Vector Machine (SVM) framework for solving regression problems [39,40]. It aims to find a regression function that fits the data within a predefined tolerance while maintaining maximum flatness. The standard linear SVR model is expressed as
$$f(x) = \langle w, x \rangle + b,$$
where $w$ is the weight vector and $b$ is the bias term. The optimization seeks to minimize $\lVert w \rVert^2$ subject to the constraints
$$\lvert y_i - f(x_i) \rvert \le \epsilon,$$
where ϵ defines the maximum allowable deviation from the true values. By using kernel functions such as the radial basis function (RBF), SVR can efficiently capture nonlinear relationships between input variables and the target gas concentration while preserving high generalization ability. SVR is particularly advantageous in environmental sensing tasks due to its robustness to noise and its ability to handle complex, multidimensional feature spaces.
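In practice the hard constraint is relaxed with slack variables, which is equivalent to penalizing errors through the ε-insensitive loss, and nonlinearity enters through a kernel. The NumPy sketch below shows those two ingredients only (it does not solve the SVR optimization); the `gamma` value in the usage is arbitrary. A full model is available, for example, as scikit-learn's `SVR(kernel="rbf", C=..., epsilon=..., gamma=...)`.

```python
import numpy as np


def epsilon_insensitive_loss(y, f, eps):
    """Zero inside the |y - f| <= eps tube, growing linearly outside it."""
    return np.maximum(0.0, np.abs(np.asarray(y, float) - np.asarray(f, float)) - eps)


def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||a_i - b_j||^2) for row vectors a_i, b_j."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from round-off


if __name__ == "__main__":
    print(epsilon_insensitive_loss([1.0, 2.0, 3.0], [1.05, 2.5, 2.0], eps=0.1))
    A = np.array([[0.0, 0.0], [1.0, 0.0]])
    print(rbf_kernel(A, A, gamma=0.5))
```

Errors smaller than ε cost nothing, which is the source of SVR's tolerance to small measurement noise; `gamma` sets how quickly similarity decays with distance in feature space, so features should be standardized first.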

2.5.4. CatBoost Regression

CatBoost Regression (CBR) is a gradient boosting algorithm that builds an ensemble of decision trees while efficiently handling categorical features and reducing prediction bias [41,42]. It extends the principles of gradient boosting by introducing an ordered boosting mechanism that prevents target leakage and overfitting when training with categorical data. The model iteratively constructs weak learners to minimize a differentiable loss function, typically expressed as
$$L = \sum_{i=1}^{n} \ell\left(y_i,\, f_{m-1}(x_i) + \gamma_m h_m(x_i)\right),$$
where $\ell$ is the loss function, $f_{m-1}$ is the current ensemble model, $h_m$ is the newly added tree, and $\gamma_m$ is the learning rate controlling the update strength. CatBoost employs symmetric (oblivious) trees and an advanced encoding scheme for categorical features, enabling it to deliver high accuracy even with limited data preprocessing. It provides strong generalization performance, excellent handling of nonlinear dependencies, and fast convergence, making it particularly suitable for environmental prediction and sensor calibration tasks where data are heterogeneous and partially categorical in nature.
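The additive update $f_m = f_{m-1} + \gamma_m h_m$ can be illustrated with plain gradient boosting. This NumPy sketch is not CatBoost: it has no ordered boosting, oblivious trees, or categorical encoding, and it uses depth-1 stumps with squared loss $\ell = \frac{1}{2}(y - f)^2$, whose negative gradient is simply the residual. The round count and learning rate are arbitrary illustration values.

```python
import numpy as np


def fit_stump(X, r):
    """Best single-split regression stump for residuals r (exhaustive search)."""
    best = (np.inf, 0, 0.0, r.mean(), r.mean())
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # skip the max so both sides are non-empty
            left = X[:, j] <= thr
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, thr, lv, rv)
    _, j, thr, lv, rv = best
    return lambda Z: np.where(Z[:, j] <= thr, lv, rv)


def gradient_boost(X, y, n_rounds=100, learning_rate=0.1):
    """Squared-loss gradient boosting: f_m = f_{m-1} + gamma * h_m."""
    f0 = y.mean()
    trees = []
    pred = np.full(len(y), f0)
    for _ in range(n_rounds):
        residual = y - pred            # negative gradient of 0.5 * (y - f)^2
        h = fit_stump(X, residual)     # weak learner fitted to the residuals
        pred = pred + learning_rate * h(X)
        trees.append(h)
    return lambda Z: f0 + learning_rate * sum(h(Z) for h in trees)
```

A smaller learning rate shrinks each tree's contribution, so more rounds are needed but the ensemble tends to overfit less; CatBoost exposes the same trade-off through its `learning_rate` and `iterations` parameters.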

2.5.5. Random Forest Regression

Random Forest Regression (RFR) is an ensemble learning algorithm that constructs a large number of decision trees and combines their predictions through averaging to improve accuracy and reduce overfitting [43]. Each tree in the forest is trained on a bootstrap sample of the original dataset, and at each split, a random subset of features is selected to ensure model diversity. The regression output of the ensemble is given by
$$\hat{y} = \frac{1}{N}\sum_{i=1}^{N} f_i(x),$$
where $N$ is the number of trees and $f_i(x)$ represents the prediction of the $i$-th decision tree. By aggregating multiple weak learners, RFR achieves high predictive stability, robustness to noise, and the ability to model nonlinear relationships between variables. Its interpretability through feature-importance analysis and resilience to multicollinearity make it particularly suitable for environmental data modeling and gas-sensor calibration, where complex interactions often exist among temperature, humidity, and sensor responses.
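The averaging formula can be checked directly against scikit-learn's `RandomForestRegressor`, whose fitted trees are exposed as `estimators_`. The four synthetic features and the nonlinear target below are made up for illustration, as are the forest settings; they are not the configuration used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))  # synthetic stand-ins for raw signal, T, RH, P
y = X[:, 0] * np.exp(-0.3 * X[:, 2]) + 0.1 * rng.normal(size=300)

rf = RandomForestRegressor(n_estimators=50, max_features="sqrt", bootstrap=True, random_state=0)
rf.fit(X, y)

# y_hat = (1/N) * sum_i f_i(x): average the individual trees by hand.
manual = np.mean([tree.predict(X) for tree in rf.estimators_], axis=0)
max_diff = float(np.max(np.abs(manual - rf.predict(X))))
print("max difference vs rf.predict:", max_diff)

# Impurity-based feature importances (they sum to 1).
print(rf.feature_importances_)
```

`max_features="sqrt"` implements the random feature subset at each split, and `bootstrap=True` the per-tree bootstrap sample described above.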

2.5.6. Multilayer Perceptron Regression

Multilayer Perceptron (MLP) regression is a feedforward artificial neural network model capable of approximating complex nonlinear mappings between input and output variables [44,45]. An MLP consists of an input layer, one or more hidden layers, and an output layer, where each neuron applies a nonlinear activation function to a weighted sum of its inputs. The model can be expressed as
$$\hat{y} = f(x) = \sigma\left(W_2\, \phi(W_1 x + b_1) + b_2\right),$$
where $W_1, W_2$ and $b_1, b_2$ are the weight matrices and bias vectors, $\phi(\cdot)$ is the activation function (e.g., ReLU or tanh), and $\sigma(\cdot)$ denotes the output mapping function. During training, the model parameters were optimized using the backpropagation algorithm to minimize a loss function such as the mean squared error (MSE). Because of its ability to capture nonlinear dependencies and interactions among multiple input variables, MLP regression provides superior flexibility for modeling sensor data and compensating for environmental variations in complex real-world scenarios.
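The single-hidden-layer formula maps one-to-one onto a NumPy forward pass. Here $\phi$ is ReLU and $\sigma$ defaults to the identity, the usual output mapping for regression; the tiny weight matrices in the usage are arbitrary values chosen so the result can be checked by hand.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def mlp_forward(x, W1, b1, W2, b2, phi=relu, sigma=lambda z: z):
    """y_hat = sigma(W2 @ phi(W1 @ x + b1) + b2) for a single input vector x."""
    return sigma(W2 @ phi(W1 @ x + b1) + b2)


if __name__ == "__main__":
    x = np.array([1.0, -1.0])
    out = mlp_forward(x, np.eye(2), np.zeros(2), np.array([[2.0, 3.0]]), np.array([0.5]))
    print(out)  # hidden = relu([1, -1]) = [1, 0]; output = 2*1 + 3*0 + 0.5
```

Deeper networks such as the two-hidden-layer model used later simply repeat the `phi(W @ h + b)` step once per hidden layer.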

3. Results and Discussion

3.1. Training and Validation

To obtain a comprehensive and reliable assessment, all models were trained using the cleaned and standardized dataset described previously. Hyperparameters for the linear models (MLR and Elastic Net), Support Vector Regression, Random Forest, and CatBoost were tuned through grid search or built-in cross-validation routines, while the Multilayer Perceptron (MLP) network was trained using the Adam optimizer with tuned learning rate, weight decay, and dropout rate. We monitored the prediction performance on both Data-Val and Data-Test1 during training. An early-stopping strategy was employed for the neural network to prevent overfitting and training drift: once the monitored metric on the validation stream began to deteriorate (i.e., a sustained decrease in R² or an increase in the validation loss), training was halted and the best checkpoint was retained. This joint monitoring of Data-Val and Data-Test1 ensures that the selected models do not overfit to a single split while still capturing stable structure in the data. After systematic hyperparameter exploration and feature-combination screening, we applied a selection criterion requiring each candidate to achieve R² > 0.8 on both Data-Val and Data-Test1. Models satisfying this threshold were then ranked by stability and accuracy, yielding six top-performing configurations summarized in Table 1.
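Patience-based early stopping and the R² > 0.8 selection rule can be sketched as two small helpers. The sketch monitors a single validation loss; how the authors combined the Data-Val and Data-Test1 streams into one stopping signal is not specified, so that part is an assumption, and `EarlyStopping` and `passes_selection` are hypothetical names.

```python
import math


class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int, min_delta: float = 0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = math.inf
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record this epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
            return False  # new best: the caller should checkpoint the model here
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def passes_selection(r2_val: float, r2_test1: float, threshold: float = 0.8) -> bool:
    """Selection rule from the text: R^2 above the threshold on both Data-Val and Data-Test1."""
    return r2_val > threshold and r2_test1 > threshold
```

After stopping, the weights saved at `best_epoch` are restored, which matches the "best checkpoint was retained" behavior described above.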
To further verify robustness under varying environmental conditions (e.g., outdoor variability and potential covariate shift), we subsequently evaluated the shortlisted models on Data-Test2 as an out-of-distribution (OOD) test set. This additional probe is designed to confirm whether gains observed on Data-Val and Data-Test1 translate to stronger generalization beyond the training regime.
Even though all models performed remarkably well on Data-Test1, their accuracy consistently decreased when evaluated on Data-Test2, with several models even dropping below an R² value of 0.8. This decline indicates that the outdoor dataset contains data points outside the training range, exposing limitations in each model’s ability to extrapolate under unseen environmental conditions. Among the tested algorithms, the Random Forest (RFR) and Multiple Linear Regression (MLR) models exhibited the lowest performance, with R² values falling below 0.7 on Data-Test2, confirming their restricted capability to capture nonlinear and cross-dependent environmental effects. Support Vector Regression (SVR) and CatBoost Regression (CBR) demonstrated noticeable improvements compared to the baseline models; however, CatBoost still experienced a moderate degradation under outdoor conditions, while SVR remained less stable overall, with R² values only slightly above 0.7 on Data-Test2. As illustrated in Figure 5, CatBoost, Elastic Net, and MLP were selected as the top three models based on their combined performance on Data-Test1 and Data-Test2, and represent the overall best performers, displaying the highest predictive accuracy across multiple environments.
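One simple way to flag the out-of-range points mentioned above is to compare each new sample against the per-feature minimum and maximum seen during training. This is a crude check (a point can sit inside every per-feature range yet still be an unseen combination), and the two-feature example (temperature in °C, RH in %) uses illustrative values only; `out_of_range_mask` is a hypothetical helper.

```python
import numpy as np


def out_of_range_mask(X_train, X_new):
    """True for rows of X_new with any feature outside the per-feature training min/max."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return ((X_new < lo) | (X_new > hi)).any(axis=1)


if __name__ == "__main__":
    # Columns: temperature (degC), relative humidity (%); values are illustrative.
    X_train = np.array([[25.0, 0.0], [55.0, 80.0]])
    X_new = np.array([[30.0, 50.0], [5.0, 60.0], [40.0, 95.0]])
    print(out_of_range_mask(X_train, X_new))
```

Reporting the fraction of flagged points alongside R² on an outdoor set helps separate genuine model error from extrapolation beyond the calibration envelope.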
The CatBoost model was trained using a refined configuration with parameters: depth = 8, learning rate = 0.055, iterations = 4000, subsample = 0.8, L2 leaf regularization = 3.0, border_count = 254, and grow_policy = Lossguide. This gradient-boosting ensemble effectively captured nonlinear dependencies among the sensor features. The resulting model achieved an R² of approximately 0.9304 on Data-Test1 and 0.7384 on Data-Test2.
The Elastic Net regression model combined a StandardScaler preprocessor with ElasticNetCV regularization, using parameters: l1_ratio = [0.2, 0.4, 0.6, 0.8, 0.95], cross-validation = 5 folds, and a maximum of 5000 iterations. This hybrid regularization approach efficiently balanced bias and variance while maintaining interpretability. It achieved an R² of ≈ 0.9194 on Data-Test1 and ≈ 0.8042 on Data-Test2.
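The configuration above corresponds naturally to a scikit-learn pipeline, sketched below with the stated `l1_ratio` grid, 5-fold cross-validation, and 5000 iterations. The four synthetic input columns are placeholders; the actual feature set chosen during feature-combination screening is not listed in the text.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.2, 0.4, 0.6, 0.8, 0.95], cv=5, max_iter=5000),
)

# Synthetic stand-in data: columns play the role of raw CH4 signal, T, RH, P.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] - 0.8 * X[:, 1] + 0.4 * X[:, 2] + 0.05 * rng.normal(size=500)

model.fit(X, y)
enet = model.named_steps["elasticnetcv"]
print("selected l1_ratio:", enet.l1_ratio_, "alpha:", enet.alpha_)
print("coefficients (standardized features):", enet.coef_)
```

Putting the scaler inside the pipeline means its statistics are refit on each training fold during cross-validation, which keeps the internal model selection free of leakage.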
The feedforward Multilayer Perceptron (MLP) network consisted of two fully connected hidden layers with 256 and 128 neurons, respectively, each followed by ReLU activation and a dropout layer (p = 0.12). The model was trained using the Adam optimizer with a learning rate of 8 × 10⁻⁴, a weight decay of 3 × 10⁻⁴, a batch size of 256, and an early-stopping patience of 35 epochs over a maximum of 550 epochs. The MLP achieved excellent predictive accuracy, with R² ≈ 0.9480 on Data-Test1 and R² ≈ 0.9076 on Data-Test2, demonstrating strong generalization under variable environmental conditions.
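A PyTorch sketch of the described network and training settings follows. The number of input features is an assumption (the feature set is not listed), early stopping here monitors a single validation MSE, and `torch.optim.Adam` with `weight_decay` applies an L2 term through the gradient rather than AdamW-style decoupled decay; which variant the authors used is not stated.

```python
import torch
from torch import nn


def build_mlp(n_features: int) -> nn.Module:
    """Two hidden layers (256, 128), each with ReLU and dropout (p = 0.12), linear output."""
    return nn.Sequential(
        nn.Linear(n_features, 256), nn.ReLU(), nn.Dropout(p=0.12),
        nn.Linear(256, 128), nn.ReLU(), nn.Dropout(p=0.12),
        nn.Linear(128, 1),
    )


def train(model, X_tr, y_tr, X_val, y_val, max_epochs=550, patience=35, batch_size=256):
    """Mini-batch Adam training with patience-based early stopping on validation MSE."""
    opt = torch.optim.Adam(model.parameters(), lr=8e-4, weight_decay=3e-4)
    loss_fn = nn.MSELoss()
    best_loss, best_state, bad = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        perm = torch.randperm(len(X_tr))
        for i in range(0, len(X_tr), batch_size):
            idx = perm[i:i + batch_size]
            opt.zero_grad()
            loss = loss_fn(model(X_tr[idx]).squeeze(-1), y_tr[idx])
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(X_val).squeeze(-1), y_val).item()
        if val_loss < best_loss:
            best_loss, bad = val_loss, 0
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
        else:
            bad += 1
            if bad >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)  # restore the best checkpoint
    return best_loss
```

Calling `model.eval()` before inference matters here: it disables dropout so predictions are deterministic.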
Based on the comparative results, the top two models, Elastic Net and MLP, were selected for further validation, as they exhibited strong and consistent performance across both test datasets. In contrast, the CatBoost model achieved an R² below 0.8 on the outdoor dataset, indicating insufficient generalization under real-world conditions, and was therefore excluded from subsequent analysis. The following section validates and evaluates the performance of the selected models under various environmental scenarios.
According to the U.S. EPA’s performance recommendations for air sensors, individual sensing units should behave consistently with one another. To verify the general applicability of the models, we validated their performance on four AIMNet CH₄ sensors fabricated with identical configurations. Each sensor was tested following the same experimental procedure illustrated in the block diagram and was connected to the LI-COR reference analyzer for accuracy assessment. The prediction error of each sensor was quantified by its RMSE against the reference.
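The per-unit error metric described above can be sketched as follows, assuming each unit's predictions have already been time-aligned with the LI-COR series. The sensor IDs and readings in the usage lines are hypothetical. The 1 ppm check mirrors the tolerance reported for the MLP.

```python
import numpy as np


def residuals(pred_ppm, ref_ppm):
    """Prediction error relative to the reference (the quantity shown per unit in Figure 6)."""
    return np.asarray(pred_ppm, dtype=float) - np.asarray(ref_ppm, dtype=float)


def rmse(pred_ppm, ref_ppm):
    """Root mean square error in ppm."""
    return float(np.sqrt(np.mean(residuals(pred_ppm, ref_ppm) ** 2)))


def per_sensor_rmse(predictions, ref_ppm):
    """predictions: mapping of sensor ID -> predicted series aligned with ref_ppm."""
    return {sensor: rmse(pred, ref_ppm) for sensor, pred in predictions.items()}


# Hypothetical readings for two units against one reference series.
ref = [2.0, 2.5, 10.0, 50.0]
preds = {"unit_A": [2.2, 2.4, 10.5, 49.0], "unit_B": [1.8, 2.9, 9.6, 50.6]}
scores = per_sensor_rmse(preds, ref)
print(scores, all(v <= 1.0 for v in scores.values()))
```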
As shown in Figure 6, the MLP model outputs indicate that the four sensors exhibit slightly different absolute readings but highly consistent response trends. The overall prediction errors for all units remain within 1 ppm, indicating good model stability and sensor-to-sensor reproducibility. The Elastic Net model showed slightly larger deviations, with overall RMSE values within 1.5 ppm, but was also consistent across all sensors. For brevity, only the MLP results are presented.
These results indicate that the proposed calibration framework can be applied reliably to multiple AIMNet units, supporting stable and transferable performance in field deployments.

3.2. Outdoor Validation Result

To validate the performance of the developed CH₄ sensor, a series of outdoor experiments was conducted under real environmental conditions. The device was deployed on the balcony outside our laboratory, as shown in Figure 7. To obtain accurate reference measurements, the sensor output was continuously connected to the input of a LI-COR reference analyzer.
To evaluate the sensor’s capability to detect CH₄ emissions at different distances, portable calibration gas cylinders were used to release CH₄ at multiple positions relative to the device. The test was performed at distances of 0.5 ft, 1 ft, and 1.5 ft, with calibration gas concentrations of 10 ppm, 50 ppm, and 100 ppm. Several environmental scenarios were examined, as summarized in Table 2. The experiments were carried out during spring in Norman, Oklahoma, where daytime temperatures reached approximately 28 °C, while nighttime temperatures dropped to 0–5 °C. Additional tests were conducted on rainy days (high humidity) and during a thunderstorm event characterized by high wind speed and rapidly fluctuating pressure.
As summarized in Table 2, the sensor performed stably and accurately under clear weather conditions during both daytime and nighttime testing. The nighttime RMSE increased slightly, likely because temperatures fell below 20 °C, outside the model’s original training range, which introduced minor variance in the predictions. Under rainy conditions, the Elastic Net model was more sensitive to humidity changes, with its RMSE rising to 2.73 ppm compared with 1.17 ppm for the MLP. During the thunderstorm test, the Elastic Net RMSE increased further to 3.91 ppm (MLP: 1.55 ppm), suggesting that rapid variations in humidity and atmospheric pressure may have affected the sensor readings.
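The text attributes the slightly higher nighttime RMSE to temperatures below 20 °C, which lie outside the training range. A simple guard is sketched below, assuming the model's environmental inputs are available as columns. It records per-feature training bounds and flags any new sample that falls outside them. The column layout and values are hypothetical.

```python
import numpy as np


def fit_bounds(X_train):
    """Per-feature (min, max) observed in the calibration data."""
    X = np.asarray(X_train, dtype=float)
    return X.min(axis=0), X.max(axis=0)


def out_of_range(X, bounds):
    """Boolean mask: True where any feature of a sample lies outside the training bounds."""
    lo, hi = bounds
    X = np.asarray(X, dtype=float)
    return np.any((X < lo) | (X > hi), axis=1)


# Hypothetical columns: [temperature (°C), relative humidity (%), pressure (hPa)].
train = [[20.0, 30.0, 960.0], [35.0, 80.0, 990.0]]
bounds = fit_bounds(train)
night = [[25.0, 50.0, 975.0], [3.0, 60.0, 975.0]]
print(out_of_range(night, bounds).tolist())  # [False, True]
```

Flagged samples could be reported with a lower-confidence tag rather than discarded. The longer-term remedy, as proposed in the Conclusions, is to extend the training dataset to broader ranges.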
Figure 8 shows that, although the RMSE values are relatively high, both prediction models follow the trend of the LI-COR reference readings and capture the large CH₄ release events at all tested distances. The decrease in measured concentration with increasing distance is likely due to CH₄ dilution in ambient air, as the released amount was insufficient to maintain a stable plume. The close agreement between predicted and reference trends indicates that the models maintain good overall performance under these dynamic outdoor conditions.
The Elastic Net model exhibited noticeable fluctuations that did not correspond to real CH₄ variations, suggesting sensitivity to changes in atmospheric pressure or other environmental factors. In addition, both models showed a small offset relative to the LI-COR reference, implying that humidity effects were not fully compensated. In contrast, the MLP model produced smoother predictions with fewer spurious fluctuations and closer agreement with the LI-COR reference readings.
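The two failure modes described above, spurious fluctuations and a constant offset, can be separated numerically. RMSE² equals the squared mean bias plus the variance of the residual, while Pearson's r measures trend tracking independently of any offset. The following is a minimal NumPy sketch with hypothetical inputs.

```python
import numpy as np


def agreement_stats(pred_ppm, ref_ppm):
    """Decompose model-vs-reference error into offset and fluctuation components."""
    pred = np.asarray(pred_ppm, dtype=float)
    ref = np.asarray(ref_ppm, dtype=float)
    diff = pred - ref
    return {
        "pearson_r": float(np.corrcoef(pred, ref)[0, 1]),  # trend tracking, offset-insensitive
        "bias_ppm": float(diff.mean()),                     # constant offset
        "fluct_ppm": float(diff.std()),                     # spread of residual around the offset
        "rmse_ppm": float(np.sqrt(np.mean(diff ** 2))),     # sqrt(bias² + fluct²)
    }


ref = [2.0, 3.0, 5.0, 9.0]
stats = agreement_stats([v + 0.5 for v in ref], ref)  # pure +0.5 ppm offset
print(stats)
```

In this usage example the residual is a pure offset, so r is 1, the fluctuation term is zero, and the RMSE equals the bias. A humidity-driven offset with few spurious fluctuations (as described for the MLP) would show up mainly in `bias_ppm`, whereas pressure-driven fluctuations would show up in `fluct_ppm`.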

3.3. Driving Test Result

Since the ultimate goal of this work is an application-ready device for real-world CH₄ monitoring in oil and gas fields, the system must be able to detect emissions at considerable distances from the actual leakage sources. The City of Norman Water Reclamation Facility, located near the University of Oklahoma campus, continuously emits small amounts of CH₄ as a byproduct of its wastewater treatment processes, which makes it a suitable field-testing location for outdoor validation.
To perform the field measurement, the AIMNet sensor was mounted on the roof of a vehicle, as shown in Figure 9, to capture CH₄ emissions while driving near the facility. As in the laboratory and balcony validation tests, a high-precision reference instrument was used to verify measurement accuracy. The LI-COR 7700 open-path CH₄ analyzer, with ppb-level sensitivity (nominal precision of about 5 ppb at ambient CH₄ levels), served as the reference sensor. To ensure that both instruments sampled the same air mass, the inlets of the LI-COR and AIMNet systems were positioned close together. Furthermore, to preserve high-frequency measurements rather than limiting them to the transmission interval, we configured the system to upload data in batches once every hour. The field measurements were performed in the early morning, with the vehicle traveling at an average speed of 15 mph around the facility area and approximately 50 mph on the highway along the upper boundary of the site. For consistency, the pump flow rate of the AIMNet sensor was set to 250 sccm, identical to the flow rate used for the LI-7810 in our previous tests [46].
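A point-to-point comparison between the two instruments requires each reference reading to be paired with an AIMNet reading. The text does not describe how this pairing was done. One common approach, sketched below, is nearest-timestamp matching within a tolerance, assuming synchronized clocks and timestamps in seconds. On DataFrames, `pandas.merge_asof(..., direction="nearest", tolerance=...)` performs the same operation.

```python
import numpy as np


def align_nearest(t_ref, t_sensor, y_sensor, max_gap_s=1.0):
    """Return sensor readings matched to each reference timestamp (NaN if none within max_gap_s).

    t_sensor must be sorted ascending and contain at least two samples.
    """
    t_ref = np.asarray(t_ref, dtype=float)
    t_sensor = np.asarray(t_sensor, dtype=float)
    y_sensor = np.asarray(y_sensor, dtype=float)
    # Index of the first sensor sample at or after each reference time, kept in [1, n-1].
    idx = np.clip(np.searchsorted(t_sensor, t_ref), 1, len(t_sensor) - 1)
    left, right = t_sensor[idx - 1], t_sensor[idx]
    # Step back to the left neighbour when it is at least as close.
    idx = idx - ((t_ref - left) <= (right - t_ref)).astype(np.intp)
    gap = np.abs(t_sensor[idx] - t_ref)
    return np.where(gap <= max_gap_s, y_sensor[idx], np.nan)


aligned = align_nearest([0.4, 1.6, 5.0], [0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0])
print(aligned.tolist())  # [10.0, 12.0, nan]
```

Because the AIMNet sensor draws air through a pumped inlet, a fixed transport delay could be subtracted from `t_sensor` before matching. Any such lag would have to be measured, as the text does not report one.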
Figure 10 presents the GIS plot of the entire driving test, with the wastewater treatment facility located in the southeast corner of the map. During the test, the prevailing wind blew toward the southwest, so elevated CH₄ concentrations were expected in that region. Consistent with this expectation, the LI-COR reference readings showed higher CH₄ levels, reaching approximately 18 ppm around the southwest corner. Concentrations near the facility ranged from 3 to 5 ppm, and readings elsewhere remained around 2–3 ppm, representing the ambient background level.
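The concentration ranges described above can be turned into discrete bands for mapping: about 2–3 ppm background, 3–5 ppm near the facility, 5–10 ppm for the minor peaks discussed next, and up to about 18 ppm at the southwest corner. The bin edges and labels below are illustrative assumptions drawn from those ranges, not the color scale actually used in Figure 10.

```python
import numpy as np

BAND_EDGES_PPM = [3.0, 5.0, 10.0]  # hypothetical edges based on the ranges in the text
BAND_LABELS = ["background", "near-source", "minor peak", "major peak"]


def classify_ch4(ppm):
    """Map each CH4 reading (ppm) to a band; values on an edge go to the upper band."""
    idx = np.digitize(np.asarray(ppm, dtype=float), BAND_EDGES_PPM)
    return [BAND_LABELS[i] for i in idx]


print(classify_ch4([2.4, 4.1, 7.5, 18.0]))
# ['background', 'near-source', 'minor peak', 'major peak']
```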
The MLP model captured all major CH₄ peaks that coincided with the southwest corner of each driving path, closely matching the reference data. However, it missed several minor emission peaks in the 5–10 ppm range, shown as orange and yellow regions on the map. The Elastic Net model performed worse, detecting significant peaks only within the facility boundaries and failing to capture elevated readings along the southwest road. Based on the results of the balcony test, the discrepancy between the AIMNet and open-path measurements can likely be attributed to a combination of pressure fluctuations associated with vehicle speed variations and structural differences between the two systems. The open-path analyzer directly samples the passing CH₄ plume, whereas the AIMNet sensor draws air through an enclosed inlet at a relatively low pump flow rate. Under certain wind directions, the AIMNet inlet may not capture the plume efficiently. In that case, the open-path sensor records strong enhancements while the AIMNet system measures only background air, producing a bias between the two instruments. A point-to-point comparison with the reference measurements shows that the RMSE of the Elastic Net model increased to 4.73 ppm, whereas that of the MLP model increased only to 2.98 ppm. Overall, the MLP model performed robustly in detecting large emission events, although both models showed limitations under high-speed driving conditions, such as on highways.
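RMSE alone does not show which emission events a model captured or missed. The sketch below performs an event-level check. It finds contiguous runs where the time-aligned reference exceeds a threshold and then checks whether the model prediction also exceeds the threshold during each run. The 5 ppm threshold and the series in the usage lines are hypothetical.

```python
import numpy as np


def find_events(series, threshold):
    """(start, end) index pairs of contiguous runs above threshold; end is exclusive."""
    above = np.asarray(series, dtype=float) > threshold
    edges = np.flatnonzero(np.diff(np.concatenate(([0], above.astype(int), [0]))))
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]


def capture_rate(ref_ppm, pred_ppm, threshold):
    """Fraction of reference events in which the prediction also crosses the threshold."""
    pred = np.asarray(pred_ppm, dtype=float)
    events = find_events(ref_ppm, threshold)
    if not events:
        return float("nan")
    hits = sum(bool(np.any(pred[s:e] > threshold)) for s, e in events)
    return hits / len(events)


ref = [2, 2, 12, 15, 2, 2, 7, 8, 2]   # one major and one minor enhancement
pred = [2, 2, 11, 14, 2, 2, 4, 4, 2]  # major event captured, minor one missed
print(find_events(ref, 5.0), capture_rate(ref, pred, 5.0))  # [(2, 4), (6, 8)] 0.5
```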

4. Conclusions

This study presented the design, calibration, and validation of a compact, low-power NDIR-based CH₄ sensing system (AIMNet) for continuous ground-based monitoring in real-world environments such as oil and gas fields and wastewater facilities. The system integrates a Senseair K96 CH₄ sensing module within a weatherproof, energy-efficient enclosure. With machine learning-based calibration, particularly using the Elastic Net and Multilayer Perceptron (MLP) models, the sensor achieved R² > 0.8 on both indoor and outdoor datasets, with inter-sensor RMSE within 1–1.5 ppm. Field and mobile tests verified reliable detection of CH₄ emission events, with the MLP model closely matching the LI-COR reference data and identifying the major concentration peaks under diverse atmospheric conditions. Although performance degraded under rapid airflow, the system maintained robust operation across multiple environments.
Future work will extend the training dataset to cover broader temperature, humidity, and pressure ranges, improving generalization under varying conditions. Additional developments will explore model structure optimization, adaptive recalibration for long-term drift correction, and lightweight edge inference for on-device model execution (e.g., TensorFlow Lite or TinyML). Future iterations will also enhance network scalability through low-power wide-area communication (e.g., LoRaWAN or mesh networking) and implement cloud-based analytics for autonomous, distributed CH₄ monitoring and source localization [47].

Author Contributions

Conceptualization, Y.Y. and B.W.; methodology, Y.Y., L.M., T.B., A.H., B.M., and B.W.; software, Y.Y., L.M., and T.B.; validation, Y.Y., L.M., and B.M.; formal analysis, Y.Y.; investigation, Y.Y., L.M., T.B., A.H., and B.M.; resources, B.W.; data curation, Y.Y.; writing—original draft preparation, Y.Y.; writing—review and editing, Y.Y., L.M., T.B., A.H., B.M., and B.W.; visualization, Y.Y.; supervision, B.W.; project administration, B.W.; funding acquisition, B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the University of Oklahoma’s “Big Idea Challenge” strategic initiative program, and the U.S. Department of Energy’s “Innovative Methane Measurement, Monitoring, and Mitigation Technologies” (iM4) initiative under award numbers DE-FE0032285 and DE-FE0032292.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

During the preparation of this manuscript, the author used ChatGPT (GPT-5, OpenAI, 2025 version) for language editing and figure caption refinement. The author has carefully reviewed and edited the output and takes full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Forster, P.; Storelvmo, T.; Armour, K.; Collins, W.; Dufresne, J.-L.; Frame, D.; Lunt, D.J.; Mauritsen, T.; Palmer, M.D.; Watanabe, M.; et al. The Earth’s Energy Budget, Climate Feedbacks and Climate Sensitivity. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Masson-Delmotte, V., Zhai, P., Pirani, A., Connors, S.L., Péan, C., Berger, S., Caud, N., Chen, Y., Goldfarb, L., Gomis, M.I., et al., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2021; pp. 923–1054.
2. Etminan, M.; Myhre, G.; Highwood, E.J.; Shine, K.P. Radiative forcing of carbon dioxide, methane, and nitrous oxide: A significant revision of the methane radiative forcing. Geophys. Res. Lett. 2016, 43, 12614–12623.
3. Shindell, D.T.; Faluvegi, G.; Koch, D.M.; Schmidt, G.A.; Unger, N.; Bauer, S.E. Improved attribution of climate forcing to emissions. Science 2009, 326, 716–718.
4. Nisbet, E.G.; Fisher, R.E.; Lowry, D.; France, J.L.; Allen, G.; Bakkaloglu, S.; Broderick, T.J.; Cain, M.; Coleman, M.; Fernandez, J.; et al. Methane mitigation: Methods to reduce emissions, on the path to the Paris agreement. Rev. Geophys. 2020, 58, e2019RG000675.
5. Brandt, A.R.; Heath, G.A.; Cooley, D. Methane leaks from natural gas systems follow extreme distributions. Environ. Sci. Technol. 2016, 50, 12512–12520.
6. Dlugokencky, E.J.; Nisbet, E.G.; Fisher, R.; Lowry, D. Global atmospheric methane: Budget, changes and dangers. Philos. Trans. R. Soc. A 2011, 369, 2058–2072.
7. Alvarez, R.A.; Zavala-Araiza, D.; Lyon, D.R.; Allen, D.T.; Barkley, Z.R.; Brandt, A.R.; Davis, K.J.; Herndon, S.C.; Jacob, D.J.; Karion, A.; et al. Assessment of methane emissions from the US oil and gas supply chain. Science 2018, 361, 186–188.
8. Schwietzke, S.; Pétron, G.; Conley, S.; Pickering, C.; Mielke-Maday, I.; Dlugokencky, E.J.; Tans, P.P.; Vaughn, T.; Bell, C.; Zimmerle, D.; et al. Improved mechanistic understanding of natural gas methane emissions from spatially resolved aircraft measurements. Environ. Sci. Technol. 2017, 51, 7286–7294.
9. Veefkind, J.P.; Aben, I.; McMullan, K.; Förster, H.; De Vries, J.; Otter, G.; Claas, J.; Eskes, H.J.; De Haan, J.F.; Kleipool, Q.; et al. TROPOMI on the ESA Sentinel-5 Precursor: A GMES mission for global observations of the atmospheric composition for climate, air quality and ozone layer applications. Remote Sens. Environ. 2012, 120, 70–83.
10. Jacob, D.J.; Turner, A.J.; Maasakkers, J.D.; Sheng, J.; Sun, K.; Liu, X.; Chance, K.; Aben, I.; McKeever, J.; Frankenberg, C. Satellite observations of atmospheric methane and their value for quantifying methane emissions. Atmos. Chem. Phys. 2016, 16, 14371–14396.
11. Saunois, M.; Stavert, A.R.; Poulter, B.; Bousquet, P.; Canadell, J.G.; Jackson, R.B.; Raymond, P.A.; Dlugokencky, E.J.; Houweling, S.; Patra, P.K.; et al. The global methane budget 2000–2017. Earth Syst. Sci. Data 2020, 12, 1561–1623.
12. Irakulis-Loitxate, I.; Guanter, L.; Liu, Y.N.; Varon, D.J.; Maasakkers, J.D.; Zhang, Y.; Chulakadabba, A.; Wofsy, S.C.; Thorpe, A.K.; Duren, R.M.; et al. Satellite-based survey of extreme methane emissions in the Permian basin. Sci. Adv. 2021, 7, eabf4507.
13. Thorpe, A.K.; Frankenberg, C.; Roberts, D.A. Retrieval techniques for airborne imaging of methane concentrations using high spatial and moderate spectral resolution: Application to AVIRIS. Atmos. Meas. Tech. 2014, 7, 491–506.
14. Frankenberg, C.; Thorpe, A.K.; Thompson, D.R.; Hulley, G.; Kort, E.A.; Vance, N.; Borchardt, J.; Krings, T.; Gerilowski, K.; Sweeney, C.; et al. Airborne methane remote measurements reveal heavy-tail flux distribution in Four Corners region. Proc. Natl. Acad. Sci. USA 2016, 113, 9734–9739.
15. Allen, D.T.; Torres, V.M.; Thomas, J.; Sullivan, D.W.; Harrison, M.; Hendler, A.; Herndon, S.C.; Kolb, C.E.; Fraser, M.P.; Hill, A.D.; et al. Measurements of methane emissions at natural gas production sites in the United States. Proc. Natl. Acad. Sci. USA 2013, 110, 17768–17773.
16. Zimmerle, D.J.; Williams, L.L.; Vaughn, T.L.; Quinn, C.; Subramanian, R.; Duggan, G.P.; Willson, B.; Opsomer, J.D.; Marchese, A.J.; Martinez, D.M.; et al. Methane emissions from the natural gas transmission and storage system in the United States. Environ. Sci. Technol. 2015, 49, 9374–9383.
17. McKain, K.; Down, A.; Raciti, S.M.; Budney, J.; Hutyra, L.R.; Floerchinger, C.; Herndon, S.C.; Nehrkorn, T.; Zahniser, M.S.; Jackson, R.B.; et al. Methane emissions from natural gas infrastructure and use in the urban region of Boston, Massachusetts. Proc. Natl. Acad. Sci. USA 2015, 112, 1941–1946.
18. Brandt, A.R.; Heath, G.A.; Kort, E.A.; O’Sullivan, F.; Pétron, G.; Jordaan, S.M.; Tans, P.; Wilcox, J.; Gopstein, A.M.; Arent, D.; et al. Methane leaks from North American natural gas systems. Science 2014, 343, 733–735.
19. Rella, C.W.; Tsai, T.R.; Botkin, C.G.; Crosson, E.R.; Steele, D. Measuring emissions from oil and natural gas well pads using the mobile flux plane technique. Environ. Sci. Technol. 2015, 49, 4742–4748.
20. Aldhafeeri, T.; Tran, M.K.; Vrolyk, R.; Pope, M.; Fowler, M. A review of methane gas detection sensors: Recent developments and future perspectives. Inventions 2020, 5, 28.
  21. Gardner, E.L.W.; Gardner, J.W.; Udrea, F. Micromachined thermal gas sensors—A review. Sensors 2023, 23, 681. [Google Scholar] [CrossRef] [PubMed]
22. Hodgkinson, J.; Tatam, R.P. Optical gas sensing: A review. Meas. Sci. Technol. 2012, 24, 012004.
23. Dinh, T.V.; Choi, I.Y.; Son, Y.S.; Kim, J.C. A review on non-dispersive infrared gas sensors: Improvement of sensor detection limit and interference correction. Sens. Actuators B 2016, 231, 529–538.
24. Ye, W.; Tu, Z.; Xiao, X.; Simeone, A.; Yan, J.; Wu, T.; Wu, F.; Zheng, C.; Tittel, F.K. A NDIR mid-infrared methane sensor with a compact pentahedron gas-cell. Sensors 2020, 20, 5461.
25. Xia, L.; Liu, Y.; Chen, R.T.; Weng, B.; Zou, Y. Advancements in miniaturized infrared spectroscopic-based volatile organic compound sensors: A systematic review. Appl. Phys. Rev. 2024, 11, 031306.
26. Weng, B. The road to climate change mitigation via methane emissions monitoring. Nat. Rev. Electr. Eng. 2024, 1, 69–70.
27. Zhou, Z.; Wang, X.; Yan, Y.; Mijiddorj, L.; Ding, Y.; Beringer, T.; Masnadi Khiabani, P.; Jentner, W.G.; Hu, X.-M.; Wang, C.; et al. AIMNET: An IoT-Empowered Digital Twin for Continuous Gas Emission Monitoring and Early Hazard Detection. arXiv 2025, arXiv:2512.06148.
28. Yan, Y. Development of a Machine-Learning Enhanced High Performance Methane Sensing Instrument for Field Applications. Master’s Thesis, University of Oklahoma, Graduate College, Norman, OK, USA, 2025.
29. Swinehart, D.F. The Beer-Lambert law. J. Chem. Educ. 1962, 39, 333.
30. Rothman, L.S.; Gordon, I.E.; Babikov, Y.; Barbe, A.; Benner, D.C.; Bernath, P.F.; Birk, M.; Bizzocchi, L.; Boudon, V.; Brown, L.R.; et al. The HITRAN2012 molecular spectroscopic database. J. Quant. Spectrosc. Radiat. Transf. 2013, 130, 4–50.
31. Furuta, D.; Sayahi, T.; Li, J.; Wilson, B.; Presto, A.A.; Li, J. Characterization of inexpensive metal oxide sensor performance for trace methane detection. Atmos. Meas. Tech. 2022, 15, 5117–5128.
32. Andrews, B.; Chakrabarti, A.; Dauphin, M.; Speck, A. Application of machine learning for calibrating gas sensors for methane emissions monitoring. Sensors 2023, 23, 9898.
33. Dubey, R.; Telles, A.; Nikkel, J.; Cao, C.; Gewirtzman, J.; Raymond, P.A.; Lee, X. Low-cost CO2 NDIR sensors: Performance evaluation and calibration using machine learning techniques. Sensors 2024, 24, 5675.
34. US EPA. Performance Testing Protocols, Metrics, and Target Values for Fine Particulate Matter Air Sensors: Use in Ambient, Outdoor, Fixed Site, Non-Regulatory Supplemental and Informational Monitoring Applications. Technical Report, U.S. Environmental Protection Agency, Office of Research and Development, 2021. Available online: https://cfpub.epa.gov/si/si_public_record_Report.cfm?dirEntryId=350785&Lab=CEMM (accessed on 17 December 2021).
35. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis, 6th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2021.
36. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2009.
37. Han, P.; Mei, H.; Liu, D.; Zeng, N.; Tang, X.; Wang, Y.; Pan, Y. Calibrations of low-cost air pollution monitoring sensors for CO, NO2, O3, and SO2. Sensors 2021, 21, 256.
38. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
39. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
40. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
41. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Volume 31.
42. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
43. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
44. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
45. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Available online: http://www.deeplearningbook.org (accessed on 10 December 2025).
46. Hu, X.-M.; Ding, Y.; Yan, Y.; Wang, C.; Weng, B.; Hardeman, S.; Xue, M. Multi-Sensor and Multi-Model Investigation of Methane Plumes from a Wastewater Treatment Plant to Improve Emission Inversion during the Morning Boundary Layer Transition. In Proceedings of the 106th AMS Annual Meeting, Houston, TX, USA, 25–29 January 2026; American Meteorological Society: Houston, TX, USA, 2026. Available online: https://ams.confex.com/ams/106ANNUAL/meetingapp.cgi/Paper/473678 (accessed on 11 December 2025).
47. Hu, X.-M.; Honeycutt, W.T.; Wang, C.; Weng, B.; Zhou, B.; Xue, M. Observation and simulation of methane plumes during the morning boundary layer transition. J. Geophys. Res. Atmos. 2025, 130, e2024JD042317.
Figure 1. Senseair K96 NDIR CH₄ sensor integrated in the self-designed AIMNet field device for continuous outdoor monitoring.
Figure 2. Field test data from the integrated CH₄ sensing system. The upper panel shows temperature, humidity, and pressure recorded by the BME280 sensor, and the lower panel displays the raw infrared absorption signals of H₂O, CH₄, and CO₂ measured by the K96 sensor.
Figure 3. Custom-built environmental chamber controlling CH₄ concentration, relative humidity, and temperature for NDIR sensor calibration experiments.
Figure 4. Overview of the calibration dataset covering variations in humidity, temperature, and CH₄ concentration.
Figure 5. Training results comparison among the top three models: CatBoost (left), Elastic Net Regression (middle), and Multilayer Perceptron (MLP) Neural Network (right).
Figure 6. Boxplot of CH₄ prediction residuals from four identical AIMNet sensors using the MLP model. Each box shows the distribution of prediction errors relative to the LI-COR reference measurements.
Figure 7. Balcony Validation Test of the AIMNet Sensor Against the LI-COR 7810 at Three Separation Distances (0.5 ft, 1 ft, and 1.5 ft).
Figure 8. Comparison of CH₄ concentration readings from the developed sensor and the LI-COR reference instrument under thunderstorm conditions. The plot shows predicted CH₄ concentrations from the Elastic Net and MLP models alongside reference measurements, illustrating the sensor response to varying emission levels and environmental fluctuations.
Figure 9. Driving Test Setup: The left panel shows the AIMNet sensor mounted on the mobile test platform, while the right panel displays the LI-COR 7700 reference instrument used for performance comparison.
Figure 10. Comparison of CH₄ concentration distributions from GIS plots for the Elastic Net model, the MLP Neural Network, and the LI-COR reference instrument during the mobile driving test.
Table 1. Comparison of model performance in terms of R² score for validation and test datasets.
Model | Data-Val (R²) | Data-Test 1 (R²)
Multiple Linear Regression (MLR) | 0.912 | 0.805
Elastic Net Regression (ENR) | 0.921 | 0.919
Support Vector Regression (SVR) | 0.935 | 0.887
Random Forest Regression (RFR) | 0.948 | 0.942
CatBoost Regression (CBR) | 0.954 | 0.930
Multilayer Perceptron (MLP) | 0.957 | 0.948
Table 2. Comparison of model performance in terms of RMSE for Balcony Test.
Situation | Elastic Net (RMSE) | Multilayer Perceptron (RMSE)
Sunny day (noon) | 1.24 ppm | 0.92 ppm
Night | 1.44 ppm | 1.57 ppm
Rainy day | 2.73 ppm | 1.17 ppm
Thunderstorm | 3.91 ppm | 1.55 ppm
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yan, Y.; Mijiddorj, L.; Beringer, T.; Mijiddorj, B.; Ho, A.; Weng, B. Machine Learning-Enhanced NDIR Methane Sensing Solution for Robust Outdoor Continuous Monitoring Applications. Sensors 2025, 25, 7691. https://doi.org/10.3390/s25247691

AMA Style

Yan Y, Mijiddorj L, Beringer T, Mijiddorj B, Ho A, Weng B. Machine Learning-Enhanced NDIR Methane Sensing Solution for Robust Outdoor Continuous Monitoring Applications. Sensors. 2025; 25(24):7691. https://doi.org/10.3390/s25247691

Chicago/Turabian Style

Yan, Yang, Lkhanaajav Mijiddorj, Tyler Beringer, Bilguunzaya Mijiddorj, Alex Ho, and Binbin Weng. 2025. "Machine Learning-Enhanced NDIR Methane Sensing Solution for Robust Outdoor Continuous Monitoring Applications" Sensors 25, no. 24: 7691. https://doi.org/10.3390/s25247691

APA Style

Yan, Y., Mijiddorj, L., Beringer, T., Mijiddorj, B., Ho, A., & Weng, B. (2025). Machine Learning-Enhanced NDIR Methane Sensing Solution for Robust Outdoor Continuous Monitoring Applications. Sensors, 25(24), 7691. https://doi.org/10.3390/s25247691

