1. Introduction
Methane (CH4) is a potent greenhouse gas with a global warming potential significantly higher than that of carbon dioxide over a 20-year period, impacting the greenhouse effect approximately 100 times more intensely [
1]. The rapid increase in atmospheric methane concentrations has become a critical concern for climate change mitigation efforts [
2]. Among various anthropogenic sources, the dairy industry significantly contributes to methane emissions, primarily through enteric fermentation in cattle and manure management practices [
3]. Methane emissions related to the dairy industry have increased fourfold over the past 130 years [
4], highlighting the urgent need for effective mitigation strategies [
5].
In Canada, the dairy industry plays a crucial role in the economy but also represents a substantial source of methane emissions. Benchmarking methane emissions in the Canadian dairy industry is essential for establishing accurate baselines, setting realistic emission reduction targets, and tracking progress toward sustainability goals. Accurate benchmarking enables the identification of best practices across different farms and regions, facilitating the implementation of effective mitigation strategies [
6]. Without clear benchmarks, efforts to reduce emissions may lack direction and efficacy.
Collecting accurate methane emission data across multiple locations presents significant challenges. Methane concentrations can vary widely due to factors such as wind direction, atmospheric conditions, and local emission sources. Traditional ground-based measurement methods are often limited in scope and resource-intensive and may not effectively capture the spatial variability of methane emissions [
7]. To overcome these challenges, innovative approaches that provide comprehensive and accurate methane emission data across large geographical areas are needed.
Satellite remote sensing offers a promising solution to these challenges. Instruments like the Sentinel-5P satellite provide unprecedented spatial coverage and temporal resolution, allowing for continuous, large-scale monitoring of atmospheric methane concentrations [
7,
8]. This approach captures variations that might be missed by conventional methods, enabling more accurate assessments of emission sources and trends [
9]. By leveraging satellite observations, researchers can obtain consistent and frequent measurements, enhancing the understanding of methane emission patterns.
Integrating satellite data with advanced data analysis and machine learning techniques introduces powerful predictive capabilities. Machine learning algorithms can process vast amounts of complex data, identifying patterns and trends that may not be apparent through traditional statistical analyses [
10]. In particular, Long Short-Term Memory (LSTM) networks are well suited for modeling time-series data and capturing temporal dependencies [
11]. The LSTM model’s ability to account for long-term dependencies is valuable in understanding the complex, seasonally variable nature of dairy industry emissions [
12]. By incorporating multiple variables—such as seasonal factors, yearly trends, and other environmental variables—the model enhances the precision of forecasts, which is crucial for developing effective mitigation strategies.
Despite the potential benefits, the application of machine learning to methane emission benchmarking in the dairy industry remains relatively unexplored. Previous studies have demonstrated the potential of machine learning in agricultural applications, including crop yield prediction, livestock monitoring, and environmental impact assessments [
13,
14,
15]. However, there is a gap in the literature regarding the integration of satellite data and machine learning for methane emission benchmarking specifically in the dairy sector.
This study aims to fill this gap by proposing the use of satellite data combined with data analysis and machine learning techniques to establish a methane concentration baseline and predict future trends in the Canadian dairy industry. By combining Sentinel-5P satellite data with LSTM models, we aim to observe the overall trend of methane concentrations and contribute to reducing future increases in emissions. This innovative approach not only enhances the accuracy and comprehensiveness of methane emission benchmarking but also provides a scalable, cost-effective solution for ongoing monitoring.
The main contributions of this paper are as follows:
Satellite Data Collection: We utilized satellite data to collect regional methane concentration data over eight years at sample dairy farm and dairy processor locations in Canada. This extensive dataset provides a robust foundation for analysis and modeling.
Seasonal Comparison: We compared regional methane concentrations from dairy farms and dairy processors across different seasons. This comparison helps us to understand temporal variations and identify periods with higher emission levels.
Provincial Benchmarking: We built a weekly methane concentration benchmark for dairy farms and dairy processors in each Canadian province for different seasons. This benchmarking allows for regional analysis and the identification of areas requiring targeted interventions.
COVID-19 Impact Analysis: We explored trends and differences in regional methane concentrations from dairy farms and dairy processors before and after the onset of the COVID-19 pandemic. This analysis provides insights into how external factors and disruptions affect emission patterns.
LSTM Model Construction: We constructed an LSTM model for predicting future regional methane concentrations from dairy farms and dairy processors in Canada. The model incorporates multiple features, including seasonal and yearly trends, to enhance prediction accuracy.
2. Related Work
2.1. Satellite Data
Researchers measure atmospheric methane concentrations using satellite sensors [
7]. Sentinel-5P completes one orbit around Earth every 100 min from an altitude of 824 km [
16]. This orbit allows the satellite to observe the entire planet every day and to produce highly consistent data [
17]. Sentinel-5P’s Tropospheric Monitoring Instrument (TROPOMI) can precisely detect air pollutants—like nitrogen dioxide, ozone, methane, carbon monoxide, formaldehyde, sulfur dioxide and aerosols—at a high spatial and spectral resolution [
18]. TROPOMI observes the Earth at a spatial resolution of approximately 7 km × 7 km with daily coverage, using ultraviolet, visible, near-infrared and shortwave infrared imaging [
8]. Hu presented changes to vegetation composition using TROPOMI system data [
9]. Kort also examined regional methane concentrations in California using satellite data [
19]. The researchers observed a significant share of pollution from oil and gas industry emissions. The researchers suggested that local governments should address the negative environmental impact of industrial emissions.
Researchers used various data processing and analysis techniques to accurately extract regional methane concentration data from satellite images. Jacob used an inversion model to estimate atmospheric methane concentrations using satellite and ground observations [
7]. Balasus observed that integrating machine learning with methane monitoring techniques significantly enhances methane emission estimation [
20]. These advancements, along with the application of machine learning algorithms to closely track emissions data, have improved the precision of methane monitoring. For instance, deep learning techniques applied to TROPOMI data have noticeably accelerated processing and increased accuracy in assessing methane concentration changes. This shows the effectiveness of the techniques for large-scale monitoring.
2.2. COVID-19
The COVID-19 pandemic slowed global economic activity and worsened environmental conditions. The effects can be observed in the form of fluctuating levels of regional methane concentrations from dairy farming during the pandemic [
21,
22]. In addition, research has focused on the impact of the pandemic on production in the dairy industry.
Perrin and Martin researched production in dairy farms [
23]. The researchers discovered that numerous dairy producers reduced operations because of uncertain market demand and logistics. Rahman found that regional methane concentration has temporarily declined because of reduced cow numbers and feed consumption [
24]. In 2021, studies revealed different feeding management practices in dairy farms during lockdowns to address labor shortages and unstable feed supplies. These practices indirectly increased regional methane concentration levels [
25]. The dairy industry faced supply chain disruption during the COVID-19 pandemic. Esfahani reported similar challenges for dairy businesses and emphasized that logistics and transportation constraints impeded the timely distribution of dairy products to the market. As a result, dairy farmers had to discard excess milk at disposal sites, which increased regional methane concentrations [
26]. Furthermore, Coluccia [
27] pointed out that logistics and transportation problems delayed the arrival of many products; consequently, many dairy products did not reach store shelves on time, and their disposal at landfill sites generated methane emissions. Changes in dairy farm management during the COVID-19 pandemic have had a direct impact on regional methane concentrations [
28]. Acosta examined adjustments made to dairy feeding patterns to lower costs and adapt to market demands; these practices directly impact digestion processes as well as subsequent methane production by cows and regional methane concentration [
29]. Galyean and Hales studied how some farms increased their production efficiency and decreased their emissions through optimized pasture management techniques or by improving feeding techniques, which mitigates the negative effects of the COVID-19 pandemic [
30].
The COVID-19 pandemic significantly impacted regional methane concentration from the dairy industry [
31]. The emissions fluctuated at farm and industry levels because of disruptions in production and supply chains [
32]. Studies reveal the key factors driving regional methane concentration in dairy farms and possible ways to develop effective mitigation strategies.
2.3. LSTM
Researchers increasingly use Long Short-Term Memory (LSTM) networks to understand time-series prediction and detect anomalies. The authors of [
11,
33] observed that researchers mainly use LSTM networks to analyze and predict methane emission data. Before applying LSTM networks to regional methane concentration data, researchers relied on traditional statistical and machine learning approaches, such as autoregressive integrated moving average (ARIMA) models, Support Vector Machines (SVMs), and Random Forest algorithms, for methane emission prediction [
34]. Traditional statistical methods can capture regional methane concentration trends; however, these methods are ineffective when handling complex nonlinear time-series data [
35]. As deep learning technology has advanced, more scholars have researched LSTM networks, which can handle long-term dependence and nonlinear relationships within time-series data [
12]. Furthermore, LSTM networks are superior in forecasting regional methane concentrations and are efficient at processing data in the long run [
12].
Researchers use LSTM models to predict regional methane concentrations. Meng studied the model’s superior performance over an extended period in an industrial zone [
35]. The LSTM model accurately displayed emission trends and periods. Researchers also use LSTM models to detect anomalies in methane emission data [
36]. Pang used an LSTM model to detect anomalies [
37]. The researcher distinctly identified abnormal emissions during industrial processes. Researchers have also constructed an LSTM multivariable model that incorporates various factors—such as meteorological conditions or industrial activities—that affect regional methane concentration [
38]. Huang developed an LSTM multivariable prediction system that combines multiple input variables and gives accurate and reliable data used for predictions [
39].
Despite significant progress in data analysis for regional methane concentration when using LSTM networks, challenges persist [
40]. Training an LSTM model is complex and demands more computing resources [
41]. Hao [
42] indicates that future research should focus on effective ways to integrate external data sources, such as meteorological data, to enhance the prediction performance of the LSTM network model [
43]. Overall, methane concentration data analysis is crucial for environmental monitoring and pollution control. Improving model structures and optimizing algorithms should make methane concentration forecasts more reliable and, in turn, support more effective emission control in the future.
3. Methodology
3.1. Data Collection
This study focuses on assessing methane concentrations associated with the Canadian dairy industry. We utilized the geographical coordinates of dairy farms and processors collected by Parma [
44]. From this dataset, we selected 575 dairy farms and 384 dairy processors across Canada. To ensure regional representativeness and enhance the generalizability of our findings to the entire Canadian dairy industry, we employed a stratified random sampling approach. The sampling frame was stratified based on provincial distribution and the density of dairy operations within each province. Provinces were categorized into strata reflecting their level of dairy production, such as high, medium, and low. Samples were allocated proportionally to each stratum based on the number of dairy operations in that province. This proportional allocation ensured that regions with more dairy activity contributed more samples, maintaining the industry’s structural diversity. Within each stratum, dairy farms and processors were randomly selected using random number generators. This process ensured that each operation within a stratum had an equal chance of being included, maintaining randomness and independence in the sampling process.
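The proportional allocation described above can be illustrated with a short sketch. It is not the authors' sampling code: the function name `stratified_sample`, the tuple layout, and the rule of drawing at least one operation per stratum are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_sample(operations, total_n, seed=0):
    """Draw roughly total_n operations, allocating draws to each stratum
    in proportion to the stratum's share of all operations.

    operations: list of (operation_id, stratum) tuples (hypothetical layout).
    Returns {stratum: [sampled operation ids]}.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for op_id, stratum in operations:
        by_stratum[stratum].append(op_id)

    population = len(operations)
    sample = {}
    for stratum, ids in sorted(by_stratum.items()):
        # Proportional allocation; the floor of one draw is an assumption.
        k = max(1, round(total_n * len(ids) / population))
        k = min(k, len(ids))
        # Simple random sampling without replacement inside the stratum.
        sample[stratum] = rng.sample(ids, k)
    return sample
```

With 60, 30, and 10 operations in three strata and `total_n=20`, this sketch draws 12, 6, and 2 operations, respectively, so strata with more dairy activity contribute more samples.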
Figure 1 provides a visual representation of the spatial distribution of dairy operations, highlighting the concentration of facilities in different provinces. Major provinces with significant dairy activity, such as Ontario and Quebec, are prominently featured, demonstrating areas with higher densities of dairy operations.
This stratified random sampling method captured a wide variety of farm sizes and processor capacities, balancing large-scale and small-scale operations. Although the sample is not entirely random due to stratification, careful attention was paid to maintain diversity across geographical locations and production scales, making it broadly representative of the Canadian dairy industry. The geographical coordinates (latitude and longitude) of the selected dairy farms and processors were input into the Google Earth Engine (GEE) to extract methane concentration data from the Sentinel-5P satellite. We extracted the bias-corrected volume mixing ratio of dry air column CH4, which provides accurate and reliable atmospheric methane concentration data after adjusting for systematic measurement errors. Data were collected for an eight-year period from 2016 to 2023, allowing for comprehensive temporal analysis.
The extracted dataset included the following fields: ID, date of acquisition, longitude, latitude, name of dairy farm or processor, province, and season (determined from the official Canadian season corresponding to the date of data collection). While Sentinel-5P provides high-quality atmospheric methane measurements, certain limitations must be acknowledged. The satellite’s spatial resolution of approximately 7 km × 7 km may introduce spatial averaging errors, limiting the precision with which emissions can be pinpointed at smaller scales such as individual farms or processors, especially in areas with complex emission patterns. Adverse weather conditions, cloud cover, and satellite coverage constraints can result in missing data, necessitating careful data preprocessing and interpretation. Retrieval accuracy can be affected by factors such as atmospheric aerosols and surface reflectance, leading to uncertainties quantified through error margins that vary by location and atmospheric conditions. Given these challenges and the impracticality of using Chemical Transport Models (CTMs) for such a large number of locations at this resolution, we used the methane concentration data from Sentinel-5P directly as proxies for emissions from the dairy industry, while acknowledging the limitations of this approach.
3.2. Statistical Analysis
3.2.1. Seasonal Analysis
To explore seasonal variations in methane concentrations from dairy farms and processors, we established baseline values for each location to account for differences in latitude, longitude, and surrounding environments. Due to potential missing values from satellite data, such as those caused by cloud cover or weather conditions, we included only available data for each season at each location to avoid biases. For each location, we calculated the average methane concentration for each season—spring, summer, autumn, and winter—based on the official Canadian seasonal calendar. The baseline methane concentration for each location was determined by averaging the seasonal averages. By comparing the average methane concentration of each season with the baseline for that location, we obtained the seasonal deviations. This method allowed us to analyze the concentration patterns of dairy farms and processors across different seasons while mitigating the impact of data imbalance due to missing values.
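The baseline and deviation calculation can be sketched as follows. This is a minimal illustration, assuming the records are available as `(location_id, season, ch4_ppb)` tuples with missing observations simply absent; the function name and data layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def seasonal_deviations(records):
    """records: iterable of (location_id, season, ch4_ppb) tuples.
    Returns {location_id: {season: seasonal mean minus location baseline}}."""
    values = defaultdict(lambda: defaultdict(list))
    for loc, season, ch4 in records:
        values[loc][season].append(ch4)

    result = {}
    for loc, by_season in values.items():
        season_means = {s: mean(v) for s, v in by_season.items()}
        # Baseline = mean of the seasonal means, so a season with many
        # valid observations does not dominate the location baseline.
        baseline = mean(list(season_means.values()))
        result[loc] = {s: m - baseline for s, m in season_means.items()}
    return result
```

Averaging the seasonal averages, rather than all observations at once, is what limits the effect of unequal numbers of valid retrievals per season.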
3.2.2. Provincial Analysis
To assess regional differences across Canadian provinces, we calculated weekly average methane concentrations. Due to missing values in the satellite data, focusing on weekly averages helped reduce data imbalance and provided a more reliable measure of concentration differences. For each province, we aggregated weekly methane concentration data across all dairy farms and processors. Additionally, we separately calculated weekly averages for dairy farms and dairy processors within each province. This approach provided both holistic and specific insights into provincial methane concentration patterns.
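A possible implementation of the weekly aggregation is sketched below. It assumes ISO calendar weeks and a hypothetical record layout of `(province, kind, obs_date, ch4_ppb)`; the paper does not state which week convention was used.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_provincial_means(records):
    """records: iterable of (province, kind, obs_date, ch4_ppb), where kind
    is 'farm' or 'processor' and obs_date is a datetime.date.
    Returns {(province, group): mean of the weekly means}, with group in
    {'all', 'farm', 'processor'}."""
    weekly = defaultdict(list)
    for province, kind, obs_date, ch4 in records:
        iso_year, iso_week, _ = obs_date.isocalendar()
        # Each observation counts toward the combined group and its own kind.
        for group in ("all", kind):
            weekly[(province, group, iso_year, iso_week)].append(ch4)

    per_group = defaultdict(list)
    for (province, group, _, _), values in weekly.items():
        per_group[(province, group)].append(mean(values))
    # Averaging weekly means gives each week equal weight, regardless of
    # how many valid retrievals it contains.
    return {key: mean(week_means) for key, week_means in per_group.items()}
```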
3.3. Impact of COVID-19
We investigated the impact of the COVID-19 pandemic on methane concentrations by comparing data before and after 31 December 2019, which we designated as the time point separating the pre-COVID-19 and post-COVID-19 periods. The dataset was divided into pre-pandemic (2018–2019) and post-pandemic (2020–2023) periods. Each year was standardized into 52 or 53 weeks (accounting for leap years), ensuring consistent temporal alignment across years. For each week, we calculated the average methane concentration across the years in each period for all locations, as well as separately for dairy farms and processors. This approach allowed us to observe trends and differences in regional methane concentrations before and after the onset of the pandemic. By comparing the weekly averages, we identified changes in methane concentrations that could be attributed to factors such as alterations in production methods and supply chain disruptions during the pandemic.
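The week-by-week comparison can be sketched as follows. The cutoff date comes from the text; the use of ISO week numbers, the function name, and the `(obs_date, ch4_ppb)` record layout are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

CUTOFF = date(2019, 12, 31)  # separation point stated in the text


def weekly_pre_post(records):
    """records: iterable of (obs_date, ch4_ppb) tuples.
    Returns {week_number: (pre_mean, post_mean)} for every week of the
    year that has observations in both periods."""
    buckets = {"pre": defaultdict(list), "post": defaultdict(list)}
    for obs_date, ch4 in records:
        period = "pre" if obs_date <= CUTOFF else "post"
        # Index 1 of isocalendar() is the ISO week number (1 to 53).
        buckets[period][obs_date.isocalendar()[1]].append(ch4)

    shared = sorted(set(buckets["pre"]) & set(buckets["post"]))
    return {w: (mean(buckets["pre"][w]), mean(buckets["post"][w])) for w in shared}
```

Running the same function on farm-only and processor-only records would give the separate comparisons mentioned above.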
3.4. Future Prediction
To predict future methane concentrations, we utilized Long Short-Term Memory (LSTM) neural networks, modeling dairy farms and processors separately. Methane emissions in dairy farms exhibit temporal patterns influenced by environmental factors (e.g., weather conditions, seasons) and operational practices (e.g., manure management, feeding strategies). LSTM networks are capable of learning complex, nonlinear relationships and capturing both short-term fluctuations and long-term trends in time-series data. Unlike traditional recurrent neural networks, LSTMs overcome the vanishing gradient problem, allowing them to retain information over extended periods, which is particularly suitable for our seasonally varying data.
3.5. Model Architecture
The LSTM model was designed to capture temporal dependencies in the methane concentration data. Each input sequence consisted of data from the previous four weeks, including methane concentration (bias-corrected volume mixing ratio of dry air column CH4), a seasonal indicator (categorical variable representing the season), and a yearly trend (continuous variable representing the year).
The network structure included four LSTM layers with decreasing units:
The first LSTM layer had 128 units with a Rectified Linear Unit (ReLU) activation function.
The second layer had 64 units with ReLU activation.
The third layer had 32 units with ReLU activation.
The fourth layer had 8 units with ReLU activation.
Dropout layers with a rate of 20% were applied after each LSTM layer to prevent overfitting by randomly setting 20% of the layer’s output units to zero during training. The final output layer was a fully connected dense layer with a linear activation function to predict the methane concentration for the next week. The depth and width of the network were empirically determined to balance model complexity and computational efficiency. This multi-layered structure allowed the model to learn hierarchical temporal features, capturing both short-term fluctuations and long-term trends in methane emissions.
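To give a sense of the size of this architecture, the sketch below counts trainable parameters for the stacked layers. It assumes the standard LSTM formulation (four gates, each with an input kernel, a recurrent kernel, and a bias) and assumes that each of the three inputs occupies a single feature column; if the seasonal indicator were one-hot encoded, the first-layer count would differ. Dropout layers add no parameters.

```python
def lstm_params(input_dim, units):
    """Trainable weights in one standard LSTM layer: four gates, each with
    an input kernel, a recurrent kernel, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def stacked_lstm_params(n_features, layer_units, n_outputs=1):
    """Total trainable weights for stacked LSTM layers plus a dense output."""
    total, input_dim = 0, n_features
    for units in layer_units:
        total += lstm_params(input_dim, units)
        input_dim = units  # the output of each layer feeds the next one
    # Final fully connected layer: one weight per input plus a bias.
    return total + (input_dim + 1) * n_outputs


N_FEATURES = 3  # assumption: CH4, season code, and year as one column each
print(stacked_lstm_params(N_FEATURES, [128, 64, 32, 8]))
```

Under these assumptions the first layer accounts for roughly half of all weights, which is consistent with the decreasing layer widths acting as a progressive compression of the four-week input window.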
3.6. Hyperparameter Optimization
Hyperparameter tuning was performed using a combination of grid search and manual adjustments to find the optimal settings for the model. We systematically explored different combinations of hyperparameters and evaluated the model’s performance to identify the optimal configuration. The key hyperparameters optimized included the learning rate (tested values: 0.01, 0.001, 0.0001; optimal value: 0.0001), batch size (tested values: 32, 64, 128; optimal value: 64), and number of epochs (tested values: 50, 100, 150; optimal value: 100 epochs). The Adam optimizer was chosen for its adaptive learning rate capabilities, which help in handling sparse gradients and adjusting the learning rate during training. The Mean Squared Error (MSE) loss function was selected due to its sensitivity to larger errors, which is important in environmental data where extreme values can be significant. The optimization process involved conducting a grid search over the specified hyperparameter values and evaluating model performance using time-series cross-validation. This approach maintained the temporal order of data, ensuring that the model was always validated on future data relative to the training set.
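The search space and the time-ordered validation scheme can be sketched as below. The grid values are those listed in the text; the expanding-window fold layout and the function names are assumptions, since the exact cross-validation configuration is not specified.

```python
from itertools import product

GRID = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "batch_size": [32, 64, 128],
    "epochs": [50, 100, 150],
}


def grid_configs(grid):
    """Yield one dict per combination of hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def expanding_window_folds(n_samples, n_folds):
    """Yield (train_indices, val_indices) pairs in which the validation
    block always follows the training block in time."""
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        val_end = min(train_end + fold_size, n_samples)
        yield range(0, train_end), range(train_end, val_end)
```

The grid above yields 27 configurations; each would be scored by averaging the validation loss over the folds, and the lowest average would be retained.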
3.7. Training Process
The data were split into training (70%), validation (15%), and test (15%) sets in chronological order to prevent information leakage from future data points. This method preserved the temporal structure of the data, which is crucial for time-series forecasting models. Due to missing values in the satellite data, we adopted a sliding window approach to create training samples while addressing missing values. We first identified data blocks with at least five consecutive weeks of data in the methane concentration series for the same location. These data blocks were then sampled using the sliding window method, where each sample consisted of the first four weeks as input features and the fifth week as the target output. This method increased the amount of data available for training and mitigated the impact of missing values on the model’s performance.
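The block-based sliding window and the chronological split can be sketched as follows. The representation of a location's series as a `{week_index: value}` dictionary, with missing weeks absent, is an assumption made for illustration.

```python
def sliding_windows(series, window=4):
    """series: {week_index: value} for one location; missing weeks are absent.
    Returns (inputs, target) pairs built only from runs of window + 1
    consecutive weeks, so no sample spans a gap."""
    samples = []
    for start in sorted(series):
        needed = range(start, start + window + 1)
        if all(week in series for week in needed):
            inputs = [series[week] for week in needed[:window]]
            samples.append((inputs, series[start + window]))
    return samples


def chronological_split(samples, train_pct=70, val_pct=15):
    """Split time-ordered samples into train/validation/test sets
    without shuffling, so later data never leaks into training."""
    n = len(samples)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return samples[:i], samples[i:j], samples[j:]
```

For a series covering weeks 1 to 6, 8, and 9, only the windows starting at weeks 1 and 2 are kept, because every later window would cross the missing week 7.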
Data scaling was performed using the Min–Max scaler to normalize the input features, facilitating efficient training and convergence of the neural network. The training algorithm employed the Adam optimizer with default parameters (β1 = 0.9, β2 = 0.999), except for the learning rate, which was set to 0.0001 as determined during hyperparameter tuning. The loss function minimized during training was Mean Squared Error (MSE). Regularization techniques were applied to enhance model generalization and prevent overfitting. Dropout layers with a 20% rate were used after each LSTM layer, as previously described. Early stopping was implemented by monitoring the validation loss during training and halting training if the validation loss did not improve for 10 consecutive epochs. This prevented the model from overfitting the training data by halting training when performance on unseen data ceased to improve.
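Two of the steps above, Min–Max scaling and early stopping with a patience of 10 epochs, can be sketched in a framework-independent way. Fitting the scaler on the training values only is an assumption (a common way to avoid leakage); the text does not state which subset was used to fit it.

```python
def fit_min_max(train_values):
    """Return a scaling function fitted on the training values only
    (assumption: the scaler is not refitted on validation or test data)."""
    lo, hi = min(train_values), max(train_values)
    span = (hi - lo) or 1.0  # guard against a constant series
    return lambda x: (x - lo) / span


def early_stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training would halt, given the
    validation loss recorded after each epoch."""
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0  # improvement: reset the counter
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)  # patience never exhausted
```

If the validation loss last improves at epoch 2, for example, training under this rule stops at epoch 12.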
3.8. Model Validation
We employed time-series cross-validation to validate the model, which is essential for preserving the temporal dependencies in the data. In each fold of the cross-validation, the model was trained on a sequence of data and validated on a subsequent sequence, ensuring that the model was always tested on data that came after the training data in time, preserving temporal causality. Data augmentation was performed using the sliding window approach, as previously mentioned, to create multiple samples from the time-series data. Each input sequence consisted of methane concentrations and features from the previous four weeks to predict the concentration in the next week. This method increased the dataset size and improved the model’s ability to learn temporal patterns. The final evaluation was conducted on the test set, which was entirely unseen during training and validation. This provided an unbiased assessment of the model’s predictive performance on new data.
3.9. Performance Metrics
To evaluate the model’s predictive accuracy, we used the following performance metrics:
Root Mean Squared Error (RMSE):
RMSE measures the average magnitude of the prediction errors, providing insight into the model’s accuracy in the same units as the target variable (parts per billion of methane concentration).
Mean Absolute Error (MAE):
MAE offers a linear score that is easy to interpret, representing the average absolute difference between predicted and actual values.
R-squared (Coefficient of Determination):
R-squared indicates the proportion of variance in the dependent variable that is predictable from the independent variables, offering a normalized measure of the model’s explanatory power. RMSE and MAE were chosen because they provide direct insight into the model’s prediction accuracy in the context of methane concentrations, where precise measurements are critical for environmental monitoring. R-squared provides a relative measure of how well the observed outcomes are replicated by the model.
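For reference, the three metrics can be computed with the standard definitions sketched below; the function names are illustrative, and the inputs are assumed to be equal-length sequences of observed and predicted concentrations.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root of the mean squared prediction error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean of the absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """One minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because RMSE squares the errors before averaging, it is never smaller than MAE on the same predictions, and a large gap between the two points to a few large errors.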
3.10. Robustness and Reliability
To ensure the robustness and reliability of our model, we conducted multiple training and evaluation runs. The model was trained and evaluated over 10 iterations, and performance metrics were averaged to assess stability. This process helped to mitigate the effects of random initialization and stochastic training processes. We performed sensitivity analysis by testing the model’s performance with variations in hyperparameters. This assessed the robustness of the chosen settings and ensured that the model’s performance was not unduly sensitive to specific hyperparameter values. Although we explored ensemble methods to potentially improve performance, we found that combining multiple LSTM models did not significantly enhance predictive accuracy. Therefore, we proceeded with the single, optimized LSTM model.
3.11. Comparison with Baseline Models
To justify the choice of the LSTM model, we compared its performance with simpler baseline models. We evaluated the Autoregressive Integrated Moving Average (ARIMA) model and the Prophet model. The ARIMA model is a traditional statistical model for time-series forecasting, suitable for univariate data with trends and seasonality. We optimized the ARIMA parameters (p, d, q) using the Akaike Information Criterion (AIC) to find the best-fitting model. The Prophet model, developed by Facebook, is designed for forecasting time-series data with multiple seasonalities and is robust to missing data and shifts in the trend. Our results demonstrated that the LSTM model outperformed both the ARIMA and Prophet models in all performance metrics. The LSTM’s superior ability to capture complex temporal dependencies and nonlinear patterns inherent in methane concentration data accounts for this improved performance. Traditional models were less effective due to their limitations in handling the nonlinearities and multiple influencing factors present in our dataset.
4. Results
4.1. Observed Trends in Methane Emissions Across Canadian Dairy Operations
4.1.1. Seasonal Variations in Methane Emission Patterns
Regional methane concentration is highest in winter (Figure 2), while the other three seasons all fall below the local baseline, with summer and spring only slightly below it. When dairy farms and dairy processors are considered separately, the processors’ regional methane concentration is above the baseline and higher than that of the farms in the summer, but lower than that of the farms in the other three seasons. While seasons influence regional methane concentration from both dairy farms and dairy processors, the effect is slightly greater on dairy processors (
Figure 3) than on dairy farms. According to Bell [
45], seasonal effects on regional methane concentration from dairy farms and dairy processors have several components; the reasons for the different concentration levels at Canadian dairy farms and dairy processors across seasons are described below. The first is the difference in how farms handle manure: while enteric fermentation is the primary source of methane emissions on dairy farms, manure management also contributes significantly. According to Crill [
46], increased manure spreading in autumn may release a large amount of methane, whereas in winter cold temperatures reduce microbial activity in manure and soil, resulting in lower emissions [
47]. Seasonal differences in herd management can also affect regional methane concentration: cows typically graze outdoors in the summer and are given stored feed when housed indoors, and different feeds are fermented differently in the stomach [
48,
49]. Because cows are usually housed indoors during the winter and go outside only to exercise, regional methane concentrations may be better managed and captured in this controlled environment [
50].
4.1.2. Regional Disparities in Methane Emissions by Province
The province of ON has the highest regional methane concentration from dairy farms and dairy processors, while NB has the lowest (
Figure 4). Ontario, Quebec, and British Columbia have a higher density of dairy farms compared to other provinces, resulting in an increase in total emissions. Climate conditions such as temperature and humidity in different provinces will affect microbial activities in manure and affect regional methane concentration [
51]. Differences in manure management practices, such as how manure is stored and disposed of, can lead to different levels of regional methane concentration. Atlantic Canada, for example, has smaller farms and fewer cows, and provinces with smaller or fewer dairy processors, especially New Brunswick, are less likely to accumulate and release methane [
52,
53]. NL and MB are also noteworthy because the difference between dairy farms and dairy processors is large: in each province, one group ranks above the overall ranking and the other ranks below it. This is related to the industrial structure of the two provinces: their factory production processes and product types are less prone to methane release, and the distribution of farms is more concentrated [
54,
55].
In
Figure 5, the provincial ranking of regional methane concentration is the same as the ranking in the summer, which may be because summer emissions are closer to the average emissions. The fall data follow the same overall trend, with one notable exception: NL's average weekly emissions are significantly lower than those of other provinces in autumn but significantly higher in winter. This is likely related to NL's geographical location. NL lies at a significantly higher latitude than the other provinces, which leads to different dairy farm and dairy processor management strategies in the same season [
56,
57].
Figure 6 illustrates the correlation between areas of high dairy activity and elevated atmospheric methane concentrations, providing insights into regional emission patterns.
Figure 7 demonstrates the spatial association between concentrated dairy operations and observed methane plumes, emphasizing the impact of the dairy industry on regional methane emissions.
4.1.3. COVID-19 Impact
It can be seen from
Figure 8 that the overall regional methane concentration during the COVID-19 period, as well as the concentrations for dairy farms and dairy processors separately, is significantly lower than after COVID-19. One possible explanation is that many dairy farmers and processors adopted more intensive production methods during the pandemic in response to labor shortages and changes in market demand [
58]. Intensive production often means more cattle raised at higher density, which increases regional methane concentration [
59]. Supply chain disruptions caused by COVID-19 may also have affected the availability of feed and other essential resources, forcing farms to adopt alternative feeding methods or feeds that could lead to more methane production by livestock [
3,
60]. During and after the pandemic, instability in market demand may have led to a surplus of milk products, with more of them wasted or mishandled, increasing regional methane concentrations from waste management [
61]. Because of fluctuating market demand, dairy farmers and dairy processors may have had to deal with more waste milk and by-products [
62]. If this waste is not properly disposed of (for example, through inadequate composting or improper storage), it can lead to more methane release [
63].
Looking at
Figure 8, although regional methane concentrations from dairy farms and dairy processors differed greatly before and after COVID-19, the within-year trend for dairy farms, dairy processors, and all sites combined did not change. This indicates that the seasonal activities of dairy farming and the dairy industry did not change from before to after COVID-19. Concentrations increase at mid-year and year-end, and these peaks correspond to specific agricultural practices, such as harvest-time farm activities, including tilling and manure spreading (cow manure is spread on the soil in summer rather than winter), which lead to increased regional methane concentration [
64]. Our results suggest that policies targeting manure management practices during winter months could significantly reduce methane emissions in high-density dairy regions like Ontario.
4.1.4. LSTM Future Prediction
Initially, the LSTM model was fitted (
Figure 9) using a single feature, the bias-corrected volume mixing ratio of dry air column CH4, and the RMSE of the resulting model was 18.9. After adding more features (season and year), the RMSE decreased to 14.8, which reflects the importance of season and year in predicting methane concentration and indirectly confirms the large impact of these two features on methane concentration. Forecasts from the LSTM model for farms and processors can effectively capture concentration trends. They can be used to plan for and mitigate regional methane concentrations from dairy farms and processors and to implement proactive measures before emissions rise; for example, farms and processors can use forecasts to apply methane reduction practices ahead of expected methane emission peaks.
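As a rough illustration of the feature step described above, the following sketch shows one way season and year could be encoded alongside the CH4 value, and how RMSE is computed. The function names (`season_of`, `build_features`, `rmse`), the meteorological season mapping, the one-hot encoding, and all sample values are assumptions made for illustration; they are not the study's actual pipeline or data.

```python
import math
from datetime import date

# Assumed mapping: Northern Hemisphere meteorological seasons.
SEASON_BY_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "fall", 10: "fall", 11: "fall"}
SEASONS = ["winter", "spring", "summer", "fall"]


def season_of(day: date) -> str:
    """Return the (assumed) season label for a calendar date."""
    return SEASON_BY_MONTH[day.month]


def build_features(day: date, ch4_ppb: float, base_year: int) -> list[float]:
    """Return [CH4, year offset, one-hot season (4 values)] for one time step."""
    one_hot = [1.0 if season_of(day) == s else 0.0 for s in SEASONS]
    return [ch4_ppb, float(day.year - base_year)] + one_hot


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values, for illustration only.
row = build_features(date(2021, 7, 5), 1885.0, base_year=2019)
error = rmse([1880.0, 1890.0, 1900.0], [1877.0, 1894.0, 1900.0])
```

Feature rows built this way would be stacked into fixed-length windows before being passed to a sequence model; a lower RMSE on held-out data indicates a better fit.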
The findings of this study have significant implications for sustainability policies and practices in the Canadian dairy industry. By providing a comprehensive benchmark of methane emissions using satellite data and machine learning, policymakers can set more precise and achievable emission reduction targets. This approach enables the identification of high-emission areas and periods, allowing for targeted interventions that can optimize resource allocation and enhance the effectiveness of mitigation strategies. Additionally, the predictive capabilities of the LSTM model offer a proactive tool for anticipating future emission trends, enabling the industry to implement preventative measures and adjust practices accordingly. This not only supports Canada’s climate commitments but also enhances the industry’s competitiveness in a market increasingly focused on sustainability.
5. Limitations and Future Work
Using satellite data to measure atmospheric methane concentrations as a proxy for regional emissions from dairy farms and processors is an efficient and cost-effective method. However, several limitations are associated with this approach that need to be addressed to enhance accuracy and reliability.
Firstly, the spatial resolution of the Sentinel-5P satellite is limited to approximately 7 km × 7 km. At this scale, a single pixel may encompass multiple methane emission sources, including non-dairy activities such as other agricultural practices, industrial operations, or natural emissions from wetlands and vegetation. This spatial aggregation can interfere with accurately attributing methane concentrations solely to dairy operations, leading to potential misinterpretation of the data. Secondly, despite the daily overpass of Sentinel-5P, there are frequent missing values due to cloud cover, aerosols, and adverse weather conditions, which significantly impact data collection and completeness. These gaps in data can hinder the temporal continuity required for accurate trend analysis and model training.
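The effect of retrieval gaps on temporal continuity can be made concrete with a short sketch that averages daily values into weekly means and records how many valid days support each mean. The function `weekly_means`, the `min_valid_days` threshold, and the sample values are assumptions for illustration; `None` stands for a day without a valid retrieval.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


def weekly_means(daily: dict[date, Optional[float]], min_valid_days: int = 3):
    """Group daily CH4 values by ISO week.

    Returns {(iso_year, iso_week): (mean or None, n_valid)}; the mean is None
    when fewer than min_valid_days valid retrievals fall in that week.
    """
    buckets: dict[tuple[int, int], list[float]] = defaultdict(list)
    weeks_seen = set()
    for day, value in daily.items():
        iso = day.isocalendar()
        key = (iso[0], iso[1])
        weeks_seen.add(key)
        if value is not None:
            buckets[key].append(value)
    result = {}
    for key in sorted(weeks_seen):
        values = buckets.get(key, [])
        mean = sum(values) / len(values) if len(values) >= min_valid_days else None
        result[key] = (mean, len(values))
    return result


# Hypothetical daily retrievals (ppb); None marks a cloud-covered day.
daily = {
    date(2021, 1, 4): 1880.0, date(2021, 1, 5): None,
    date(2021, 1, 6): 1884.0, date(2021, 1, 7): 1886.0,
    date(2021, 1, 11): 1890.0, date(2021, 1, 12): None,
    date(2021, 1, 13): None,
}
summary = weekly_means(daily)
```

Keeping the count of valid days next to each mean makes it possible to exclude or down-weight sparsely observed weeks during trend analysis and model training.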
Moreover, satellite-based methane concentration measurements can be influenced by atmospheric conditions such as wind patterns, temperature inversions, and atmospheric mixing processes, which can transport methane away from its emission source before detection. This atmospheric transport can obscure or alter the spatial distribution of methane concentrations, complicating the direct association between observed concentrations and specific emission sources. To address these limitations and improve the accuracy of satellite-based monitoring systems, future studies should consider integrating atmospheric transport models such as the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model or the Stochastic Time-Inverted Lagrangian Transport (STILT) model. These models simulate the dispersion and transport of atmospheric pollutants by calculating air parcel trajectories based on meteorological data [
65].
By coupling satellite observations with atmospheric transport modeling, it is possible to perform inverse modeling to better attribute observed methane concentrations to specific surface emissions. For instance, HYSPLIT can be used to generate back trajectories from satellite-observed methane plumes, tracing them back to potential emission sources on the ground [
66]. This approach can help disentangle the contributions of multiple sources within a satellite pixel and improve the spatial resolution of emission estimates.
In practical terms, implementing HYSPLIT involves several steps.
Meteorological Data Integration: Obtain high-resolution meteorological data (e.g., wind fields, temperature profiles) from sources such as the Global Data Assimilation System (GDAS) to drive the transport model.
Trajectory Calculation: Use HYSPLIT to calculate backward trajectories from the satellite measurement points at different altitudes, accounting for atmospheric mixing and transport processes.
Source Attribution: Analyze the trajectories to identify potential emission sources upstream of the measurement location, considering the time lag due to atmospheric transport.
Emission Estimation: Combine the trajectory information with emission inventories or ground-based measurements to estimate the contributions from different sources.
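The trajectory and source attribution steps above can be sketched in a deliberately simplified form. The code below does not call HYSPLIT; it assumes a constant horizontal wind and a flat-earth step approximation, and all names (`back_trajectory`, `candidate_sources`), coordinates, and wind values are hypothetical. A real analysis would take trajectories from the transport model driven by gridded meteorological data.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def back_trajectory(lat, lon, u_ms, v_ms, hours, step_h=1.0):
    """Step an air parcel backward in time under a constant wind.

    u_ms is the eastward and v_ms the northward wind component (m/s).
    Assumption: wind does not vary in space, time, or height.
    """
    points = [(lat, lon)]
    km_per_deg_lat = math.pi * EARTH_RADIUS_KM / 180.0
    for _ in range(int(hours / step_h)):
        dx_km = -u_ms * 3.6 * step_h  # backward in time: move against the wind
        dy_km = -v_ms * 3.6 * step_h
        lat += dy_km / km_per_deg_lat
        lon += dx_km / (km_per_deg_lat * math.cos(math.radians(lat)))
        points.append((lat, lon))
    return points


def candidate_sources(trajectory, sources, radius_km):
    """Map each source within radius_km of the path to its first matching step (time lag)."""
    hits = {}
    for step, (lat, lon) in enumerate(trajectory):
        for name, (s_lat, s_lon) in sources.items():
            if name not in hits and haversine_km(lat, lon, s_lat, s_lon) <= radius_km:
                hits[name] = step
    return hits


# Hypothetical measurement point, wind, and source locations.
traj = back_trajectory(43.5, -80.5, u_ms=5.0, v_ms=0.0, hours=6)
sources = {"farm_A": (43.5, -81.17), "farm_B": (44.5, -80.5)}
hits = candidate_sources(traj, sources, radius_km=10.0)
```

In this toy case only the upwind source lies near the path, and the step index at which it is reached gives a rough transport time lag in hours.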
Optimizing existing methods can also involve improving the temporal and spatial coverage of data. One way to achieve this is by integrating data from multiple satellites, such as combining Sentinel-5P (which carries the TROPOMI instrument) with other satellites such as GOSAT, to increase data availability and reduce gaps due to cloud cover [
67].
Additionally, incorporating more granular data from ground-based measurements can enhance model accuracy. Deploying ground-based remote sensing instruments like Fourier Transform Infrared Spectrometers (FTIRs) or using Unmanned Aerial Vehicles (UAVs) equipped with methane sensors can provide high-resolution data to validate and calibrate satellite observations and atmospheric models. Collecting detailed information from dairy farms and processors, such as monthly feed usage, herd sizes, manure management practices, and implemented emission reduction measures, would provide valuable input parameters for the models. Incorporating these operational variables into the machine learning models can improve the precision of methane concentration predictions by accounting for source-specific emission factors.
Future work should also explore the integration of advanced data assimilation techniques, such as ensemble Kalman filters, to optimally combine observations from different sources and models. This could enhance the robustness of emission estimates and provide real-time monitoring capabilities. Furthermore, refining machine learning models by incorporating additional environmental variables (e.g., temperature, humidity, wind speed) can improve prediction accuracy. Implementing hybrid models that combine physical atmospheric transport models with data-driven machine learning approaches may offer a synergistic advantage, capturing both the underlying physical processes and complex patterns in the data.
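To give a sense of what an ensemble Kalman filter does, the following minimal sketch performs a single stochastic analysis step for a scalar state that is observed directly. The ensemble size, the prior mean and spread, the observation value, and its error variance are all hypothetical; an operational system would use a multivariate state and an observation operator linking emissions to satellite columns.

```python
import random
import statistics


def enkf_update(ensemble, observation, obs_var, rng):
    """Stochastic EnKF analysis step for a scalar state observed directly (H = 1).

    Each member is nudged toward a perturbed copy of the observation,
    weighted by the Kalman gain computed from the ensemble variance.
    """
    prior_var = statistics.variance(ensemble)
    gain = prior_var / (prior_var + obs_var)
    obs_sd = obs_var ** 0.5
    return [x + gain * (observation + rng.gauss(0.0, obs_sd) - x) for x in ensemble]


rng = random.Random(0)
# Hypothetical model ensemble of CH4 concentration (ppb).
prior = [rng.gauss(1880.0, 10.0) for _ in range(500)]
# Hypothetical observation with error variance 25 ppb^2.
posterior = enkf_update(prior, observation=1900.0, obs_var=25.0, rng=rng)
```

The updated ensemble mean moves toward the observation in proportion to the relative uncertainty of model and measurement, and the ensemble spread shrinks, which is the basic mechanism by which observations from different sources would be combined.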
6. Conclusions
By innovatively integrating satellite data, statistical analysis, and machine learning techniques, this study offers a comprehensive evaluation of regional methane concentrations in the Canadian dairy industry. A key novel contribution of this work is the establishment of an extensive eight-year dataset, derived from Sentinel-5P satellite observations, encompassing 575 dairy farms and 384 dairy processors across Canada. This dataset enabled us to uncover significant temporal and spatial variations in methane emissions, revealing that concentrations peak during winter and are highest in Ontario compared to other provinces. Notably, we identified an increase in methane concentrations since the onset of the COVID-19 pandemic, likely linked to changes in production methods, supply chain disruptions, and market fluctuations. Furthermore, we developed Long Short-Term Memory (LSTM) neural network models that accurately forecast future regional methane concentrations by incorporating seasonal and annual factors, significantly enhancing prediction accuracy. This predictive capability represents a substantial advancement in environmental monitoring, demonstrating the potential of machine learning to capture complex emission patterns and provide actionable insights.
Based on our findings, we recommend that policymakers utilize the established benchmarks to set realistic, data-driven emission reduction targets and develop targeted regulations that address the identified regional and seasonal disparities. For example, implementing policies that support emission mitigation strategies during winter months and focusing on high-emission provinces like Ontario could prove particularly effective. Additionally, addressing the factors contributing to increased emissions post-pandemic, such as production scaling and supply chain inefficiencies, should be a priority to reverse this upward trend.
For dairy industry stakeholders, the predictive insights from the LSTM models offer a valuable tool for proactive planning. By anticipating periods of elevated emissions, farmers and processors can implement best practices and emission reduction strategies precisely when they are most needed. Embracing data-driven decision-making can lead to more efficient operations, reduced environmental impact, and enhanced competitiveness in markets that increasingly value sustainability.
Overall, this study not only elucidates the temporal and spatial dynamics of methane emissions within the dairy sector but also highlights the transformative potential of combining satellite data with machine learning for environmental monitoring. By advancing our ability to accurately benchmark and predict methane emissions, we provide a robust scientific foundation for developing more effective emission reduction strategies. Future enhancements to the model structure and the incorporation of more granular data could further refine predictions, contributing significantly to global efforts to mitigate climate change and lessen the greenhouse effect’s impact on our planet.