Application of Machine Learning to Estimate Ammonia Atmospheric Emissions and Concentrations

Abstract: This paper describes an innovative method that recursively applies the machine learning Random Forest to an assumed homogeneous aerographic domain around measurement sites to predict concentrations and emissions of ammonia, an atmospheric pollutant that causes acidification and eutrophication of soil and water and contributes to secondary PM 2.5. The methodology was implemented to understand the effects of weather and emission changes on atmospheric ammonia concentrations. The model was trained and tested on hourly measurements of ammonia concentrations and atmospheric turbulence parameters, starting from a constant emission scenario. The initial values of emissions were calculated based on a bottom-up emission inventory detailed at the municipal level and considering a circular area of about 4 km radius centered on measurement sites. By comparing predicted and measured concentrations for each iteration, the emissions were modified, the model's training and testing were repeated, and the model converged to a very high performance in predicting ammonia concentrations and establishing hourly time-varying emission profiles. The ammonia concentration predictions were extremely accurate and reliable compared to the measured values. The relationship between NH 3 concentrations and the calculated emission rates is compatible with physical atmospheric turbulence parameters. The site-specific emission profiles, estimated by the proposed methodology, clearly show a nonlinear relation with measured concentrations and allow the identification of the effect of atmospheric turbulence on pollutant accumulation. The proposed methodology is suitable for validating and confirming emission time series and defining highly accurate emission profiles for the improvement of the performance of chemical and transport models (CTMs) in combination with in situ measurements and/or optical depth from satellite observation.


Introduction
Atmospheric emissions of ammonia (NH 3 ) can react with nitrogen and sulfur oxides, contributing significantly to the formation of secondary inorganic PM 2.5 and leading to the acidification and eutrophication of soil and water [1]. The importance of monitoring atmospheric ammonia is well recognized, and ammonia is regarded as one of the most crucial substances to monitor alongside greenhouse gases and particulate matter [2][3][4][5]. International and national regulations on air pollution require a reduction in atmospheric emissions of ammonia, as for nitrogen oxides (NO x ), non-methane volatile organic compounds (NMVOCs), sulfur dioxide (SO 2 ), and fine particulate matter (PM 2.5 ) [6][7][8]. Emission inventories play a fundamental role in the estimation of emission reduction; their accuracy is decisive in supporting air quality plans and policy makers [9,10].
In Europe, the agriculture sector contributes around 94% of total ammonia emissions [11], and this figure is confirmed in Italy [12] and in the Po River area, where the share is 97% [13]. The Po Valley, located in the northern part of Italy, is surrounded by mountains and is often affected by atmospheric stagnation and thermal inversion conditions. It is characterized by areas with high population density interspersed with heavily industrialized and intensive farming areas [14]. According to the national veterinary records office [15], about 80% of cows, swine, and poultry are bred in the regions of the Po Basin, determining a higher relative emission density of ammonia compared to the rest of Italy and EU-27 [13]. In northern Italy, livestock contribute around 83% of the total ammonia emissions, and the use of mineral fertilizers contributes 15%. NH 3 emissions from livestock occur during animal housing, manure storage, spreading, and grazing, though the latter phase is relatively negligible, considering the intensive level of farming in northern Italy. The contribution of NH 3 emissions to the formation of secondary particulate matter is highlighted for the Po Basin by different studies [16,17].
In national and local emission inventories, the estimates of total annual NH 3 emissions are based on animal numbers, fertilizer consumption, and emission factors. Emission factors aim to describe how nitrogen (N) in manure and in fertilizers is lost as NH 3 in the atmosphere. Several factors, including the concentration of N components in manure, the concentration of NH 3 at the exchanging surface and in the atmosphere above the manure, air turbulence conditions, temperature, and pH, can influence the rate of NH 3 emission. Emissions occur primarily after spreading and are also influenced by the viscosity and dry matter content of the manure applied on a land surface [13]. Emissions from synthetic fertilizers are influenced by the application technique, chemical composition, and atmospheric turbulence conditions, which can affect the interphase NH 3 concentration [18]. As a matter of fact, atmospheric turbulence seems to affect the release of ammonia in more than one way. The time modulation of ammonia emissions in a chemistry transport model (CTM) can be estimated based on time-varying meteorological variables, as reported by different studies [1,19-25].
The above-mentioned variables are used in several detailed models [26][27][28][29][30][31][32], and the use of machine learning approaches has been investigated for estimating time-varying ammonia emissions [2,33-35]. As reported by Hempel et al. [2], the development and release of new algorithms and the increase in data availability also support the implementation of machine learning approaches in different sectors of agriculture [36].
Machine learning has been employed to differentiate the effects of weather and emission changes on air quality, such as PM 1 composition alterations due to Beijing's Clean Air Action Plan [37]. It has also been used to study the variation in NO 2 , O 3 , and PM 2.5 levels during the COVID-19 lockdown [38][39][40]. This methodology can be extended to reactive species like NH 3 , enabling the analysis of atmospheric impacts and gas-to-particle conversion influences on NH 3 concentration [41].
Neural-network-based chemical transport models learn the complex correlation between emissions and atmospheric concentrations, and they have been used to enhance the accuracy of emission inventories and the performance of air quality models through a back-propagation approach that adjusts the gradient of the loss function, which measures the deviation between predicted and observed contaminant concentrations [42].
As previously mentioned, there are two primary ways of predicting ammonia concentrations: physical methods and machine learning techniques. Physical techniques involve many variables and can be used at the farm level or for specific manure management tasks. CTMs are also defined from a physical perspective and are used on a larger scale. They include the chemical reactions and the transport of pollutants, emission models, and all available information regarding emissions provided by various inventories, but, generally, they simplify data on emission temporal variation. The estimates of ammonia emissions for the entire year are disaggregated in CTMs based on overall temporal profiles. Only in a subsequent reanalysis phase can the emissions be recalculated by comparing simulation results with observed data. Machine learning has been used both in combination with CTMs [37][38][39][40][41] and at single-farm resolution [2].
To estimate ammonia concentrations and emissions with high accuracy, this work introduces a novel approach that applies the machine learning Random Forest iteratively to an assumed homogeneous aerographic region surrounding measurement sites. This method is of interest because, to our knowledge, there is no forecast model for emissions and concentrations on this subject that is as accurate while still preserving complete compatibility with atmospheric turbulence parameters. The proposed model will be used to estimate ammonia emission trends, allowing the validation of the annual emission estimates of emission inventories and the derivation of useful temporal profiles for CTMs.
This paper is divided into five sections, beginning with this introduction. Section 2 provides information on the theoretical basis of the methodology, presenting the main physical hypothesis, the inputs to the model, and the methods of iterating Random Forest. The validity and reliability of the methodology are shown in Section 3. In Section 4, the relations between ammonia concentrations, emissions, and atmospheric turbulence parameters are discussed in detail. Finally, the conclusion, limitations, and future direction of the study are summarized in Section 5.

Materials and Methods
The implemented approach is based on some physical and phenomenological assumptions about ammonia emission rates and dispersion. As stated in the fundamental principles of pollutant dispersion modeling, the observed concentrations of ammonia at the measuring stations were determined by emissions from various sources close to the stations, by pollutant transport, deposition, and reactions, as well as meteorological conditions.
It is well known that gaseous ammonia in the atmosphere tends to convert very rapidly to ammoniacal compounds (NH 4 +) and that NH 3 concentrations decrease quickly within the first 1-2 km from the sources [15]. Therefore, it is conceivable that relatively nearby sources determined the gaseous ammonia quantities that were detected.
The following fundamental assumptions form the basis of this work. Firstly, an area with a radius of 3.6 km was considered for each measurement site, corresponding to the maximum distance that air travels in an hour with a wind velocity of 1 m/s, which is typical for the region (Table 1). Secondly, a Random Forest model was trained and tested on the measured hourly ammonia concentrations, using measured turbulence parameters and a first guess of the total emission of NH 3 as input variables. The first-guess emission value was defined from the local emission inventory, considering an average local value within the circular area around the site. Finally, the testing and training of the Random Forest model was reiterated, correcting the hourly emissions by the ratio between the measured and estimated concentrations.

Measurement Sites and Data
In this work, a dataset for ten measurement sites located in the Po Basin was developed, considering the following:
The dataset covers a nine-year period (start of 2014-end of 2022), except for the location of Moggio (7_RB), where the time series is from 2014 to the beginning of 2021 due to a lack of data.
Table 1 presents the identification codes for the different measurement sites, along with the average values of ammonia concentration and wind speed measured for the entire dataset. For each site, the table also reports the emission estimates for the surrounding area of the measurement station for the years 2014, 2017, and 2019. The locations of the sites are shown in Figure 1.


Ammonia Measurement Sites
The ammonia measurement sites considered in this study belong to the Air Quality Monitoring Network of the Regional Environmental Protection Agency of Lombardy (ARPA Lombardia). The Monitoring Network consists of both permanent stations and mobile samplers, the former providing, by means of automatic analyzers, continuous data at regular time intervals. Ammonia atmospheric concentrations are mainly detected by NO x analyzers, which are based on the principle of chemiluminescence. Thus, starting from the standardized method (UNI EN 14211:2012) for NO x , besides the molybdenum converter heated to 315-325 °C to convert NO 2 into NO, an ammonia analyzer was also equipped with a converter heated to 750-825 °C, which transformed NH 3 into NO.

Meteorological Parameters of the Measurement Sites
In this study, several meteorological variables were considered that could potentially affect the accumulation processes of ammonia in the atmosphere. These variables include wind direction [°], precipitation [mm], global solar radiation [W/m 2 ], ambient temperature [°C], relative humidity [%], and wind velocity [m/s]. These data were obtained from the monitoring network of ARPA Lombardia.
Since not all the ammonia monitoring sites had the complete set of meteorological parameters, it was necessary to consider those from the closest meteorological station/stations if meteorological data were not available.
Figure 1 also shows the meteorological stations used for the possible completion of the data. They can be considered representative for the ammonia measurement sites because they are all located a few kilometers away and are characterized by homogeneous conditions of land use, land cover, and altimetry. No additional stations were indicated for the ammonia measurement sites for which the meteorological dataset was already complete.

Annual Emission Estimates
Annual emission estimates for the areas surrounding the NH 3 measurement sites refer to the years 2013, 2017, and 2019 and were obtained from the common air emission datasets developed by ARPA Lombardia in the frame of the "LIFE PREPAIR inventory" (LPi) [13]. Since no emission assessment was available for 2014, the year of the beginning of the time series considered in this study, the closest data from the inventory edition for 2013 were used for this year.
The work of updating Po Basin inventories with a high spatial resolution scale at the municipal level was carried out by environmental protection agencies and the regions of Lombardy, Emilia-Romagna, Piedmont, Veneto, Friuli Venezia Giulia, Valle d'Aosta, and Bolzano and the province of Trento. Figure 1 shows the emission density map of NH 3 of the LPi referred to 2017 and the positions of the ten ammonia measurement sites.
NH 3 emission sources in the LPi are described according to the SNAP (Selected Nomenclature for sources of Air Pollution) classification, where several categories may be identified as: fertilizer application, livestock, traffic, residential/commercial, industry, and other; and several subcategories may be identified as: animal subcategories, vehicle subcategories, domestic combustion, agricultural soils, etc.
In the present study and in Table 1, the ammonia sources of interest are grouped into three categories: "agriculture", "road transport", and "other sources". Other sources include minor contributions to ammonia emissions from: other sources and absorptions, non-industrial combustion, waste treatment and disposal, energy production, and fuel transformation. The bottom-up emission inventory was obtained from very detailed geographic data (e.g., municipal number of livestock units); these were then implicitly considered in the emission estimates, with more details reported in Marongiu et al. [13]. The LPi was obtained by multiplying the activity levels by the corresponding emission factors and aggregating the values for all municipalities, all sources, and all fuel types during a full year. The specific equation is reported in the following:

\[ E_m = \sum_{s} \sum_{f} I_{s,f,m} \cdot EF_{s,f} \]

where: $E_m$ = NH 3 annual emission for the municipality; $s$ = source type; $f$ = fuel type; $I_{s,f,m}$ = activity indicator; $EF_{s,f}$ = NH 3 emission factor.
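As an illustration of the bottom-up calculation described above, the following minimal Python sketch sums activity level times emission factor over sources and fuel types for each municipality. The function name, the dictionary layout, and all numbers are hypothetical and chosen only for this example; they are not taken from the LPi or from the INEMAR system.

```python
from collections import defaultdict


def municipal_emissions(activity, emission_factors):
    """Return {municipality: annual emission}, summing I * EF over sources and fuels.

    activity: {(source, fuel, municipality): activity indicator}
    emission_factors: {(source, fuel): emission factor}
    """
    totals = defaultdict(float)
    for (source, fuel, municipality), level in activity.items():
        totals[municipality] += level * emission_factors[(source, fuel)]
    return dict(totals)


# Hypothetical values for illustration only (not inventory data).
activity = {
    ("livestock", "none", "town_a"): 1000.0,
    ("fertilizer", "none", "town_a"): 50.0,
    ("livestock", "none", "town_b"): 200.0,
}
emission_factors = {
    ("livestock", "none"): 0.02,
    ("fertilizer", "none"): 0.1,
}
emissions = municipal_emissions(activity, emission_factors)
```

Keeping the emission factors keyed by source and fuel only, while the activity data carry the municipality, mirrors the indices used in the description above.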
The compilation of inventories on a municipal scale is comparable, despite the many subjects involved, thanks to the use of the same "INEMAR database" modeling system, which follows the guidelines of the EEA [9,10].
An estimation of emissions was possible thanks to high-resolution maps from the LPi, developed at a municipal level. A circle with a radius of 3.6 km was set around the measuring station, and the quantity of ammonia emitted relative to that area was extracted. By intersecting the map of the LPi municipal areas with the circle area, it was possible to calculate the portion of each municipal area that is located within the area of the measuring station. The total emission for each station ($E_s$) is given by the sum of the LPi emissions of individual municipalities calculated in proportion to how much of their territorial area falls within the area around the measuring station:

\[ E_s = \sum_{m} E_m \cdot \frac{A^{C}_{m}}{A_m} \]

where: $E_s$ = emission for each station; $E_m$ = total municipal NH 3 emission; $A_m$ = total municipal area; $A^{C}_{m}$ = municipal area within the circle around the station.
The overall dataset examined in the present study differs from previous data collection experiences for Agrimonia [43] in its use of the bottom-up LPi, together with hourly data on ammonia concentrations and the main meteorological parameters, in long-term time series.
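The area-proportional extraction described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the intersection areas are taken as already computed (in practice they would come from a GIS overlay), and the field names and values are hypothetical.

```python
import math


def station_emission(municipalities):
    """Sum municipal emissions weighted by the share of each area inside the circle."""
    total = 0.0
    for m in municipalities:
        fraction_inside = m["area_in_circle"] / m["area_total"]
        total += m["emission"] * fraction_inside
    return total


# Area of the 3.6 km circle around a station, in km^2 (about 40.7).
circle_area = math.pi * 3.6 ** 2

# Hypothetical municipalities: emission in t/year, areas in km^2.
municipalities = [
    {"emission": 100.0, "area_total": 20.0, "area_in_circle": 10.0},
    {"emission": 60.0, "area_total": 30.0, "area_in_circle": 5.0},
]
e_s = station_emission(municipalities)
```

The weighting assumes that emissions are spread uniformly over each municipal territory, which is the implicit assumption of an area-proportional allocation.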

Machine Learning Method and Random Forest
The entire dataset encompasses 626,646 valid hourly observations of ammonia concentrations and atmospheric turbulence parameters. The correlation parameters were calculated both for the entire dataset and considering each site separately. This analysis was extended by considering different quartiles for both single stations and the entire dataset (Table S1 in the Supplementary Materials). The correlation analysis did not reveal any important correlation between concentrations and atmospheric turbulence indicators. This suggested the implementation of a more sophisticated machine learning approach.
Hempel et al. have investigated how the selection of training data and modelling approach affects the estimation of ammonia emissions from a naturally ventilated dairy barn [2]. In their work, they concluded that ensemble methods of gradient boosting and Random Forest gave the best predictions for emissions, confirming that machine learning approaches can improve emissions predictions.
This study was based on the Random Forest method: randomForestSRC [44][45][46], implemented in a CRAN-compliant R-package [47], using fast OpenMP parallel processing to construct forests for regression; classification; survival analysis; competing risk analysis; multivariate, unsupervised, quantile regression; and class imbalanced q-classification [47].The package implements Breiman Random Forests [48] in a variety of problems.
The approach called Random Forest (RF) can improve ensemble learning by injecting randomization into the base learning process [48]. In RF, the predictions are obtained by means of the trees on feature subsets [49]. This approach has been extended in Random Survival Forest (RSF), developed by Ishwaran et al. [46][47][48][49][50]. RF is a method that averages trees and develops an ensemble by a randomization in the learning process in two ways: random sampling of the data to grow a tree and random feature selection.

Sampling of Training Data and Cross-Validation
The hourly emission flux of ammonia F_NH 3 [kg/h] must be considered as an input variable of the machine learning model together with atmospheric turbulence parameters. The Random Forest was applied to each measuring site shown in Figure 1, considering measured hourly values of NH 3 concentration, temperature, precipitation, wind intensity and direction, solar radiation, and humidity, together with first-guess hourly ammonia emissions. The initial value for the ammonia emission flux was estimated for each location from the LPi for 2017, as detailed in Section 2.1.3, and is reported in Table 1. The dataset was filtered by omitting missing values and retaining only NH 3 concentrations below the 99th percentile. For training and testing operations, the dataset was randomly divided into two subsets: one containing 70% of the data for training and the other containing the remaining 30% for testing. Figure 2, for the Schivenoglia site (10_RB), illustrates the initial assumption and subsequent nine iterations of the estimated ammonia flux, "emi_1", during a sample period. The iterations adjust the initial ammonia flux, which is multiplied by the ratio between the measured and predicted concentrations for each iteration, as displayed in the lower section of the figure.

The procedure develops as previously described, selecting and reincorporating new random subsets for training and testing. With each iteration, the predictive performance progressively enhances due to the refinement of the emissive input. During the first iteration, the algorithm identifies a set of atmospheric turbulence parameters favorable to ammonia accumulation in the atmosphere. Consequently, the concentrations, "conc_1", in some cases tend to overestimate the concentration compared to the measured values.
The predictions estimated in the first iteration depend only on the variability in the atmospheric conditions and do not consider real activity levels and other source-specific variables. From the second iteration, the model corrects the hourly varying emissions, and the predictions vary, smoothing some peaks.
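The iteration logic described above (constant first guess, random 70/30 split, prediction, rescaling of the hourly emissions by the ratio of measured to predicted concentration) can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the study uses the R package randomForestSRC with meteorological inputs, whereas here the hypothetical `linear_stand_in` function replaces the Random Forest with a one-parameter fit so that the example runs with the standard library alone. All names and values are assumptions made for this sketch.

```python
import random


def split_indices(n, train_fraction=0.7, seed=0):
    """Randomly split range(n) into training and testing index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]


def linear_stand_in(emissions, measured, seed):
    """Stand-in for the Random Forest: fit conc = k * emission on the training subset."""
    train, _test = split_indices(len(measured), 0.7, seed)
    num = sum(emissions[i] * measured[i] for i in train)
    den = sum(emissions[i] ** 2 for i in train)
    k = num / den
    return [k * e for e in emissions]


def iterate_emissions(measured, first_guess, fit_predict, n_iter=9):
    """Refit the model at each step and rescale emissions by measured / predicted."""
    emissions = [first_guess] * len(measured)  # constant emission scenario
    history = [list(emissions)]
    for it in range(n_iter):
        predicted = fit_predict(emissions, measured, seed=it)  # new random split
        emissions = [e * c / p for e, c, p in zip(emissions, measured, predicted)]
        history.append(list(emissions))
    return history


# Hypothetical hourly concentrations (must be positive) and first guess in kg/h.
measured = [4.0, 6.0, 10.0, 8.0, 5.0, 7.0, 12.0, 9.0, 3.0, 6.0]
history = iterate_emissions(measured, 2.0, linear_stand_in, n_iter=3)
final_emissions = history[-1]
```

With this simplistic stand-in the corrected emissions become proportional to the measured concentrations after the first step; a model that also uses meteorological predictors, as in the study, would instead attribute part of the concentration variability to atmospheric conditions.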

Results
Figure 3 illustrates the iterative enhancement of the correlation between predicted and observed concentrations at each of the ten stations. The starting point is the constant emission scenario, and it is clear that all sites exhibit similar trends through the model's iterations. Notably, the agreement between predicted and measured values significantly improves by the second and third iterations. In the elaboration of training and testing for each measurement site, the model performances for the predictions on the test subsets were very similar to those obtained in the training phases. Table 2 presents the model's performance metrics, including R-squared estimates and error rates, for each monitoring site, the training and testing datasets, and each iterative step. The data clearly show a very similar behavior for all the monitoring sites, with R-squared values higher than 0.9 from the second or third iteration. The ammonia daily mean concentrations during the years 2014-2022 in site 3_RB are shown in Figure 4. The comparison of the predictions and the data obtained by the measurements shows very good agreement. On the other hand, the decoupling between emission rates and concentrations is more evident in some periods of the years 2020 and 2021. The comparison between emission rates and concentrations is based on real valid data; no data completion procedures are applied in Figure 4. The developed methodology is not affected by the absence of valid data, which can be a more relevant issue in the calculation of annual and monthly total emission rates.
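For readers who wish to reproduce this kind of performance check, the sketch below computes an R-squared value and a root-mean-square error for a set of predictions. It assumes the common coefficient-of-determination definition (one minus the ratio of residual to total sum of squares); the exact metric definitions behind Table 2 are not restated in the text, so this is an assumption, and the sample values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical test-subset values for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
r2 = r_squared(observed, predicted)
error = rmse(observed, predicted)
```

Computing the same metrics separately on the training and testing subsets is what allows the comparison between the two phases mentioned above.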
The proposed methodology allows the calculation of annual, monthly, and daily variation in emission rates. As reported by Asman et al. [51], emission rates can show a peak in the afternoon related to warmer temperatures and higher turbulence. Farming operations can vary during the year, reasonably showing peaks in Spring and Autumn.
The estimated hourly emission rates and measured mean concentrations for all the sites are reported in Figure 5 and in Figure S1 of the Supplementary Materials and clearly show a maximum during the afternoon or late morning in a majority of the sites. The emissions profiles can also be more complex, considering that ammonia emissions can occur several times after specific operations in livestock activities.
Figure 6 and Figure S2 of the Supplementary Materials show the monthly average emissions profiles for each site compared with similar elaborations for measured ammonia concentrations, confirming the presence of the peaks in Spring and Autumn.
Air 2024, 2, FOR PEER REVIEW 13
A data completion calculation is considered in Figure 7. The total annual emission rate is calculated by applying a coefficient defined as the ratio between the total hours in the year and the number of valid data. Figure 7 clearly shows how the calculated emissions obtained by the methodology described in Section 2 are in very good agreement with the LPi. The spatial variation in emissions seems to show a better agreement than time series for certain sites. The reason can be difficult to define; the emission inventory is affected by different levels of uncertainties regarding emission factors and by the annual and intra-annual fluctuation in the number of animals bred. The field application of manure can occur on different days of the year, even if confined to the cultivation seasons, and can also take place in a municipality other than that of the farm in which the animals have produced the excreta nitrogen flow.
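The data completion coefficient described above can be illustrated with a short Python sketch. It scales the sum of the valid hourly emission rates by the ratio between the hours in the year and the number of valid records; the function name and the sample series are hypothetical, and a non-leap year of 8760 hours is assumed by default.

```python
def annual_total(valid_hourly_emissions, hours_in_year=8760):
    """Scale the sum of valid hourly emissions [kg/h] to a full-year total [kg]."""
    n_valid = len(valid_hourly_emissions)
    if n_valid == 0:
        raise ValueError("no valid hourly data available")
    coefficient = hours_in_year / n_valid  # completion coefficient
    return sum(valid_hourly_emissions) * coefficient


# Hypothetical site with valid data for half of the year at a constant 2 kg/h.
valid_series = [2.0] * 4380
total_kg = annual_total(valid_series)
```

The scaling implicitly assumes that the missing hours have the same mean emission rate as the valid ones, which may not hold when gaps are concentrated in a particular season.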
CTMs need the allocation of the annual emission inventory to hourly timesteps. The applied temporal patterns play an important role, affecting the simulation results both in diagnostic and scenario elaborations. Veratti et al. [17] report an overview of the temporal distribution in northern Italy of NH 3 emissions, applying four different air quality modelling systems based on three chemical transport models (CHIMERE, FARM, and CAMx) [52][53][54][55][56][57][58]. The minimum and maximum monthly NH 3 emissions are reported in Figure 8 and compared to the calculations obtained by this study for the site 3_RB, considering the whole time series. Table 3 summarizes all the data used for the comparison and the details about the different sources.
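To make the idea of a fixed temporal profile concrete, the sketch below distributes an annual total over months using normalized weights and then derives a mean hourly rate per month. It is a generic Python illustration: the weights are hypothetical, with peaks placed in spring and autumn, and do not reproduce the profiles of CHIMERE, FARM, CAMx, or any other cited system.

```python
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def monthly_allocation(annual_emission, monthly_weights):
    """Split an annual total into 12 monthly totals using normalized weights."""
    total_weight = sum(monthly_weights)
    return [annual_emission * w / total_weight for w in monthly_weights]


def mean_hourly_rates(monthly_emissions):
    """Convert monthly totals into a constant hourly rate for each month."""
    return [m / (d * 24) for m, d in zip(monthly_emissions, DAYS_IN_MONTH)]


# Hypothetical profile with spring and autumn peaks (weights sum to 12).
weights = [0.5, 0.8, 1.6, 1.5, 1.0, 0.8, 0.7, 0.7, 1.0, 1.5, 1.2, 0.7]
monthly = monthly_allocation(12000.0, weights)  # annual total in kg
hourly = mean_hourly_rates(monthly)             # kg/h for each month
```

A fixed profile of this kind repeats identically every year, which is the limitation that measurement-derived, time-varying profiles are meant to address.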
The site 3_RB can be considered as representative of the area with a higher emission density in the domain. The modelling emission profiles are in quite good agreement with the calculations obtained by this study. The main peaks are visible according to the period of field application of manure. The emissions calculated in this paper can show a wide variability over the years, with the same order of magnitude as the range reported in the CTM modelling systems. Guevara et al. report that the CAMS-REG-TEMPO monthly and daily profiles for livestock are assumed to be dependent on temperature and ventilation rates, while the hourly profiles are based on fixed weight factors due to a data limitation issue [64]. Figure S3 shows for each measurement site the comparison of NH 3 monthly emissions variability between this study (2014-2022) (blue bars), CAMS-GLOB-ANT (2014-2022) (orange) [61,65,66], and Livestock CAMS-REG-TEMPO (2014-2020) (black) [64,65]. The comparison demonstrates how the profiles derived in this study using in situ measurements have greater variability for each month and can exhibit multiple relative peaks throughout the year. The temporal profiles reported by Veratti et al. [17] were obtained from four different air quality modelling systems applied to the Po Basin. They are fixed temporal profiles derived from an estimation of the potential activities associated with livestock and manure management throughout a given year. In comparison to the findings of this study, the presence of a double peak during the year appears to be confirmed at numerous sites. The substantial variability in ammonia emissions found in this work shows that developing modelling systems with dynamic emission input will lead to improved modelling of ammonia concentrations and, in the future, a better understanding of the formation mechanisms of particulate matter.

Discussion
The developed methodology consists in the solution of the inverse problem of estimating the ammonia emission rate in a restricted area near a monitoring station. The approach of inverse modelling allows the quantification of ammonia emissions using observed atmospheric concentrations and turbulence parameters.
The calculation of the emission rates in conjunction with measured atmospheric concentrations is a common goal in solving inverse problems using a Bayesian framework [67]. Also, in this case, the authors restricted the domain to short-range transport using a Gaussian plume-type solution as a forward solver for the transport of particles from fugitive sources. In the application of a Bayesian framework, it was reported that the authors would avoid so-called "inverse crimes" [68]. Inverse crimes happen when numerical methods yield unrealistically optimistic results.
To evaluate the reliability of the estimates, the link between emissions and concentrations was examined, showing that the NH3 concentrations and the calculated emissions are compatible with physical atmospheric turbulence parameters.
A decision tree was applied to describe the observed concentrations in terms of the air turbulence parameters, alongside the time-varying NH3 emission rates. The applied methodology is available in the R package "rpart" [69]. The decision tree learning method applied to each site aims to construct a model that describes ammonia concentration using atmospheric turbulence parameters. In the simulated tree structures, leaves represent class labels for various NH3 concentrations, while branches reflect the combinations of atmospheric turbulence parameters that result in those class labels. Each point in Figure 9 is defined by the ammonia atmospheric concentration and the estimated emission rate. The tree learning was applied by assigning to each point the class label corresponding to its range of turbulence parameters.
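To make the partitioning idea concrete, the following minimal Python sketch finds a single split of the kind a recursive-partitioning tree applies repeatedly. It is not the "rpart" implementation used in this study: the function `best_split`, the squared-error criterion, and the wind speed and concentration values are assumptions chosen only for illustration.

```python
def best_split(feature, target):
    """Find the threshold on one feature that minimizes the summed squared
    error of the two resulting groups (a single regression-tree split)."""

    def sse(values):
        # Sum of squared deviations from the group mean.
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(feature, target))
    best_score, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = sse(left) + sse(right)
        if score < best_score:
            best_score, best_threshold = score, threshold
    return best_threshold


# Hypothetical hourly values at a roughly constant emission rate.
wind_speed = [0.5, 0.8, 1.0, 3.0, 3.5, 4.0]       # m/s
nh3_conc = [38.0, 35.0, 36.0, 12.0, 11.0, 14.0]   # µg/m3
threshold = best_split(wind_speed, nh3_conc)
```

In this toy example the split falls at 2.0 m/s, separating a low-wind group with high concentrations from a high-wind group with low concentrations. A full tree would repeat the search within each group and over several turbulence parameters.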
Figure 9 shows, for each site, the variation in atmospheric concentrations of ammonia with the increase in the calculated emission rates. The decision tree identified different levels of possible NH3 atmospheric accumulation at a fixed emission level, highlighting the possible role of meteorological conditions. As will be discussed below, the results are reasonable from a physical point of view.
The data in Figure 9 become more dispersed as the importance of the atmospheric turbulence parameters increases with respect to emissions. This analysis is also confirmed by the correlation between measured ammonia concentrations and ammonia emissions reported in Table S2 in the Supplementary Materials. An R-squared value of around 0.9 was calculated for station 9_UI (industrial site), which confirms the direct relation between emissions and concentrations visible in Figure 9. In the case of the industrial site 9_UI, the analysis was not able to identify a specific turbulence pattern. The lowest correlation between emissions and concentrations was obtained for station 7_RB (R2 = 0.2), which shows a very wide dispersion of data in Figure 9. Considering the results of the LPi in Table 1, sites 7_RB and 9_UI have specific peculiarities: the first is placed in the lower emission density area of the domain and the second is characterized by very different emission sources (industrial) compared to the other sites (agriculture).
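As a reminder of how such a coefficient can be obtained, the minimal Python sketch below computes the squared Pearson correlation between an emission series and a concentration series. Whether Table S2 was produced in exactly this way is an assumption, and the sample values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy ** 2 / (sxx * syy)


# Hypothetical series: one site responds directly to emissions, one does not.
emissions = [10.0, 20.0, 30.0, 40.0]   # kg NH3/h
direct = [5.0, 10.0, 15.0, 20.0]       # µg/m3, proportional to emissions
scattered = [12.0, 5.0, 14.0, 8.0]     # µg/m3, dominated by other factors

r2_direct = r_squared(emissions, direct)
r2_scattered = r_squared(emissions, scattered)
```

A value close to 1, as in the first case, corresponds to the direct relation described for 9_UI, while a value close to 0 corresponds to the wide dispersion described for 7_RB.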
For the remaining measuring stations, higher temperatures, higher solar radiations, and lower wind velocities seem to favor the accumulation of gaseous ammonia in the atmosphere.
Considering station 1_RB, an emission rate of 40 kg NH3/h corresponds to atmospheric concentrations in the range of 10-40 µg/m3. This variation can be explained by considering the role of solar radiation and wind velocity. Higher values of solar radiation seem to lead to higher concentrations at constant emissions. Wind velocity was identified as a second factor, acting in a different way. At a fixed range of thermal radiation, higher wind strength decreases the concentrations by allowing better atmospheric dispersion.
The main variables identified show that, for the same emission, the meteorological parameters that allow a greater accumulation in the atmosphere are very similar to those used for the definition of atmospheric stability classes. Atmospheric turbulence can be categorized into six stability classes, ranging from the most unstable or turbulent (low wind speed and high thermal radiation) to the most stable or least turbulent (high wind speed and low insolation, as during the night). The results of the study do not allow us to assess whether the Gaussian plume-type solution can be used to further optimize the results. The low influence of wind direction is nevertheless confirmed: it was found to be relevant at only two sites, 2_SU and 10_RB, where it suggests the presence of a specific source of ammonia emissions.
The limited impact of wind direction partially confirms the lower influence of emissions at the outer range of 3.6 km around the site or the homogeneity of the emission fluxes in all the immediate neighboring areas. The nature of the emissions from livestock and manure management can also play a relevant role, as these are emitted at ground level without stack velocity and with a time delay after agricultural activities, as in manure field applications.
Calculations have shown that there are no distinct correlations between the variables of atmospheric turbulence and the measured concentrations of ammonia, and that these correlations vary unpredictably from site to site (Table S1, Supplementary Materials). Local geographical conditions and activities, mainly in agriculture, as well as proximity to emission sources, play a pivotal role in atmospheric ammonia concentrations. The correlation of concentrations with emission rates (Table S2, Supplementary Materials) varies significantly based on these factors at each site. This is consistent with the reactive nature of ammonia, whose concentration fluctuates rapidly, partly because of its easy conversion into NH4+ and then into particulate matter.
The implemented methodology normalizes the input variables: atmospheric turbulence variables and emission rates. Figure 10 depicts a sensitivity analysis of the final computed concentrations and emissions obtained with two alternative initial emission values. The results are displayed for station 3_RB with starting emissions of 668 t/year (C_High and E_High) and 0.1 t/year (C_Low and E_Low).
The atmospheric concentrations are predicted with the same accuracy regardless of the initial inventory, confirming the reliability of the proposed approach and the high accuracy compared to predictions obtained using only meteorological parameters (C_Met) and emission rates (C_Emi).
The computed emission rates shown in Figure 10 were obtained from different initial emission inventories and exhibit the same temporal trend, but with a proportionate factor due to the variable normalization. A calibration curve based on annual average measured concentrations and annual emissions could provide a plausible scaling and forecast of the initial values for emissions.
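The effect of normalization on the absolute scale can be illustrated with a minimal Python sketch. Min-max scaling is assumed here purely for illustration, since the exact normalization scheme is not restated in this section, and the two profiles are hypothetical.

```python
def min_max(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values]


# Two hypothetical emission profiles that differ only by a constant factor.
profile_high = [20.0, 60.0, 100.0, 40.0]
factor = 0.25
profile_low = [v * factor for v in profile_high]

normalized_high = min_max(profile_high)
normalized_low = min_max(profile_low)
```

Both profiles map to the same normalized series, so a model trained on normalized inputs cannot distinguish between them. Under this assumption, the temporal trend is recoverable while the absolute level needs an external reference such as the calibration curve mentioned above.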
Figure S4 in the Supplementary Materials depicts the link between the measured data and the emission estimations from the LPi utilized in this work, revealing a strong linear correlation, with an R2 greater than 0.93.

Conclusions
In this study, a machine learning methodology for estimating atmospheric concentrations and emission rates based on atmospheric turbulence parameters has been implemented by recursive application of Random Forest. The proposed iterative process for the determination of the emission rates of NH3 separates the effects of meteorology from the variation in space and time of the overall effects of the emission sources.
Each subsequent iteration improves the ability to predict ammonia concentrations: the self-learning process inherent in the methodology unfolds gradually and is strengthened as new data are analyzed.
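The structure of such an iterative scheme can be sketched in a few lines of Python. This is a toy illustration and not the procedure used in this study: a one-parameter linear forward model stands in for Random Forest, the multiplicative update rule (emission rescaled by the ratio of measured to predicted concentration) is an assumption, and the `dispersion` factor and all numerical values are hypothetical.

```python
def fit_scale(emissions, dispersion, measured):
    """Least-squares scale a in: measured ~ a * emission * dispersion."""
    x = [e * d for e, d in zip(emissions, dispersion)]
    return sum(xi * c for xi, c in zip(x, measured)) / sum(xi * xi for xi in x)


def iterate_emissions(measured, dispersion, start, n_iter=3):
    """Alternate between fitting the forward model and updating emissions.

    Assumes strictly positive measured concentrations and dispersion factors.
    Returns the final emission series and the RMSE recorded at each iteration.
    """
    emissions = [start] * len(measured)  # constant-emission starting scenario
    history = []
    for _ in range(n_iter):
        a = fit_scale(emissions, dispersion, measured)
        predicted = [a * e * d for e, d in zip(emissions, dispersion)]
        mse = sum((p - c) ** 2 for p, c in zip(predicted, measured)) / len(measured)
        history.append(mse ** 0.5)
        # Assumed update rule: rescale each emission by measured / predicted.
        emissions = [e * c / p for e, c, p in zip(emissions, measured, predicted)]
    return emissions, history


measured = [10.0, 30.0, 20.0, 40.0]   # hypothetical concentrations, µg/m3
dispersion = [1.0, 2.0, 0.5, 1.0]     # hypothetical meteorological factor
final_emissions, rmse_history = iterate_emissions(measured, dispersion, start=5.0)
```

In this toy setting the prediction error drops sharply after the first update because the forward model is linear. With a non-parametric learner such as Random Forest, several iterations of retraining and testing would be expected before the predictions stabilize.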
The proposed methodology provides very good accuracy in predicting time-varying ammonia concentrations compared with measured data in different sites of the considered domain.
The emission rates calculated in this study are compared with the main results of the emission inventories estimated for the investigated area, considering both their spatial and temporal variation. This comparison, which gave very encouraging results, is important to ensure consistency between the estimates determined on an hourly basis and the available independent estimates for the same site on an annual scale.
The goal of an emissions inventory is to offer the most complete assessment of the sources present in a given area and year. However, it does not account for the unpredictability of emissions caused by meteorological conditions or oscillations in activity levels. The approach in this study uses data collected in situ or via satellite to estimate the variability in emissions (pressure) in relation to modulations of observed concentrations (state).
Further analysis involved the monthly emission profiles estimated by the study, which agree with the main assumptions documented in air quality modeling or in inverse calculations from satellite observations and modeling simulations and with the main physical features from previous studies. The predicted emission rate profiles are reasonable, considering the seasonal variation in temperature and solar radiation and the possible scheduling of agricultural activities.
Considering the hourly time resolution, the emission profiles can show very high variability. Only a minor part of this variability can be explained by atmospheric turbulence parameters, and it can be reasonably linked to changes in the emission sources. At a very local scale, the field application of manure can occur in a different period of the year even in the same season and can also be affected by different parameters. The local presence of a certain number of livestock units cannot always be linked to ammonia emissions in all the manure management phases, since treatment and field application are not always located close to the housing structure.
As a matter of fact, the analysis of measured ammonia concentrations does not show a recurring pattern, suggesting that, at a very local scale, the ammonia emission rate and its time series can be very variable. This aspect must be considered in the development of bottom-up emission inventories and in their use and application in air quality simulations.
The presented methodology relies on relevant hypotheses on the chemical and physical behavior of ammonia in the atmosphere, assuming that the measured concentrations of gaseous NH3 were determined by sources relatively close by. The presence of transport fluxes of this pollutant outside or inside the considered area could affect the calculation of the emission rates. A possible solution to this limitation is to include in the calculation the effect of transport fluxes for those stations showing an effect of wind direction. As reported in Figure 9, wind direction has been identified as relevant at only a very limited number of sites: 2_SU (placed in a valley) and 10_RB (probably affected by a specific source).
The method developed in this paper could be applied to CTM simulated concentrations together with in situ measurements and/or optical depth from satellite observations (e.g., Copernicus Atmosphere Monitoring Service). The proposed methodology can be extended to the points of an entire area considering the data collected by the satellite and is suitable for being extended to further applications considering the interactions of gaseous ammonia with atmospheric acid gases in the formation of ultrafine particulate matter. CTM models could benefit from this methodology because they could use a dynamic emission input obtained from in situ measurements or satellite data processing, as opposed to the more common approach, which uses average monthly, weekly, and hourly time profiles. It could also be used to determine time series for specific tracers or contaminants, including levoglucosan for biomass burning sources, and heavy metals.

Figure 1. Emission density map in northern Italy and ammonia concentrations at the measurement sites considered in this study.

Figure 2. Focus on a short period for station 10_RB. Emission rates: variation in the estimated emission flux for each model iteration. Concentrations: comparison between the predicted and measured concentrations of NH3 for each model iteration.

Air 2024, 2, 9
Figure 3. Comparison between the measured NH3 atmospheric concentrations and the predicted ones [µg/m3] at each step of the iteration for the different sites. Each color corresponds to an iteration.

Figure 4. Time series of daily average concentrations of measured and predicted ammonia [µg/m3] and total daily emissions for 3_RB [kg NH3/day].

Figure 5. Average hourly emission flux [kg/h] for each hour of the day compared to average measured ammonia concentrations [µg/m3]. Elaboration on quantiles.

Figure 6. Average hourly emission flux [kg/h] for each month of the year compared to average measured ammonia concentrations [µg/m3]. Elaboration on quantiles.

Figure 7. Total annual emissions of ammonia [t/year] in different sites for three years: 2013, 2017, and 2019; comparison of calculated data from this study and the LPi.

Figure 8. Comparison of NH3 monthly emissions variability between this study and different available sources [17,59,61-63]. Values are expressed as ratios between monthly and annual emissions for a fixed year.

Figure 9. Relation between estimated ammonia emissions expressed in kg NH3/h and predicted ammonia concentrations [µg/m3] as a function of turbulence atmospheric parameters (for site 4_UB ct: lower than wind calm threshold; nct: higher than wind calm threshold).

Figure 10. Sensitivity analysis on station 3_RB with two different starting emissions: lev1 = 668 t/year and lev2 = 0.1 t/year. Atmospheric concentrations in [µg/m3] and emission estimates of NH3 in [kg/h]. Concentrations: C_Low, starting emission at lev2; C_High, starting emission at lev1; C_Emi, concentrations calculated considering only the emissions obtained from RF; C_Met, concentrations calculated by RF considering only atmospheric turbulence parameters. Emissions: E_High, calculated with starting value lev1; E_Low, calculated with starting value lev2.

Table 1. Ammonia measurement sites, average measured ammonia and wind velocity, annual surrounding emissions estimates, and main emissions macrosectors. NH3 emissions refer to the total amounts emitted per year (2014, 2017, and 2019) in each circular area with a radius of 3.6 km.

Table 2. Performance indicators at each iteration step for training and testing in each site.

Table 3. Summary of characteristics of data employed in the monthly ammonia emission flow comparison.