Determination of Region of Inﬂuence Obtained by Aircraft Vertical Proﬁles Using the Density of Trajectories from the HYSPLIT Model

: Aircraft atmospheric proﬁling is a valuable technique for determining greenhouse gas ﬂuxes at regional scales (10 4 –10 6 km 2 ). Here, we describe a new, simple method for estimating the surface inﬂuence of air samples that uses backward trajectories based on the Lagrangian model Hybrid Single-Particle Lagrangian Integrated Trajectory Model (HYSPLIT). We determined “regions of inﬂuence” on a quarterly basis between 2010 and 2018 for four aircraft vertical proﬁle sites: SAN and ALF in the eastern Amazon, and RBA and TAB or TEF in the western Amazon. We evaluated regions of inﬂuence in terms of their relative sensitivity to areas inside and outside the Amazon and their total area inside the Amazon. Regions of inﬂuence varied by quarter and less so by year. In the ﬁrst and fourth quarters, the contribution of the region of inﬂuence inside the Amazon was 83–93% for all sites, while in the second and third quarters, it was 57–75%. The interquarter di ﬀ erences are more evident in the eastern than in the western Amazon. Our analysis indicates that atmospheric proﬁles from the western sites are sensitive to 42–52.2% of the Amazon. In contrast, eastern Amazon sites are sensitive to only 10.9–25.3%. These results may help to spatially resolve the response of greenhouse gas emissions to climate variability over Amazon.


Introduction
Greenhouse gas (GHG) fluxes occur at various spatial scales, from the molecular level to global cycling, meaning that analysis strongly depends on the spatial scale. The level of complexity of these GHG fluxes changes during upscaling analysis as a function of the interrelations among physical, chemical, and biological processes [1,2]. The method for upscaling leaf-to landscape-scale terrestrial flux processes is known as the bottom-up approach. In this approach, on-the-ground data may be combined and constrained to estimate GHG fluxes from the surface using process-or empirical-based vegetation models [3]. For example, in the carbon cycle, process-based models simulate the underlying biogeochemical mechanisms (e.g., respiration, photosynthesis, and fire emissions) to constrain carbon flux for each plant functional type [3]. On the other hand, top-down approaches include the mass balance method, transport inversion, and atmospheric models with ground, tower, aircraft, and satellite observations to estimate surface fluxes by minimizing the differences between simulated and observed atmospheric GHG mole fraction measurements [4,5] (Figure 1).

Figure 1.
Overview of bottom-up and top-down approaches. Aircraft observational data are representative of an intermediate (regional) spatial scale and can be used to estimate regional fluxes. These regional fluxes can be used for both validations of upscaling of local fluxes and as a starting point for flux downscaling.
Atmospheric measurements can constrain fluxes of CO2 and other GHGs at a variety of scales. At relatively small scales, eddy covariance observations constrain net fluxes of CO2 through highfrequency (>1 Hz) measurements made at a local scale measurement tower [6,7]. The region of influence of the tower covers an area of approximately 1 km 2 and depends mainly on the type of forest and tower height, and the eddy covariance data can provide insights on biophysical variables controlling exchanges [8]. At much larger scales, highly calibrated CO2 mole fraction measurements, largely from remote global locations, can constrain terrestrial CO2 fluxes primarily at continental scales [9,10]. Between the resolution and spatial coverage of eddy covariance measurements and most global inverse models, there is a large missing scale. These regional scales, which we define to be of the order of 10 4 -10 6 km 2 , are important scales over which to determine GHG fluxes because they often relate to scales of climate anomalies and ecosystem variation that can influence GHG fluxes. Thus, Figure 1. Overview of bottom-up and top-down approaches. Aircraft observational data are representative of an intermediate (regional) spatial scale and can be used to estimate regional fluxes. These regional fluxes can be used for both validations of upscaling of local fluxes and as a starting point for flux downscaling.
Atmospheric measurements can constrain fluxes of CO 2 and other GHGs at a variety of scales. At relatively small scales, eddy covariance observations constrain net fluxes of CO 2 through high-frequency (>1 Hz) measurements made at a local scale measurement tower [6,7]. The region of influence of the tower covers an area of approximately 1 km 2 and depends mainly on the type of forest and tower height, and the eddy covariance data can provide insights on biophysical variables controlling exchanges [8]. At much larger scales, highly calibrated CO 2 mole fraction measurements, largely from remote global locations, can constrain terrestrial CO 2 fluxes primarily at continental scales [9,10]. Between the resolution and spatial coverage of eddy covariance measurements and most global inverse models, there is a large missing scale. These regional scales, which we define to be of the order of 10 4 -10 6 km 2 , are important scales over which to determine GHG fluxes because they Atmosphere 2020, 11, 1073 3 of 20 often relate to scales of climate anomalies and ecosystem variation that can influence GHG fluxes. Thus, improved understanding of GHG fluxes at regional scales can improve our understanding of the processes controlling them.
Regional-scale fluxes can be determined using bottom-up approaches by upscaling the eddy covariance measurements of the Net Ecosystem Exchange (NEE) using meteorological and remote sensing data, such as those used in the FLUXCOM [11]. As mentioned earlier, another bottom-up approach is the use of process-based terrestrial biosphere models to determine GHG fluxes within a given area. These bottom-up approaches have the potential for presenting considerable uncertainties relating to imperfect parameterizations of complex ecophysiological processes (in the case of models) and errors in meteorological and remote sensing data sets. An alternative approach to determining regional-scale GHG fluxes is the use of well-calibrated GHG mole fraction measurements made on air samples collected from aircraft vertical profiles extending from the surface to the lower/middle troposphere [5,12,13]. Top-down, regional-scale fluxes determined from such profiles are sensitive to all upstream fluxes, known and unknown. This represents an advantage over bottom-up estimates, which may not account for all processes, although top-down fluxes generally do not provide much spatial resolution and instead constrain the spatial integral of fluxes. Top-down fluxes are also uncertain as a result of assumptions or parameterization about atmospheric flow and mixing. Despite some drawbacks of regional-scale top-down fluxes, they can complement the information contained within high-resolution bottom-up approaches and the information obtained from large-scale inverse model studies.
Regional fluxes based on vertical profiles may provide additional information in understanding differences between GHG fluxes estimated from bottom-up approaches and large-scale inverse models. Conflicting results of estimated CO 2 sinks and sources have been observed between bottom-up approaches and inversion models in the tropics. Recently, Kondo et al. [14] pointed out a significant C sink from Support Vector Regression (SVR) from eddy covariance FLUXNET sites (bottom-up) in the tropics. In contrast, neutral CO 2 fluxes were observed in an intercomparison among several inversion models (top-down) [14]. In this example from Kondo et al. [14], it is unclear whether the discrepancy in tropical flux arises from issues in eddy covariance NEE upscaling or possibly from seasonal gaps in the tropical coverage of the Greenhouse gases Observing SATellite (GOSAT) CO 2 due to seasonal cloudiness [15]. Regionally representative vertical profiles, if available, would help to constrain the fluxes better.
The "regions of influence" associated with vertical profiles that are described and analyzed in this study need to be distinguished from "footprints" which are described by Lin et al. [16] and Stohl et al. [17] both in their construction and use. Although both regions of influence and footprints are derived from back-trajectory models forced by meteorological input (mainly horizontal wind fields), there are some important differences. Footprints quantify the sensitivity of mole fraction at an observation point (often referred to as the "receptor") to upwind fluxes of a gas. Footprints are determined from an ensemble of randomly perturbed back-trajectories released at each measurement point. In contrast, and as described in detail below, our regions of influence are calculated from single back-trajectories released from each measurement point. Binning trajectories create the ensemble of trajectories used to define the region of influence over most of the depth of the profile over three months. Footprints are also typically used as sensitivity matrices in Bayesian estimates of GHG fluxes in inversions, where the footprints project fluxes into mole fraction deviations (and vice versa) [18,19]. In contrast, the regions of influence described here are not intended to be used to calculate GHG fluxes from atmospheric data. Regions of influence can instead be used to (a) combine and compare GHG flux estimates from areas upwind of different sites and (b) determine the average value of meteorological variables such as temperature and precipitation in the area upwind of a site.
The estimation of regional fluxes based on GHG mole fractions from aircraft vertical profile air samples [5,12,13] can help constrain and thus improve our understanding of natural and anthropogenic GHG fluxes differences between bottom-up and top-down estimates. Regional flux observations can Atmosphere 2020, 11,1073 4 of 20 also improve biosphere model performance such that source and sink processes can be more accurately parameterized [14] to support policymakers in their efforts to regulate carbon emissions [20].
The goal of our research is to describe a method to compute the region of influence for atmospheric aircraft vertical profiles observations using back-trajectories of the HYSPLIT model [21]. We analyzed regions of influence for four vertical profiling sites in the Amazon Basin on a quarterly basis, over nine years. In addition, we addressed two questions for each site: (a) how large are the regions of influence inside and outside the Amazon (in km 2 )? (b) what is the relative area of the regions of influence inside the Amazon (in %)?

Amazon Mask
We are considering the Amazon subregions of Amazonia sensu stricto, Andes, Guiana, and Gurupí that have most of the Tropical and subtropical moist broadleaf forest, and we exclude the Planalto region because it has predominantly tropical and subtropical grasslands, savannas, and shrublands ( Figure 2) [22,23]. The Amazon mask was defined by the CARBAM project (the Amazon carbon balance long-term study [24], with an area of 7,256,362 km 2 (Figure 2a), and based on the Amazon boundaries of Eva et al. [22] and the biomes of Olson et al. [23]. Although Amazon has no defined limits considering air masses, we opt to select this boundary to compute the contribution of the region of influence inside and outside the mask.
Atmosphere 2020, 11, x FOR PEER REVIEW 4 of 20 The goal of our research is to describe a method to compute the region of influence for atmospheric aircraft vertical profiles observations using back-trajectories of the HYSPLIT model [21]. We analyzed regions of influence for four vertical profiling sites in the Amazon Basin on a quarterly basis, over nine years. In addition, we addressed two questions for each site: (a) how large are the regions of influence inside and outside the Amazon (in km 2 )? (b) what is the relative area of the regions of influence inside the Amazon (in %)?

Amazon Mask
We are considering the Amazon subregions of Amazonia sensu stricto, Andes, Guiana, and Gurupí that have most of the Tropical and subtropical moist broadleaf forest, and we exclude the Planalto region because it has predominantly tropical and subtropical grasslands, savannas, and shrublands ( Figure 2) [22,23]. The Amazon mask was defined by the CARBAM project (the Amazon carbon balance long-term study [24], with an area of 7,256,362 km 2 (Figure 2a), and based on the Amazon boundaries of Eva et al. [22] and the biomes of Olson et al. [23]. Although Amazon has no defined limits considering air masses, we opt to select this boundary to compute the contribution of the region of influence inside and outside the mask. Amazon subregions used to define the CARBAM mask (with the Planalto exception) of Eva et al. [22].

Description of CARBAM Flight Collection Sites
The four CARBAM sampling sites are spread across Amazonia: one southwestern site (RBA-9.38° S, 67.6° W), one northwestern site (TAB-5.96° S, 70.1° W between 2010 and 2012, replaced by TEF-3.31° S, 65.8° W since 2013), one northeastern site (SAN-2.86° S, 54.9° W), and one southeastern site (ALF-8.80° S, 56.7° W). According to Gatti et al. [25], the SAN site is not only influenced by tropical forest, but also by nonforest biomes, mainly the Cerrado (savanna) and the Caatinga, and the major city-Belem (2.5 million people) in Para state. The ALF site is also on the eastern side and has a limit of Cerrado (Planalto region in Figure 2b).
Amazon rainfall varies spatially with relatively low variability throughout the year in the northwestern basin that receives more than 2800 mm/y, while precipitation in the southeastern basin is strongly seasonal, with peak rainfall occurring in January and a long dry season centered on July with an annual total amount of around 1600 mm/y. The south and southeastern portions of the basin have a lower mean annual precipitation and more extended dry season (4-5 months with <100 mm rainfall) [26].
TAB has no dry season (defined as any month with less than 100 mm of rainfall), while TEF has a two-month dry season, with the lowest rainfall in August and climate classification of "Af"tropical forest climate. RBA has a dry season length of three months (JJA), with a peak in July and a  [25], the SAN site is not only influenced by tropical forest, but also by nonforest biomes, mainly the Cerrado (savanna) and the Caatinga, and the major city-Belem (2.5 million people) in Para state. The ALF site is also on the eastern side and has a limit of Cerrado (Planalto region in Figure 2b).
Amazon rainfall varies spatially with relatively low variability throughout the year in the northwestern basin that receives more than 2800 mm/y, while precipitation in the southeastern basin is strongly seasonal, with peak rainfall occurring in January and a long dry season centered on July with an annual total amount of around 1600 mm/y. The south and southeastern portions of the basin have a lower mean annual precipitation and more extended dry season (4-5 months with <100 mm rainfall) [26]. TAB has no dry season (defined as any month with less than 100 mm of rainfall), while TEF has a two-month dry season, with the lowest rainfall in August and climate classification of "Af"-tropical forest climate. RBA has a dry season length of three months (JJA), with a peak in July and a climate classification of "Am"-tropical monsoon climate. ALF has a four-month dry season (JJAS), with a peak in July, climate "Aw"-tropical savanna climate; SAN has a five-month dry season (JASON), with the lowest rainfall in August and a climate classification "Am" [27].
The northwest region, where TAB and TEF are located, is the most preserved region among the four aircraft vertical profiles sampling sites. ALF and south of RBA are in the "Arc of Deforestation" where land use and cover change predominate, mostly with agriculture and grasslands in Brazil and Bolivia [28][29][30][31].

Aircraft Vertical Profile Air Sampling
The vertical profile air sampling program started in 2000 with a single site at SAN [25,32] and expanded to the four sites from January 2010 [12]. The sampling profile collection was performed mainly two times per month over the same geographical position at each site, totaling around 600 vertical profiles between 2010 and 2018. The sampling flights were performed using small aircraft between 12:00 and 13:00 local time in a descending spiral profile to avoid the aircraft emission. This time is preferred because the planetary boundary layer is more fully developed [20].
A vertical profile is typically composed of 12 to 17 air samples collected at altitudes varying from 300 m to 4420 m asl using small aircraft equipped with a semiautomatic sampling system developed by the National Oceanic Atmospheric Administration Global Monitoring Laboratory (NOAA/GML) [25]. This system consists of two units: the Programmable Compressor Package (PCP), which pressurizes ambient air into glass flasks held in the second unit, the Programmable Flask Package (PFP). Each flask is filled by remote control by the pilot at a preprogrammed altitude; the sampling altitudes from each site are shown in Table S1.
The sampling system also has a temperature, pressure, and humidity sensor, and a GPS. Figure 3 shows the three airplane models used to carry out GHG sampling in the scope of the CARBAM project, detailing the sample inlet and the PFP and PCP suitcases.
Atmosphere 2020, 11, x FOR PEER REVIEW 5 of 20 climate classification of "Am"-tropical monsoon climate. ALF has a four-month dry season (JJAS), with a peak in July, climate "Aw"-tropical savanna climate; SAN has a five-month dry season (JASON), with the lowest rainfall in August and a climate classification "Am" [27]. The northwest region, where TAB and TEF are located, is the most preserved region among the four aircraft vertical profiles sampling sites. ALF and south of RBA are in the "Arc of Deforestation" where land use and cover change predominate, mostly with agriculture and grasslands in Brazil and Bolivia [28][29][30][31].

Aircraft Vertical Profile Air Sampling
The vertical profile air sampling program started in 2000 with a single site at SAN [25,32] and expanded to the four sites from January 2010 [12]. The sampling profile collection was performed mainly two times per month over the same geographical position at each site, totaling around 600 vertical profiles between 2010 and 2018. The sampling flights were performed using small aircraft between 12:00 and 13:00 local time in a descending spiral profile to avoid the aircraft emission. This time is preferred because the planetary boundary layer is more fully developed [20].
A vertical profile is typically composed of 12 to 17 air samples collected at altitudes varying from 300 m to 4420 m asl using small aircraft equipped with a semiautomatic sampling system developed by the National Oceanic Atmospheric Administration Global Monitoring Laboratory (NOAA/GML) [25]. This system consists of two units: the Programmable Compressor Package (PCP), which pressurizes ambient air into glass flasks held in the second unit, the Programmable Flask Package (PFP). Each flask is filled by remote control by the pilot at a preprogrammed altitude; the sampling altitudes from each site are shown in Table S1.
The sampling system also has a temperature, pressure, and humidity sensor, and a GPS. Figure 3 shows the three airplane models used to carry out GHG sampling in the scope of the CARBAM project, detailing the sample inlet and the PFP and PCP suitcases. For each site, we calculated influence functions for two vertical profiles per month, filling vertical profile gaps simulating flights using the same input parameters, such as altitude, site, and hour to be used in the HYSPLIT model (Table 1). Study areas have gaps in scheduled flights either due to logistical problems (mostly rain) or the discontinuity of the project in the 2015-2016 period.  For each site, we calculated influence functions for two vertical profiles per month, filling vertical profile gaps simulating flights using the same input parameters, such as altitude, site, and hour to be used in the HYSPLIT model (Table 1). Study areas have gaps in scheduled flights either due to logistical problems (mostly rain) or the discontinuity of the project in the 2015-2016 period.

The HYSPLIT Model
Our back-trajectories are simulated using the Hybrid Single-Particle Lagrangian Integrated Trajectory Model (HYSPLIT, NOAA-ARL) downloaded package version 4, with meteorological input data from the Global Data Assimilation System (GDAS, 1 • × 1 • , 3 h. resolution) [21,34]. HYSPLIT is one of the most extensively used atmospheric transport and dispersion models in the atmospheric sciences community. Many research studies have used HYSPLIT trajectories to determine possible source regions of contributing measurements of selected pollutants or to determine air masses that may have affected a location under study [34,35]. For example, the HYSPLIT model was recently used to calculate the backward trajectory and geographic data analysis for the ATTO site in the central Amazon Basin [36]. In addition, this model is used operationally by NOAA and other federal agencies to run complex atmospheric transport simulations [37].
The model calculation method is a hybrid between the Lagrangian approach, which uses a moving frame of reference for the advection and diffusion calculations as the trajectories or air parcels move from their initial location, and the Eulerian method, which uses a fixed three-dimensional grid as a frame of reference to compute pollutant air mole fractions [34]. In HYSPLIT, there is no convective parametrization; the mixing of the particles occurs in the vertical motion field from meteorological data input [34].
Back-trajectories start from the day, time, and altitude of the collection point (in m asl) from the central point of each overflight geographic location as stated in Figure 2, and spatially locate points every hour for 13 days prior to the collection time (Figure 4), for all flights performed over the study sites. The latitude and longitude of each back-trajectory for a vertical profile are the same because the vertical profile, to within a few km, is centered above a single point on the surface. Each trajectory has approximately 312 location points (13 d × 24 h) with the position of the air masses (latitude, longitude, and altitude) relative to their origin at every hour (Figure 4), which results in between 12 and 17 individual back-trajectories per flight, stored as ASCII files. Despite collecting air samples up to 4420 m, the regions of influence are based only on back-trajectories between 300 and 3500 m asl because trajectories either starting or traveling through this altitude range are the ones most likely to be sensitive to surface fluxes. Up to 3500 m asl, surface fluxes mix into the planetary boundary layer (PBL), which is typically around 1300 m asl, and some mixing occurs above these altitudes due to processes including dry and moist convection and plume rise associated with biomass burning. The cutoff level of 3500 m is supported by two observations: first, biomass burning plumes rarely exceed 3500 m asl ( Figure S1); and second, mole fractions of CO 2 and other gases observed above 3500 m asl are very similar to gas mole fractions from measurements in the Tropical Atlantic marine boundary layer, which indicates minimal Amazonian surface influence [13]. Additionally, as we report in the discussion below, changing the upper altitude limit from 3500 to 1300 m has a minimal impact on our results. . The point on the trajectories represents the crossing point between two segments of a virtual limit, the first one is a latitude limit, from the equator southwards at 30° W, and the second segment is a line between NOAA stations (Barbados-RPB; Ascension-ASC; and Cape Point-CPT). This virtual limit is used to estimate GHG background concentrations for Amazon sites (see details at Domingues [38]).

Region of Influence
By definition, we consider the region of influence to be those areas covered by the set of backtrajectories by each profile and altitude integrated over a period. Back-trajectory models such as HYSPLIT can have high uncertainties in predicting where particles are coming from accurately [16,17], as we can note in Figure 4 with a spurious air trajectory at 900 m at the TAB site coming from the Northern Hemisphere. This deviation is due to several parametrizations for vertical and horizontal mixing in the atmosphere, including a convective boundary layer, circulation models, height integration models, and air temperature, humidity, and pressure [16]. Therefore, a reasonable method to reduce the impact of such uncertainties, as well as smaller random errors, is to consider an ensemble of back-trajectories over a period (like three months or a year). We noted several differences between the approach presented here and the estimation of regions of influence previously used by Gatti et al. [12]. In the case of Gatti et al. [12], the region of influence was determined from the northernmost and southernmost trajectory boundaries for a given site (over a two-year period), and weighting within that region was not considered. In contrast, here, regions of influence are calculated on a quarterly basis and also contain spatial information on the relative influence of the surface with each region of influence. Finally, in the present study, calculation of back-trajectories was performed using meteorology with a resolution of 0.5° × 0.5° instead of 1° × 1°.
The trajectory density, di, is determined as the sum of the number of back-trajectory points (at a one-hour frequency) within each one-degree cell. We binned the trajectories into quarters instead of years in order to understand the seasonal patterns of atmospheric circulation influencing air samples collected at each site. The density of trajectories was calculated by grouping the back-trajectories in the first quarter, second quarter, and so on by year and site. All these processing steps were performed in the R software environment for statistical computing and graphics [39].
Using IDL ® programming (IDL; ITTVis Inc.), the South American map of trajectory densities was positioned in a global one-degree grid (360° × 180°) and land grid cells were identified (and ocean grid cells not considered) using a South America grid of 1° × 1° resolution. The weighted density of trajectories (wi) at each grid cell ( ) is given by the ratio of trajectory densities in each land grid cell (d ) to the sum of all trajectory points over South America as: . The point on the trajectories represents the crossing point between two segments of a virtual limit, the first one is a latitude limit, from the equator southwards at 30 • W, and the second segment is a line between NOAA stations (Barbados-RPB; Ascension-ASC; and Cape Point-CPT). This virtual limit is used to estimate GHG background concentrations for Amazon sites (see details at Domingues [38]).

Region of Influence
By definition, we consider the region of influence to be those areas covered by the set of back-trajectories by each profile and altitude integrated over a period. Back-trajectory models such as HYSPLIT can have high uncertainties in predicting where particles are coming from accurately [16,17], as we can note in Figure 4 with a spurious air trajectory at 900 m at the TAB site coming from the Northern Hemisphere. This deviation is due to several parametrizations for vertical and horizontal mixing in the atmosphere, including a convective boundary layer, circulation models, height integration models, and air temperature, humidity, and pressure [16]. Therefore, a reasonable method to reduce the impact of such uncertainties, as well as smaller random errors, is to consider an ensemble of back-trajectories over a period (like three months or a year). We noted several differences between the approach presented here and the estimation of regions of influence previously used by Gatti et al. [12]. In the case of Gatti et al. [12], the region of influence was determined from the northernmost and southernmost trajectory boundaries for a given site (over a two-year period), and weighting within that region was not considered. In contrast, here, regions of influence are calculated on a quarterly basis and also contain spatial information on the relative influence of the surface with each region of influence. Finally, in the present study, calculation of back-trajectories was performed using meteorology with a resolution of 0.5 The trajectory density, d i , is determined as the sum of the number of back-trajectory points (at a one-hour frequency) within each one-degree cell. We binned the trajectories into quarters instead of years in order to understand the seasonal patterns of atmospheric circulation influencing air samples collected at each site. The density of trajectories was calculated by grouping the back-trajectories in the first quarter, second quarter, and so on by year and site. All these processing steps were performed in the R software environment for statistical computing and graphics [39].
Using IDL ® programming (IDL; ITTVis Inc.), the South American map of trajectory densities was positioned in a global one-degree grid (360 • × 180 • ) and land grid cells were identified (and ocean grid cells not considered) using a South America grid of 1 • × 1 • resolution. The weighted density of trajectories (w i ) at each grid cell (i) is given by the ratio of trajectory densities in each land grid cell (d i ) to the sum of all trajectory points over South America as: where k is the number of land grid cells in South America. Maps of w are the "regions of influence" and are computed for each of 36 quarters (9 years × 4 quarters) at each of the four sites. Figure 5 shows mean quarterly regions of influence for each site.
Atmosphere 2020, 11, x FOR PEER REVIEW 8 of 20 where k is the number of land grid cells in South America. Maps of w are the "regions of influence" and are computed for each of 36 quarters (9 years × 4 quarters) at each of the four sites. Figure 5 shows mean quarterly regions of influence for each site.  The percentage of the weighted points of trajectories within the Amazon, "Amz_perc", is given by the ratio of the sum of the weighted density of trajectories (w i ) placed inside the Amazon grid cells, w i (AMZ) to the sum of w i inside the South America w i (SA) by quarter, site and year, as (2): where l is the number of 1 × 1 grid cells inside the Amazon CARBAM mask (Figure 2). One value of Amz_perc is calculated for each quarter at each site. The region of influence outside the Amazon is defined by 100 − Amz_perc(%).

The Relative Area inside Amazon
We also calculated the unweighted relative area, rel area(%) , for each region of influence in the Amazon CARBAM mask, given by the ratio of the sum of grid cells with at least one trajectory point inside the Amazon and the sum of grid cells with at least one trajectory point outside Amazon, defined as the relative area (3): where (Ω i ) has a value of 1 if there is at least one trajectory point in the grid cell; otherwise, it is 0. Finally, the "weighted area" inside the Amazon is given by the product of the fractional weighted density of trajectories (Amz_perc as a fraction) and the sum of the pixel area divided by the area of the Amazon CARBAM mask (7,256,362 km 2 ) (4): where weighted area is the weighted area inside the Amazon considering the region of influence and its relative area. N is the number of pixels in the Amazon with at least one trajectory crossing for each quarter, site, and year. Apixel is the area of a one-degree pixel in km 2 (approximately 111 × 111 km 2 near the equator), and AreaAmz is the area of the Amazon mask. The unweighted area is given by the sum of pixel areas in the region of influence with at least one density trajectory point, i.e., (Apixel_i).

Quarterly Patterns of the Regions of Influence
The regions of influence follow the pattern of atmospheric circulation at each study site. Figure 5 shows the average of the region of influence by quarters given by the density of trajectories through the 2010-2018 period. The highest density of trajectories is found nearby the place where the atmospheric vertical profiles are collected (reddish cells) and spread around it, generally towards the northeast (yellowish cells).
From Figure 5, we can highlight that regions of influence are seasonally variable at the vertical profiling sites. The first and fourth quarters (generally, the rainy season) have a well-defined behavior in the eastern Amazon, whose trajectories, in some parts, originate in the Northern Hemisphere to the northeast. In the second and third quarters (dry period), the air masses that enter the continent are from the Southern Hemisphere exclusively.
Typical circulation in the Amazon involves the air masses coming from the Atlantic Ocean along the northeast coast, crossing the Amazon, and heading towards the Andes. During the periods from October to December (fourth quarter) and January to March (first quarter), the Intertropical Convergence Zone (ITCZ) is south of the equator bringing air masses from the Northern Hemisphere, shaping the region of influence northward ( Figure 5). For the remainder of the year, the ITCZ returns to the north of the equator and air masses come from the Southern Hemisphere, exclusively [40].
The ITCZ oscillation has direct importance on annual climatological and hydroclimatological variability across the Amazon where the predominant air masses direction varies from the north to the south and from the east to the west. Understanding variations on regional patterns of atmospheric circulation is a critical step to provide knowledge on the origins of the GHG fluxes. As shown in Marengo et al. [41], rainfall in the southern Amazon peaks in January-February (ALF and RBA) during the austral summer, while in the northern Amazon, it peaks in March-April (TAB/TEF and SAN) ( Figure 5). The eastern Amazon (ALF and SAN) has a longer dry season than the western Amazon (TAB/TEF and RBA), although both areas receive a similar amount of annual rainfall [42].
On the other hand, the long dry season observed in ALF and SAN is governed by the South American Monsoon System, which is subject to oscillations due to the temperature of the tropical Atlantic ocean [43]. The interannual variability of the hydroclimatological system is strongly related to the El Niño/South Oscillation (ENSO). More generally, the sea surface temperature (SST) of the tropical Atlantic and Pacific control the variability of precipitation in the Amazon, and the anomalies of the southwest Atlantic SST influence the variability of the South Atlantic Convergence Zone (SACZ) centered at 10 • S [44].

Contribution of Regions of Influence inside Amazonia
Analogous to the hydrological cycle, air masses carry gases and particles from one place to another, a process that varies by season. In the first and fourth quarters, during the rainy season, the most significant contribution of the region of influence occurs in the areas located to the north, in the region known as the Guiana Shield, in the Brazilian states of Amapá and Pará, and a small portion from the NE, from the Caatinga and Atlantic Forest biomes ( Figure 6). The Amazonian region of influence (Amz_perc, Equation (2)) for the combined period of the first and fourth quarters is 83% (sd = 8.3%) and 93% (sd = 3.3%) for the eastern and western Amazon, respectively ( Figure 6).
Atmosphere 2020, 11, x FOR PEER REVIEW 10 of 20 SAN) ( Figure 5). The eastern Amazon (ALF and SAN) has a longer dry season than the western Amazon (TAB/TEF and RBA), although both areas receive a similar amount of annual rainfall [42].
On the other hand, the long dry season observed in ALF and SAN is governed by the South American Monsoon System, which is subject to oscillations due to the temperature of the tropical Atlantic ocean [43]. The interannual variability of the hydroclimatological system is strongly related to the El Niño/South Oscillation (ENSO). More generally, the sea surface temperature (SST) of the tropical Atlantic and Pacific control the variability of precipitation in the Amazon, and the anomalies of the southwest Atlantic SST influence the variability of the South Atlantic Convergence Zone (SACZ) centered at 10° S [44].

Contribution of Regions of Influence inside Amazonia
Analogous to the hydrological cycle, air masses carry gases and particles from one place to another, a process that varies by season. In the first and fourth quarters, during the rainy season, the most significant contribution of the region of influence occurs in the areas located to the north, in the region known as the Guiana Shield, in the Brazilian states of Amapá and Pará, and a small portion from the NE, from the Caatinga and Atlantic Forest biomes ( Figure 6). The Amazonian region of influence (Amz_perc, Equation (2)) for the combined period of the first and fourth quarters is 83% (sd = 8.3%) and 93% (sd = 3.3%) for the eastern and western Amazon, respectively ( Figure 6). During the second and third quarters, which culminate with the dry season, there is a higher percentage of the region of influence from continental areas outside the Amazon in the eastern During the second and third quarters, which culminate with the dry season, there is a higher percentage of the region of influence from continental areas outside the Amazon in the eastern Amazon than in the western Amazon ( Figure 6). The region of influence encompasses Amz_perc = 57% (sd = 8.8%) of the Amazon biome in the eastern Amazon and Amz_perc = 75% (6.5%) in the western Amazon during the second and third quarters ( Figure 6).
There is also a high variability in the region of influence by quarter and site (Figure 7). For instance, the region of influence inside the Amazon at the ALF site is 49% (sd = 8.5%) during the second quarter (AMJ) (see Figure 5). On the other hand, TAB/TEF has the largest region of influence inside Amazon. During the first and fourth quarters (ONDJFM), all sites exhibit a region of influence inside the Amazon, on average, greater than 80% (sd = 5.2%). However, in the second and third quarters (AMJJAS), the region of influence inside the Amazon varies from 49% (ALF) to 81% (TAB/TEF) (Figure 7).
Atmosphere 2020, 11, x FOR PEER REVIEW 11 of 20 Amazon than in the western Amazon ( Figure 6). The region of influence encompasses _ = 57% (sd = 8.8%) of the Amazon biome in the eastern Amazon and _ = 75% (6.5%) in the western Amazon during the second and third quarters ( Figure 6).
There is also a high variability in the region of influence by quarter and site (Figure 7). For instance, the region of influence inside the Amazon at the ALF site is 49% (sd = 8.5%) during the second quarter (AMJ) (see Figure 5). On the other hand, TAB/TEF has the largest region of influence inside Amazon. During the first and fourth quarters (ONDJFM), all sites exhibit a region of influence inside the Amazon, on average, greater than 80% (sd = 5.2%). However, in the second and third quarters (AMJJAS), the region of influence inside the Amazon varies from 49% (ALF) to 81% (TAB/TEF) (Figure 7). Interannual variability is less pronounced than seasonal variability on the weighted region of influence in the Amazon, given by Equation (4) (Figure 8). During the drought years of 2010 and 2015/16, the weighted region of influence had the highest contribution on RBA ℎ = 71-75%, although it was not statistically different from other years (p > 0.99). Due to the smallest extent of the region of influence and by being closest to the ocean where the trade winds are more consistent, the SAN site showed the lowest representativeness of the Amazon and interannual variability ( ℎ = 17%; Standard Error (SE) = 3.8%). The interannual variability of the region of influence is also affected by anomalous exchanges of heating and cooling processes between sea surface temperature and atmosphere that occur at different time scales, such as the ENSO, which has a dominant frequency of oscillation every 3-4 years, on average; the Atlantic Multidecadal Oscillation (AMO), every 60-80 years; and Pacific Decadal Variability (PDV), every 1-20 years [42], which makes it difficult to predict its behavior. Interannual variability is less pronounced than seasonal variability on the weighted region of influence in the Amazon, given by Equation (4) (Figure 8). During the drought years of 2010 and 2015/16, the weighted region of influence had the highest contribution on RBA weighted area = 71-75%, although it was not statistically different from other years (p > 0.99). Due to the smallest extent of the region of influence and by being closest to the ocean where the trade winds are more consistent, the SAN site showed the lowest representativeness of the Amazon and interannual variability (weighted area = 17%; Standard Error (SE) = 3.8%). The interannual variability of the region of influence is also affected by anomalous exchanges of heating and cooling processes between sea surface temperature and atmosphere that occur at different time scales, such as the ENSO, which has a dominant frequency of oscillation every 3-4 years, on average; the Atlantic Multidecadal Oscillation (AMO), every 60-80 years; and Pacific Decadal Variability (PDV), every 1-20 years [42], which makes it difficult to predict its behavior. Atmosphere 2020, 11, x FOR PEER REVIEW 12 of 20 Figure 8. The representative area of each region of influence in the Amazon (7,256,362 km 2 ) by site and year, following Equation (4). The error bar is the confidence interval from the mean and standard error (SE) at 95%. = ̅ ± √ 95% . Table 2 shows the western Amazon (RBA and TAB/TEF sites) is most representative, with 42-52.2% of the relative area inside the Amazon, respectively. In the eastern Amazon, it varies, on average, from 10.9% (SAN) to 25.3% (ALF). There are also seasonal and interannual differences in the representativeness of the region of influence. For instance, ALF has the highest intra-annual coefficient of variation of quarterly unweighted area (CV = 100 ( ̅ ⁄ ) = 58%) with a minimum area of 617,382 km 2 in the third quarter of 2012 and a maximum of 4,704,457 km 2 in the first quarter of 2018, whereas TAB/TEF and RBA have the lowest variation of the quarterly region of influence (CV = 24%) ( Table 2). Additionally, SAN has a quarterly CV = 32%. RBA has the greatest fraction of the Amazon among all the sites, including 79.8% of the Amazon in the fourth quarter of 2016. Furthermore, its smallest quarter still represents 26.0% of the Amazon in the third quarter of 2018 (Table 2). SAN is sensitive to the smallest fraction of Amazonia. On average, its highest proportion represents only 20.9% of the Amazon (first quarter of 2011) with a minimum of 7% in the fourth quarter of 2010 (Table 2).

Representativeness of the Region of Influence in the Amazon
Interannual variability is less pronounced than seasonal variability due to several reasons, such as air mass circulation patterns, displacement of the ITCZ by Atlantic SSTs, the influence of ENSO, and other variations [45,46]. ALF also has the highest interannual variability of unweighted area, with a minimum of 1,176,114 km 2 (2014: 16.2% of Amazon) and a maximum of 2,420,141 km 2 (2017: 33.4% of Amazon). RBA has the lowest interannual variation throughout the years (CV = 11%) followed by SAN (CV = 16%) and TAB/TEF (CV = 18%).  Table 2 shows the western Amazon (RBA and TAB/TEF sites) is most representative, with 42-52.2% of the relative area inside the Amazon, respectively. In the eastern Amazon, it varies, on average, from 10.9% (SAN) to 25.3% (ALF). There are also seasonal and interannual differences in the representativeness of the region of influence. For instance, ALF has the highest intra-annual coefficient of variation of quarterly unweighted area (CV = 100 (σ/x) = 58%) with a minimum area of 617,382 km 2 in the third quarter of 2012 and a maximum of 4,704,457 km 2 in the first quarter of 2018, whereas TAB/TEF and RBA have the lowest variation of the quarterly region of influence (CV = 24%) ( Table 2). Additionally, SAN has a quarterly CV = 32%. RBA has the greatest fraction of the Amazon among all the sites, including 79.8% of the Amazon in the fourth quarter of 2016. Furthermore, its smallest quarter still represents 26.0% of the Amazon in the third quarter of 2018 (Table 2). SAN is sensitive to the smallest fraction of Amazonia. On average, its highest proportion represents only 20.9% of the Amazon (first quarter of 2011) with a minimum of 7% in the fourth quarter of 2010 (Table 2).

Representativeness of the Region of Influence in the Amazon
Interannual variability is less pronounced than seasonal variability due to several reasons, such as air mass circulation patterns, displacement of the ITCZ by Atlantic SSTs, the influence of ENSO, and other variations [45,46]. ALF also has the highest interannual variability of unweighted area, with a minimum of 1,176,114 km 2 (2014: 16.2% of Amazon) and a maximum of 2,420,141 km 2 (2017: 33.4% of Amazon). RBA has the lowest interannual variation throughout the years (CV = 11%) followed by SAN (CV = 16%) and TAB/TEF (CV = 18%). Table 2. Unweighted area of the region of influence in km 2 by quarter and year and its relative area in the Amazon biome (7,256,362 km 2  The region of influence is dependent on the trajectory density in each grid cell, as well as the extent of its area, which varies by site, quarter, and, somewhat less, by year. There is a trade-off between spatiotemporal aggregation and loss of temporal resolution. For instance, back-trajectories were binned quarterly but not by month or at each sample flask and altitude, as a typical "footprint" [21,36]. Lagrangian methods have an uncertainty of 15-30% to provide the correct location of the trajectory travel distance, meaning we only know a preferable distribution, not a specific place that an air parcel is coming from [47]. The underlying uncertainty of a particular region of influence is minimized on spatiotemporal aggregation by reducing the random errors from individual back-trajectories.
Time-span is another source of uncertainty in the region of influence method. Excluding one year of observational data, 2018, the average unweighted area decreases by only a maximum of 2%, regardless of the study site. This reduction in uncertainty is beneficial in cases where long-term average regions of influence are needed to determine the weighted average of upwind temporally static drivers, such as historical deforestation.
A more significant source of uncertainty of the Lagrangian model for a top-down estimate of GHG fluxes, however, is the planetary boundary layer (PBL) height, which has been estimated to result in 20% of estimated fluxes in an earlier study [48]. In Amazonia, the PBL height is relatively constant, reaching up to 1500 m asl, although vertical mixing of gases is influenced by deep moist convection-intense diurnal solar heating, shallow dry convection (i.e., shallow cumulus), and PBL height [49]. Indeed, the contribution of the trajectories at lower altitudes has a shorter range, meaning that they encompass a more significant weight inside Amazon, although this difference is relatively small ( Figure 9A). Atmospheric aircraft vertical profiles sampling the column from 300 to 3500 m asl are sensitive to other GHG fluxes above the PBL, such as wildfire plumes. For instance, back-trajectories binned above the PBL (i.e., below 3500 m asl and above 1300 m asl) represent, on average, a region of influence 8% higher inside the Amazon than below its height ( Figure 9B). On the other hand, the lower altitude reproduces a significant difference in the area covered by the region of influence due to the lowest number of back-trajectories computed, meaning that their relative area is condensed (less sparse) than up to 3500 m asl ( Figure 9B). PBL heights HYSPLIT and GDAS will impact the regions of influence to some extent because they will influence the vertical profiles of wind speed and direction. However, the agreement between FLEXPART and HYSPLIT, which uses similar wind fields but different PBL schemes, suggests that the details of the PBL height may not be crucial for this method, which does not consider residence time of an air parcel in the PBL like a traditional footprint.

Limitations of the Method
As mentioned above, there are strong similarities between traditional "footprints" and our much simpler "regions of influence". Using Equations (2) and (3), we compared the footprints used by Alden et al. [5] and our regions of influence for the years 2010-2012 for the same vertical profiles, although we recognize that both datasets do not have the same units. Figure S2 provides an example of comparisons between regions of influence and binned footprints, showing their similarity. To analyze the difference more thoroughly and quantitatively, footprints for each site, which were originally calculated for 12 or 17 levels per vertical profile, were totaled for the quarter, and rescaled to 1 • spatial resolution (the original was 0.5 • ). We did not observe statistically significant differences in either Amz_perc ( Figure 10A, Equation (2)) or rel area ( Figure 10B, Equation (3)). This result gives us confidence that our simple back-trajectory density approach yields robust results for the relative spatial sensitivity of vertical profiles.  (2)) and (B). Relative area inside the Amazon (Equation (4)). Error bars are the standard error at a 95% significance level.  (2)) and (B). Relative area inside the Amazon (Equation (4)). Error bars are the standard error at a 95% significance level. originally calculated for 12 or 17 levels per vertical profile, were totaled for the quarter, and rescaled to 1° spatial resolution (the original was 0.5°). We did not observe statistically significant differences in either Amz_perc ( Figure 10A, Equation (2)) or relarea ( Figure 10B, Equation (3)). This result gives us confidence that our simple back-trajectory density approach yields robust results for the relative spatial sensitivity of vertical profiles.

What Are the Implications for Generalizations of GHG Fluxes at Regional Scales?
This study highlights the importance of defining the region of influence to understand the representativeness of Amazonian vertical profile air samples. Under our Amazon boundaries of more than seven million square kilometers, the regions of influence of the western sites are representative of almost 50% of the Amazon area. For eastern sites, the region of influence is less than 25% of the Amazon CARBAM mask ( Table 2), so that extrapolation for the whole Amazon may result in inaccurate GHG fluxes. It has critical implications on representations of models across the Amazon, due to the mix of GHG flux contributions of savanna-dominated areas southward [36] and possibly a reason for differences in CO 2 source-sink estimation between bottom-up and top-down approaches [14,50].

Conclusions
Here, we present an approach to determine the region of influence of atmospheric aircraft vertical profiles to resolve the area of contribution spatially at regional scales in the Amazon using binned back-trajectories through a Lagrangian transport model. Using this method, we compute the region of influence and its representativeness (relative area) inside the Amazon at four distributed study sites in the Brazilian Amazon.
Air mass circulation patterns show significant seasonality by quarter, which is more pronounced than interannual variability. Large-scale forcing, primarily from the movement of the ITCZ, result in two main patterns: (a) the first and fourth quarters receive contributions from Northern Hemisphere air masses and the equatorial zone; and (b) the second and third quarters receive air masses exclusively with origins in the South Hemisphere. All quarters receive an important contribution from South American landmasses outside the Amazon (during the dry season), around 25-43%. The first and fourth quarters (wet season) show less influence from outside Amazonia (7-17%). Furthermore, due to the general atmospheric circulation and the proximity of the Atlantic coast, seasonal variability is more significant in the eastern Amazon (SAN and ALF sites) than in the western Amazon (RBA and TAB/TEF sites). The regions of influence in the western Amazon (TAB/TEF and RBA) represent, on average, 42-52% of the Amazon area, whereas in the eastern Amazon (SAN and ALF), they represent just 11-25% of the Amazon area, on average. This research will support the ongoing objectives of the CARBAM project, which aims to resolve GHG fluxes in Amazonia for the Basin as a whole and spatially. The patterns of back-trajectories and the resultant regions of influence help us to understand the significant patterns of atmospheric circulation, providing insight into the connections between the measurements made at vertical profiling sites and the upwind land surface. In addition, the regions of influence can help define the average value of environmental drivers such as land use change, temperature, and precipitation that potentially contribute to GHG fluxes determined from vertical profile sites.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4433/11/10/1073/s1, Table S1. Schedule flight plan by site and altitude. V2 and V3 are the system versions. Study sites are: SAN, RBA, ALF, and TAB/TEF. Figure S1. Distribution of the peak of fire plume in meters above sea level (m asl) at each study site. Figure S2. Comparison of the spatial distribution of the regions of influence provided by the density of trajectories from the method described in the text (left) and the surface sensitivity provided by FLEXPART footprints (right). Both datasets are on their original scales. In the main text Equations (1)-(3) were applied to both of them in order to compare them.