Source–Receptor Relationships and Cluster Analysis of CO2, CH4, and CO Concentrations in West Africa: The Case of Lamto in Côte d’Ivoire

Abstract: The contribution of long-range transport to the CO2, CH4, and CO concentrations measured at Lamto (5°02′ W, 6°13′ N) was analyzed for the 2014–2017 period using the FLEXPART model, which calculates the retro-plumes of air masses arriving at the station. Source–receptor relationships were identified by applying a clustering technique to those retro-plumes. This technique distinguished four categories of air mass transport arriving at the Lamto site: oceanic and maritime origin (≈37% of the retro-plumes), continental origin (≈21%), and two hybrid clusters (≈42%). The results show that continental emission sources contribute significantly to the increases in CO2, CH4, and CO concentrations and explain ≈40% of their variance. These emission sources lie predominantly to the north and north-east of the measurement point, where densely populated and economically developed areas are located. In addition, the transport of air masses from these directions leads to the accumulation of CO2, CH4, and CO. Furthermore, the ∆CO/∆CH4 and ∆CO/∆CO2 ratios observed in the groups associated with Harmattan flows clearly show an influence of combustion processes on the continent. Thus, the grouping based on FLEXPART footprints shows an advantage over the use of simple trajectories for analyzing source–receptor relationships.

Author Contributions: [...] D.T.T., F.Y., M.R., and A.D.; methodology, D.T.T., F.Y., M.R., and J.-D.P.; software, D.T.T., I.P., A.B., and A.R.; validation, F.Y. and J.-D.P.; formal analysis, D.T.T.; investigation, D.T.T. and M.R.; data curation, M.R.; writing (original draft preparation), D.T.T.; writing (review and editing), F.Y., M.R., I.P., A.B., A.R., J.-D.P., and A.D.; supervision, A.D. and M.R.


Introduction
CO2 and CH4 are the main anthropogenic greenhouse gases (GHGs), well known for enhancing radiative forcing [1]. In turn, this radiative forcing causes climate change, altering the energy distribution in the Earth's closed system and ultimately leading to extreme climatic events [2]. For example, frequent temperature peaks, droughts, heat waves, and major floods are recorded every year in various regions of the world [1]. Like these regions, West Africa is very sensitive to ongoing climate change: it has seen a sharp decline of the order of 30% to 60% in the average annual flow of major rivers, and a significant decrease in rainfall over the Sahel since the late 1970s [3]. CO2 and CH4 are long-lived greenhouse gases and are therefore well mixed in the atmosphere. Improving our [...] The source-receptor relationships (SRR) obtained from the above methods do not take into account atmospheric turbulence and convection [30], which could affect air mass dispersion and transport, unlike the clustering technique based on Lagrangian particle dispersion model (LPDM) footprints used here.
LPDMs have extended the capabilities of SRR studies by using backward transport plumes made of multiple trajectories of individual particles to capture uncertainties in transport modelling [12,31]. In addition, emission characteristics in source regions, although preferentially evaluated using species emission factors [32], can also be obtained using concentration ratios [33,34]. Emission factors have the advantage of facilitating the calculation of emission fluxes, whose direct measurement is considered time-consuming or complex. In contrast, calculations based on concentration ratios should take into account the fact that these ratios combine several signatures, such as those of fires, plant respiration, and background trends, which could generate biases.
This study focused on the identification of the source-receptor relationships from the observed series of CO2, CH4, and CO concentrations using a Lagrangian dispersion model and cluster analysis. The present work was conducted over the period from 2014 to 2017. Section 2 is dedicated to the description of the study area, the material, and the clustering method used. Sections 3 and 4 present the results and discussions.

Site Description
The region of Lamto (5°02′ W, 6°13′ N) is located in the center of Côte d'Ivoire (Figure 1), covering about 2700 ha in a mosaic of Guinean forest-savanna. Its climate is of the subhumid type in the Sudano-Guinean transition zone [21]. The rainfall regime is governed by the monsoon flow from the south and the Harmattan from the north [22], which meet along the intertropical convergence zone (ITCZ). The south-north and north-south movements of the ITCZ define the climatic seasons during the year. The mean annual rainfall is about 1200 mm [21,35], spread over four seasons (Figure 2): a main dry season from December to February, a main wet season from March to July, a short dry season in August, and a short wet season from September to November. Local agricultural practices are associated with bush fires in the middle of the long dry season (i.e., mid-season fire).

Measurement of CO2, CH4, and CO
The continuous CO2, CH4, and CO measurements at the Lamto station (LTO) are time series from August 2008 to May 2018 for CO2 and CH4, and from March 2014 to May 2018 for CO. Here we focus only on the 2014-2017 period, when measurements of CH4, CO2, and CO are all available. Continuous measurements were made with CRDS (cavity ring-down spectroscopy) analyzers, model G2401 (Figure 3b) [37–39]. The air analyzed is sampled continuously at the top of a 50 m tower (Figure 3a). The measuring system, data processing, and calibration strategy are explained by Tiemoko et al. [20]. The CO2, CH4, and CO measurement data presented here were calibrated using gases measured by the Laboratoire des Sciences du Climat et de l'Environnement (LSCE/IPSL) in Gif-sur-Yvette, France, and are traceable to World Meteorological Organization (WMO) scales (CO2: WMO X2007; CH4: WMO X2004A; CO: WMO X2014A) [6]. The quality control process (regular measurement of a target gas) indicates precisions below 0.1 ppm, 0.5 ppb, and 16 ppb for CO2, CH4, and CO measurements, respectively (see [20]).
In addition, CO2, CH4, and CO have been the subject of many studies [40,41] aimed at establishing emission maps. As an illustration, emission maps based on GFEDS data are shown in Figure 4. The months of January and September selected here correspond to the fire regimes in the equatorial and southern parts of Africa, respectively. Significant amounts of carbon are emitted by fires during these periods of the year. Fires are the main source of carbon emissions in Africa, accounting for 50% of global carbon emissions from fire burning.

FLEXPART Model
The Lagrangian Particle Dispersion Model (LPDM) FLEXPART version 9.0 used in this study is driven by ECMWF (European Centre for Medium-Range Weather Forecasts) wind fields with 1° × 1° horizontal resolution and 3 h time steps [30,42]. To analyze the atmospheric transport pathways from the potential source regions to the receptor position, and to identify the different source-receptor relationships, the FLEXPART model was run in "Backward" mode [43,44]. The inverse simulation releases 1000 particles once every 24 h (at 12:00 local time) over the 2014-2017 period from the LTO sampling inlet position; the particles are followed ten days backward in time. Prior positions and residence times of these particles near the surface make up the potential emission sensitivity (PES), in the form of a spatialized map (see e.g., [43,45]). PES stored on a 3D grid is an indicator of where and when the air mass composition has probably been modified by surface emissions. It is a response function of the influence of emissions on concentrations at the receptor location through atmospheric transport. In this study, we consider that an air parcel can be affected by surface emissions when it is below 2000 m agl. Our threshold of 2000 m can be compared to (1) the maximum daytime atmospheric boundary layer (ABL) height, estimated at 1600 m in West Africa by Aryee et al. [46], and (2) the daytime monsoon and Harmattan layer depth, estimated at 1900 m over the region by Kalthoff et al. [47]. Our altitude threshold for sensitivity to surface emissions should cover most situations of well-mixed ABL, as well as potential sources with significant injection heights, such as biomass burning pyroconvection. We performed a sensitivity test with two other selected thresholds to ensure that our choice did not introduce a significant bias. The PES distributions obtained with these two values are broadly similar and are shown in Figures S1 and S2.
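As a rough illustration of how a near-surface residence-time footprint can be accumulated from backward particle positions, the following sketch bins particle samples located below the 2000 m agl threshold onto a regular latitude-longitude grid. It is a simplified stand-in and not FLEXPART's own output routine: the function name `surface_residence_time`, its arguments, and the normalization by the number of released particles are assumptions made for this example.

```python
import numpy as np

def surface_residence_time(lon, lat, z_agl, dt_s, n_particles,
                           lon_edges, lat_edges, z_max=2000.0):
    """Mean residence time (s per released particle) below z_max in each grid cell.

    lon, lat, z_agl: flattened particle samples over all particles and output times.
    dt_s: time represented by one sample, in seconds (hypothetical parameter).
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    z_agl = np.asarray(z_agl, dtype=float)

    # Keep only samples low enough to be influenced by surface emissions.
    near_surface = z_agl < z_max

    # Count near-surface samples per cell; each sample stands for dt_s seconds.
    counts, _, _ = np.histogram2d(lon[near_surface], lat[near_surface],
                                  bins=[lon_edges, lat_edges])
    return counts * dt_s / n_particles
```

Calling the function again with a different `z_max` gives a simple way to mimic the threshold sensitivity test mentioned above.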

Time-Series and Background Signals of CO2, CH4, and CO
Figure 5 shows the time series of atmospheric concentrations of CO2, CH4, and CO measured at LTO from 2014 to 2017. The CO2 and CH4 concentrations show an increasing trend with pronounced seasonal variations. Over the 2014-2017 period, annual means of CO2 and CH4 concentrations at the LTO station increased by factors of 1.023 and 1.021, respectively. These coefficients lead to growth rates of about 2.3 ppm year⁻¹ for CO2 and 9.4 ppb year⁻¹ for CH4, which are comparable to the global trends estimated at 2.5 ppm year⁻¹ for CO2 and 9.7 ppb year⁻¹ for CH4 over the same period, based on National Oceanic and Atmospheric Administration observing stations (www.esrl.noaa.gov/gmd/ccgg/trends/gl_gr.html; [48]). High values of CO2 (>450 ppm, Figure 5a), CH4 (>2100 ppb, Figure 5b), and CO (>500 ppb, Figure 5c) systematically occur during the Great Dry Season (GDS) from November to February.
The evaluation of the CO2, CH4, and CO background signals is important to infer the concentration increases due to regional or local emissions, e.g., [45,49,50]. We have defined the background (in blue in Figure 5) as the concentrations measured when CO concentrations [18,34,51] are below their fifth percentile within a 7-day moving window. To ensure that these two choices (percentile and window length) did not introduce a significant bias, we also calculated background mole fraction levels based on moving windows shorter than 7 days and on lower percentiles. The results obtained were similar. Therefore, the percentile choice and the length of the moving window did not affect the cluster analysis results. The backgrounds obtained were subtracted from the hourly average concentrations to determine the excess concentrations ∆X (see equation below), which are attributed to regional emissions (Tropical Africa, North Africa).

$$\Delta X = X - X_{\mathrm{bg}}$$

where X is the concentration value in ppm (or ppb) and $X_{\mathrm{bg}}$ is the corresponding background value. Seasonal variations of the background signals of CO2, CH4, and CO show low values from May to October and high values from December to March. The seasonal cycle of CO2 differs in that it lags those of CH4 and CO by one month. The background CO value in June and July (≈117 ppb) is comparable to that obtained by Denjean et al. [49] (≈180 ppb) during the DACCIWA (Dynamics-Aerosol-Chemistry-Clouds Interactions in West Africa) campaign in June and July 2016 in polluted coastal cities (e.g., Abidjan, Accra and Lomé).
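The background selection and the excess concentration ∆X described above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the processing chain of the study: the column names (`co`, `co2`, `ch4`), the regular hourly sampling, the centered window, and the time interpolation of the background between selected points are all choices made for this example.

```python
import numpy as np
import pandas as pd

def background_and_excess(df, ref="co", window=7 * 24, q=0.05):
    """Flag background hours from the reference species and derive excess values.

    df: hourly DataFrame with a DatetimeIndex and one column per species.
    window: number of hourly samples in the moving window (7 days here).
    """
    # Rolling q-th quantile of the reference species (CO) in a centered window.
    threshold = df[ref].rolling(window, center=True, min_periods=24).quantile(q)
    is_bg = df[ref] <= threshold

    # Keep all species only at background hours, then interpolate in time
    # (assumption: the source does not state how the background is gap-filled).
    bg = pd.DataFrame({c: df[c].where(is_bg) for c in df.columns}, index=df.index)
    bg = bg.interpolate(method="time").ffill().bfill()

    # Excess concentration: observed value minus background.
    excess = df - bg
    return bg, excess, is_bg
```

Re-running the function with a smaller `window` or a lower `q` reproduces the kind of sensitivity check mentioned in the text.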

Clustering Method
Clustering is a multivariate statistical technique designed to explore structure within a dataset whose properties are not known in advance [52]. The technique assigns data to meaningful classes by maximizing similarity within each cluster while maximizing differences between clusters. In this study, the clustering method was applied to classify PES. Although different clustering algorithms exist, we used "k-means", a well-known non-hierarchical algorithm that groups points in N dimensions into a predefined number of clusters (i.e., classes) [53]. This iterative algorithm minimizes the Euclidean distance between the elements to be classified and the cluster centers. At each iteration, the K cluster centers are updated until convergence (stable centroids) is reached. The advantage of the k-means algorithm is that it is easy to use and has relatively low computational requirements. However, k-means is less effective when applied over a large number of dimensions, and its results improve when the dimensionality is reduced [54]. Here, the number of dimensions is reduced by averaging PES over the large regions of interest presented in Figure 6; see, e.g., Paris et al. [12]. These regions are considered a priori as regions with different characteristics to be explored, including species' emission intensity [55] or sink potential. The choices of the number of regions, their boundaries, and their sizes introduce prior information and hence possible biases in our results. To reduce biases due to adding residence times over very different region sizes, normalization by the region's area was applied to the time series of regionally averaged PES.
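To make the procedure concrete, the sketch below normalizes regional PES by region area and then applies a plain k-means iteration (assignment to the nearest center, then centroid update) with NumPy only. It is an illustrative implementation under assumptions: the function names, the random initialization from data points, and the handling of empty clusters are choices made here, and the source does not specify which k-means implementation was used.

```python
import numpy as np

def area_normalize(pes_by_region, region_area):
    """Divide regional PES (n_days, n_regions) by each region's area (n_regions,)."""
    return np.asarray(pes_by_region, float) / np.asarray(region_area, float)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means with Euclidean distance; returns (labels, centers)."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct observations.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update each center as the mean of its members (keep it if empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centroids are stable: convergence
        centers = new_centers
    return labels, centers
```

With seven regions, each daily retro-plume becomes a seven-dimensional vector, and `kmeans(area_normalize(pes, areas), 4)` would return one cluster label per day.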
Figure 6. Anthropogenic CH4 emissions for 2015 from the EDGAR v5.0 database with a spatial algorithm. All main anthropogenic sources, e.g., waste treatment, industrial and agricultural sources, are included. Selected regions were: Temperate Atlantic (here Atlantic_Temp); South Africa (here South_Afri); Atlantic Tropical (here Atlantic_Trop); local; Tropical Africa (here tropical_Afri); North Africa (here North_Afri); Europe-Mediterranean (here Euro_Med). The star (in black) in the local area indicates the sampling location.

In addition, determining the optimal number of clusters K is very important in a clustering analysis. Many methods have been proposed by Kalkstein et al. [56] and Yan [57], among which the "weighted-gaps/elbow criteria" method showed high performance. Indeed, this method, which derives from the k-means algorithm itself, is automated and robust when applied to voluminous, multi-dimensional datasets [58]. The curve presented in Figure A1 for a series of k groups (k = 2 to 14) of the PES time series was obtained using the "weighted-gaps/elbow criteria" method. The appropriate number of clusters, which minimizes the weighted deviations between the centroid and each element belonging to the cluster, is 4. The explained variance method and silhouette statistics were also tried to determine the number of clusters. In every case, we found 4 to be an optimum number of clusters.
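The idea behind an elbow criterion can be illustrated with a generic heuristic: given the within-cluster dispersion obtained for each candidate k, pick the k whose point lies farthest from the straight line joining the first and last points of the curve. This is not the "weighted-gaps" method cited above, whose details are not given here; the function name `elbow_k` and the example dispersion values are hypothetical.

```python
import numpy as np

def elbow_k(ks, dispersion):
    """Return the k at the elbow of a decreasing dispersion curve."""
    ks = np.asarray(ks, float)
    w = np.asarray(dispersion, float)
    # Rescale both axes to [0, 1] so that distances are comparable.
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (w - w.min()) / (w.max() - w.min())
    # Perpendicular distance of each point to the chord (first -> last point).
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    dist = np.abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
    dist /= np.hypot(y1 - y0, x1 - x0)
    return int(ks[np.argmax(dist)])

# Made-up dispersion values for k = 2..8, flattening after k = 4.
print(elbow_k([2, 3, 4, 5, 6, 7, 8], [100, 60, 20, 18, 16, 14, 12]))  # → 4
```

Comparing the result with other criteria, as the authors did with explained variance and silhouette statistics, guards against an elbow that is an artifact of one particular heuristic.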

Correlations between PES over Each Region and CO2, CH4, and CO
First, we aimed to identify the main source regions affecting the concentrations observed at LTO. Table 1 presents the statistical relationships between the PES integrated over each region separately and the daily means of CO2, CH4, and CO concentrations. The statistical parameters calculated are the Pearson correlation coefficient (R) and Kendall's rank correlation coefficient (tau). The simulated PES is calculated once per day and therefore does not capture the diurnal dynamics of the CO2, CH4, and CO concentrations measured at LTO. Using Kendall's tau to complement the Pearson statistic therefore makes it possible to find rank correlations in smaller signals. We observe a lack of relationship between CO2, CH4, and CO concentrations and averaged PES in the "local" region, defined as the area within 300 km of Lamto. This result does not exclude local influences; instead, it reflects the challenge of representing near-receptor influence with the LPDM and its global driving wind fields [59,60]. For three continental regions (Tropical Africa, North Africa, Europe and the Mediterranean), positive and significant correlations are observed. In addition, the three-variable linear model analysis (see Table A1) shows that these three regions explain 40% (p-value < 2 × 10⁻¹⁶) of CO2 concentration variance, 74% (p-value < 2 × 10⁻¹⁶) of CH4 concentration variance, and 66% (p-value < 2 × 10⁻¹⁶) of CO concentration variance. The positive correlation coefficients indicate that the 10 days of cumulative exposure of air masses to continental flows explain at least 40% of the variance of the increase in CO2, CH4, and CO concentrations observed at LTO. However, correlation values are higher (R > 0.50) for the Tropical Africa (i.e., Tropical_Afri) region. This result clearly shows that the residence of air masses in the boundary layer over tropical Africa significantly affects the CO2, CH4, and CO concentrations.
Air masses transiting over Europe represent ≈14% of the retro-plumes in cluster B (single retro-plumes not shown) and could also play a significant role in the CO2, CH4, and CO concentration levels, with Pearson correlation coefficients R > 0.26 and Kendall's tau > 0.30. Moreover, the correlation values obtained in these continental regions are markedly lower in the case of CO2. Over the Atlantic zone (i.e., Atlantic Temperate and Atlantic Tropical), CO2, CH4, and CO mixing ratios are significantly (p < 0.001) anti-correlated with PES in both Atlantic Temperate (R < −0.48, tau < −0.38) and Atlantic Tropical (R < −0.08, tau < −0.15).
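The three statistics used in this section (Pearson's R, Kendall's tau, and the variance explained by a linear model with three regional PES predictors) can be sketched with NumPy alone. The function names are hypothetical, and the Kendall estimate below is the simple tau-a variant without tie correction, which may differ slightly from the statistic reported in Table 1.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def kendall_tau_a(x, y):
    """Kendall rank correlation (tau-a): concordant minus discordant pairs."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    i, j = np.triu_indices(len(x), k=1)  # all pairs with i < j
    s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return float(s.sum() / len(i))

def explained_variance(pes_regions, conc):
    """R^2 of an ordinary least-squares fit conc ~ 1 + regional PES columns."""
    conc = np.asarray(conc, float)
    A = np.column_stack([np.ones(len(conc)), np.asarray(pes_regions, float)])
    coef, *_ = np.linalg.lstsq(A, conc, rcond=None)
    resid = conc - A @ coef
    return float(1.0 - resid.var() / conc.var())
```

Here `pes_regions` would hold one column per continental region (Tropical Africa, North Africa, Europe-Mediterranean) and `conc` the daily mean concentration of one species.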

PES Clustering Applied to CO2, CH4, and CO Concentrations
Because the clustering of the PES is independent of the observed concentrations, the statistical separation of the concentrations between the different clusters is interpreted as confirmation of the influences of the source air mass and associated surface flows. Figures 7-9 show, respectively, the average PES for each cluster and box plots of the median and interquartile range of relative humidity and of CO2, CH4, and CO concentrations associated with each cluster. In addition, Table 2 reports the correlations between CO2, CH4, and CO atmospheric concentrations and PES within each of the four clusters (Figure A1). The correlation coefficients between the trace gases and PES are positive and significant for all the clusters, except cluster A, which presents, by contrast, negative and insignificant correlation values. High (R > 0.5) and significant (p-value < 0.001) correlations in clusters B, C, and D indicate that the transport-related factors controlling CO2, CH4, and CO variabilities could be the same, especially as their trends evolve synchronously.

Figure 9. Average CO2 (a,b), CH4 (c,d), and CO (e,f) concentrations associated with each cluster. Box plots indicate the median and inter-quartile range.

− Cluster A

− Cluster B

− Cluster C
Cluster C shares its main area of particle residence with cluster A. However, air masses classified in cluster C arrive from two preferred directions: south (trajectories of Atlantic origin), as in cluster A, and east (trajectories of continental origin), which gives it a "mixed" status. It is specifically related to strong local influences, reflected by a high residence time near the station (dark red in Figure 7c). Unlike other clusters, it is more sensitive to air masses from all directions. Only 11% of the data are associated with this cluster. Air masses have median CO, CO2, and CH4 concentrations of 221 ppb, 417 ppm, and 1879 ppb, respectively, lower than those observed in cluster B, most likely due to the presence of marine air masses. However, these median concentrations are higher than the average (see Section 2.4). This cluster shows an excess above background levels of ≈68 ppb for CO, ≈4 ppm for CO2, and ≈22 ppb for CH4. The correlation coefficients between CO and CH4, and between CO and CO2, are high (R > 0.60) and significant (p-value < 0.001) in this cluster. These strong correlations indicate that the factors controlling emissions and variability of these species could be similar.

− Cluster D
This hybrid cluster represents 31% of data. It is comparable to clusters A and C because it also combines trajectories from both the Atlantic Ocean (southern flow) and the continent (northeastern flow). Comparatively, cluster D has a higher residence time above the Atlantic Ocean than cluster [...]

Atmosphere 2020, 11, 903

Figure 11 shows the seasonal frequency for each of the four clusters. Clusters A and B represent two opposite poles of atmospheric transport patterns. These two clusters are much more frequent (at least 50% of the data) from June to September for cluster A, and from November to January for cluster B. The meteorological situations that correspond to the occurrences of these clusters are marked by the presence of the Saharo-Libyan anticyclone (strong activity from December to February) for cluster B and the Saint Helena anticyclone (strong activity from April to September) for cluster A. Monsoon flows are frequent from May to September [63,64], consistent with the seasonality of cluster A. Cluster B is sensitive to the advection of air masses due to Harmattan flows [19,65]. On the other hand, cluster C shows peaks (February-March and October) that are observed during changes in rainfall regimes at Lamto (i.e., from dry season to wet season and from wet season to dry season). Cluster D is present throughout the year, with significant peaks observed in May and October, corresponding to the months when the station records significant peaks of precipitation (Figure 2) [36].

Figure 12 shows annual cycles of excess concentrations of CO2, CH4, and CO associated with each cluster. Analysis of the variance of seasonal cycles shows significant differences in excess CO2 concentrations among clusters for the months of October, December, January, March, April, and May (maximum differences up to 7 ppm). For CO and CH4, the differences were observed in the months of December and January (maximum differences between clusters of up to 38 ppb for CH4 and 100 ppb for CO). These differences could be explained by the fact that each cluster is associated with specific retro-plumes (i.e., continental and/or oceanic). In particular, continental retro-plumes are rich in CO, CO2, and CH4, unlike oceanic ones. This difference is well observed for CO and CH4 but not for CO2. Indeed, air masses associated with continental retro-plumes crossing vegetation zones during the day are subject to CO2 uptake (photosynthesis), which leads to a strong depletion of CO2 in slow-moving air masses.

The variability within each cluster is relatively small for CO2 (average variability: 1%), marked for CH4 (average variability: 2.4%), and very marked for CO (average variability: 36%). For all species, most of the clusters have minimum concentration levels between May and October and maximum levels from November to March, likely due to frequent changes in transport pathways (length of retro-plumes, Figure 7), each with a different regional impact over the study period. This variability is less important for CO2 due to the stronger influence of the biosphere, as noted earlier. The highest concentrations are associated with continental air masses (clusters B, C, D). From May to October, no air masses were associated with cluster B. CO2, CH4, and CO concentrations are lower in cluster A due to its origin in the marine boundary layer of the remote Atlantic Ocean, shown in Figure 7a.

Figure 13 illustrates the interannual variability of the clusters. The seasonal pattern described above is well reproduced from year to year, albeit with significant differences. Most differences occur at the transition between the four seasons. Cluster C varies significantly between years. Its contribution to air flows arriving at the measurement site in February 2017 (≈10%) is low compared to that of February 2015 (≈40%) and 2016 (≈40%). It presents almost no contribution in March 2017, while its contribution is significant in March 2015 (≈25%) and 2016 (≈45%). As for B, its contribution during GDS varies less (i.e., 10% interannual variability on average). For other seasons, no clear trend emerges, reflecting specific atmospheric conditions. This shows that interannual variability in transport patterns likely contributes to interannual variability of concentrations. Moreover, the proportion of trajectories from the south (clusters A and D) is broadly similar during each month throughout the year.
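The seasonal and interannual frequencies discussed above amount to counting, for each month, the share of retro-plumes assigned to each cluster. The sketch below shows one way this bookkeeping could be done; the function name `monthly_cluster_share` and the sample records are hypothetical and do not reproduce the study's data or software.

```python
from collections import Counter, defaultdict


def monthly_cluster_share(records):
    """Return {month: {cluster_label: percent of retro-plumes in that month}}.

    `records` is an iterable of (month, cluster_label) pairs, one per retro-plume.
    """
    counts = defaultdict(Counter)
    for month, label in records:
        counts[month][label] += 1
    shares = {}
    for month, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[month] = {
            label: 100.0 * n / total for label, n in sorted(counter.items())
        }
    return shares


# Hypothetical cluster assignments (month number, cluster label), for illustration only
records = [
    (1, "B"), (1, "B"), (1, "C"), (1, "D"),
    (7, "A"), (7, "A"), (7, "A"), (7, "D"),
]
shares = monthly_cluster_share(records)
for month, share in shares.items():
    print(month, share)
```

Applying the same function to (year, month) keys instead of month numbers would give the year-by-year breakdown of the kind shown in Figure 13.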

Impacts of Transport on the Concentrations of CO, CO2 and CH4
The large-scale advection pathways of air masses arriving in the Lamto region were analyzed using clustering of PES. The origins and the seasonal and interannual variability of these transport modes partly determine the interannual and seasonal variability of CO, CO2, and CH4. This method also allows the quantitative attribution of in-situ measurements to potential source regions (Figure 7). Here, the long-range transport of air masses of oceanic origin (cluster A) clearly shows lower levels of CO, CO2, and CH4 (Figure 9), which parallels the lower concentration of pollutants in marine air masses. Ncipha et al. [17] showed that air flows over oceanic regions provide cleaner marine air in southern parts of South Africa. For these authors, the consequence of the dominance of the oceanic fluxes (westerly fluxes) is the seasonal minimum in surface CO2 mixing ratio. It was observed that this oceanic cluster is much more frequent during the wet seasons (Figure 11), coinciding with the West African Monsoon (WAM) period, which could have an impact on concentrations. Indeed, the WAM period is associated with less significant variability in temperature and radiative flux, which are the main explanatory variables of the seasonal dynamics of carbon fluxes in the region [36]. In addition, contributions from retro-plumes of exclusively Atlantic origin are ≈35% over the study period, which shows that the transport of air masses associated with the WAM strongly influences the air quality in the Lamto region.

High CO, CO2, and CH4 concentration levels associated with cluster B come exclusively from northeast-North African air masses, transiting through some West African countries (Ghana, Togo, Benin, Nigeria, Niger, Burkina Faso). The diversity of the origins of these retro-plumes provides some evidence of the involvement of transport from Northern Africa and of emissions from fires (≈72% of the carbon balance in Africa [66]).
Large urban emissions in these countries could significantly increase atmospheric concentrations of the three studied trace gas species in these air masses. This corroborates the work of Jonquières et al. [67], based on measurements of the TROPAZ campaign in December 1987 in Côte d'Ivoire. These authors showed that high CO, CO2, and CH4 concentrations are observed in air masses originating from combustion zones located in the north-eastern and northern African regions. In addition, they highlighted that the increase in concentrations of these species was due to the fact that the air masses were continuously loaded with combustion products during their passage over active fires in the last two days before their sampling in the study region. Moreover, Edwards et al. [68] and Pradier et al. [69] have also highlighted that any air masses transiting through these regions would potentially be loaded with emissions from biomass combustion. Frequent in the hot period (December-March), cluster B corresponds to air masses below 2000 m generally associated with the Harmattan flux. This flux is typically associated with dust transport from North and North-East Africa to the Gulf of Guinea, and is also charged with combustion products from anthropogenic activities and fires [61,70–74]. Based on modeling studies, D'Almeida [75] and Touré et al. [73] reported that the export of air from the continental boundary layer is mainly directed (60% of Saharan and Sahelian dust) to the Gulf of Guinea, remaining essentially confined in this layer and in the lower troposphere. However, few observations are available to confirm or disprove this prediction. Moreover, the seasonality of this continental cluster corresponds to that of the Saharo-Libyan anticyclone, which is a high-pressure system that could have an impact on air mass transport and dispersion (e.g., [17,18]).
The dynamics of this anticyclone could favor the accumulation of air masses rich in CO, CO2, and CH4, and could explain the high concentration levels of these species. Moreover, the continental episodes represent ≈21% of the total number of retro-plumes, compared to ≈35% for the oceanic cluster. High concentration levels are also observed in cluster C, which could be explained by the presence of the Saharan thermal depression, whose period of occurrence corresponds with the observation of cluster C peaks (Figure 11). Indeed, this thermal depression, which marks the alternation between two transport regimes, is observed from February to March in Burkina Faso and Niger and induces a significant variation in dust and aerosol levels [76]. Cluster D is almost the same as cluster A, with low CO, CO2, and CH4 levels compared to clusters B and C. We expected higher levels due to the air masses of continental origin coming from the northeast of the measurement station (Figure 7). This could be explained by the fact that these continental air masses correspond to days when emissions from potential sources (e.g., fires, anthropogenic emissions) are mixed and diluted in cleaner air masses. In addition, the oceanic components could also be considered as the cause of low concentration levels. Furthermore, the seasonality of each cluster (Figure 11) is well marked, due to the continuous influence of long-range transport of air masses originating from multiple directions during the year (cf. Jonquières et al. [67]).

The Advantage of PES Clustering
Clustering of PES is useful for identifying the influences of unknown sources (and sinks) on atmospheric concentration variations of species at observatories. Our classification method focuses only on transport and does not include emissions. When a high number of measurements is recorded, classification of the data before analysis is necessary. Indeed, cluster analysis is a well-known and accurate method for data classification, and represents an objective alternative to the more subjective method of trajectory classification [12,17,27,28,77]. The objective method has been used in many studies for retro-plume clustering since the first attempt made by Moody and Galloway [78]. For example, Brankov et al. [79] used clustering of back-trajectories simulated by the HYSPLIT model to analyze the role of synoptic-scale circulations in observed pollutant levels at the Whiteface Mountain site (New York). In addition, applying back-trajectory clustering to observations in Munich (Germany), Lan et al. [77] showed that the principal sources of CO2 emissions were found in both the north and south-east directions of the measurement site. To assess the aerosol source regions of the investigated air masses over the cities of Abidjan (Côte d'Ivoire), Accra (Ghana), and Lomé (Togo) from June to July 2016 during the DACCIWA project, Denjean et al. [80] analyzed the backward trajectories simulated by the Hybrid Single Particle Lagrangian Integrated Trajectory Model (HYSPLIT). Our work is a continuation of those studies, but uses LPDM outputs instead of single back-trajectories. These LPDM outputs are more quantitative than single trajectory positions (see e.g., [27]) and also attempt to account for atmospheric turbulence and convection [30], which conventional back-trajectories exclude. Except for the studies of Henne et al. [18] in Kenya, no attempt has yet been made in the area (West Africa) or on the continent, to our knowledge, to use a clustering technique based on LPDM footprints. The method shows a clear and distinct regional impact on the Lamto measurements. We recall that the clustering in this study is based on the potential emission sensitivity (PES), and not on the contributions of sources or on the concentrations themselves.
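To make the idea of grouping footprints concrete, the following is a minimal sketch of a k-means style clustering applied to flattened footprint vectors. It is an assumption for illustration only: the text above does not specify the clustering algorithm or software used, and the function names, the three-cell "footprints", and the choice of initial centroids are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centroids, max_iter=100):
    """Plain k-means: assign each vector to its nearest centroid, then update centroids."""
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = [None] * len(vectors)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical footprints: fraction of residence time over (ocean, coast, continent).
# A real PES field would be a much longer vector with one entry per grid cell.
footprints = [
    [0.80, 0.15, 0.05],
    [0.70, 0.20, 0.10],
    [0.10, 0.20, 0.70],
    [0.05, 0.15, 0.80],
]
labels, centroids = kmeans(footprints, initial_centroids=[footprints[0], footprints[2]])
print(labels)
```

Because only the transport fields enter the clustering, concentrations can afterwards be compared across the resulting groups as an independent check, which is the property emphasized in the text.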

Conclusions
We have analyzed CO2, CH4, and CO concentration levels recorded at LTO from 2014 to 2017. The dataset has been classified using clustering of the footprints of the individual measurements (i.e., PES) simulated by the Lagrangian FLEXPART model, and correlation analyses. The application of clustering analysis to retro-plumes identified four clusters (A, B, C, and D). These four clusters have shown differences in the seasonal means and medians of CO2, CH4, and CO concentrations. The plumes associated with these clusters can be described as follows:
− Cluster A (≈37% of the retro-plumes) is clearly associated with oceanic and maritime air mass trajectories from the south.
− Cluster B (≈21% of the retro-plumes) indicates continental origin.
− Cluster C (≈11% of the retro-plumes) is associated with air mass advection from all directions, including plumes of Sahelian origin.
− Cluster D (≈31% of the retro-plumes) is attributed to the advection of air masses which have a significant oceanic signal.
The use of a set of four groups in this study also made it possible to identify different variations in the measurements. High CO2, CH4, and CO concentrations were observed in cluster B, and it was found that an excess of about 128.5 ppb of CO, 74 ppb of CH4, and 6.3 ppm of CO2 over background concentrations could be explained by long-range transport of air masses grouped in this cluster. This highlights the combined effects of emissions from biomass combustion (from November to March on the mainland) and anthropogenic activities on CO2, CH4, and CO levels recorded at Lamto. In contrast, cluster A observations correspond to low levels of CO2, CH4, and CO. This cluster is generally observed at low altitude and is composed of humid air, which results in the dilution of trace gas concentrations at Lamto. The concentration ratios ∆CO/∆CH4 and ∆CO/∆CO2 observed within each cluster depend on the origin of air masses. They are higher when air masses come mainly from the north and northeast (Harmattan flow) than when they are from the south and southwest (monsoon flow). However, the concentration ratios ∆CO/∆CH4 and ∆CO/∆CO2 obtained within cluster B show a predominance of anthropogenic emissions and combustion processes.
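As a simple arithmetic illustration of what an enhancement ratio expresses, the sketch below divides the cluster B excesses quoted above (128.5 ppb for CO, 74 ppb for CH4, and the 6.3 ppm value, taken here to be CO2). This is an assumption made for illustration: the ratios reported in the study may have been derived differently (for example, from regression slopes), so the numbers computed here should not be read as the paper's results. The function name `enhancement_ratio` is hypothetical.

```python
def enhancement_ratio(delta_numerator, delta_denominator):
    """Ratio of two excess concentrations above background."""
    if delta_denominator == 0:
        raise ValueError("denominator excess must be non-zero")
    return delta_numerator / delta_denominator


# Cluster B excesses over background, as quoted in the text
excess_co_ppb = 128.5
excess_ch4_ppb = 74.0
excess_co2_ppm = 6.3  # assumed to refer to CO2, given the ppm unit

ratio_co_ch4 = enhancement_ratio(excess_co_ppb, excess_ch4_ppb)  # ppb per ppb
ratio_co_co2 = enhancement_ratio(excess_co_ppb, excess_co2_ppm)  # ppb per ppm

print(f"dCO/dCH4 = {ratio_co_ch4:.2f} ppb/ppb")
print(f"dCO/dCO2 = {ratio_co_co2:.1f} ppb/ppm")
```

Note the mixed units in the second ratio (ppb of CO per ppm of CO2); any comparison with literature values needs the same unit convention.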
The correlations calculated between PES for each region and CO2, CH4, and CO concentrations show high and positive values for the continental regions. The correlation coefficients are generally significant (R ≥ 0.38 and tau ≥ 0.37) for Tropical Africa and North Africa. These two regions most affect the CO2, CH4, and CO concentrations at the Lamto site. In this case, more than 40% of CO2, CH4, and CO seasonal variances are explained.
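The rank correlation "tau" mentioned above is presumably Kendall's tau. The sketch below implements the simplest variant (tau-a, which ignores tie corrections) as an assumption; the study may have used a tie-corrected variant from a statistics package. The function name `kendall_tau_a` and the sample values are hypothetical.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs, no tie correction."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    concordant = 0
    discordant = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical regional PES values (arbitrary units) and CO concentrations (ppb)
pes_region = [0.1, 0.4, 0.2, 0.8, 0.6]
co_ppb = [150.0, 210.0, 190.0, 260.0, 300.0]
tau = kendall_tau_a(pes_region, co_ppb)
print(f"tau = {tau:.2f}")
```

Being rank-based, tau is less sensitive than Pearson's R to a few extreme concentration values, which is one reason both statistics are often reported together.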
The CO2, CH4, and CO concentration variations statistically associated with the PESs of the different clusters show that the correlations are more significant between CO2, CH4, and CO and the PES associated with cluster B (R ≥ 0.47). However, the correlation value with CO2 is the lowest (R = 0.47). This finding indicates that the amplitude of the variation of CO2 induced by exchanges with soils and vegetation is large enough to modify the signal due to combustion sources. This shows the biospheric impact on CO2 concentrations and suggests that CO2 biospheric fluxes could be the main factor of the intra-seasonal variation.
The classification method presented here was successful at separating air masses of different chemical compositions (although the classification system was based only on simulated transport properties) and was independently compared to the measured concentrations. This method also allowed source-receptor relationships to be identified within our dataset. Another advantage of this cluster classification method is that it is independent of our prior knowledge of sources and sinks. The technique used here, although it has sufficient resolving power, would benefit from further refinements. The results of this study furthermore lead to specific conclusions and highlight the impacts of distant emitting sources on the in-situ measurements of CO2, CH4, and CO. Local impacts would also need to be taken into account to explain the totality of the CO2, CH4, and CO variances at the site.

Conflicts of Interest:
The authors declare no conflict of interest.

Table A1. Linear models showing relationships between species (i.e., CO2, CH4, and CO) and the PESs over the regions of tropical Africa, North Africa, and the Mediterranean-Europe, and their correlation coefficients (R2) and significance (p-value).

[Table A1 column headers: No.; Linear Models; R2]

Figure A1. Application of the "weighted deviation and elbow criterion" method to determine the appropriate number of clusters. The x-axis represents the number of clusters and the y-axis represents the sum of the Euclidean distance within each cluster. The secant between the two tangents (dashed black curve) gives the number of clusters (in this case k = 3.75).