A Multi Linear Regression Model to Derive Dust PM10 in the Sahel Using AERONET Aerosol Optical Depth and CALIOP Aerosol Layer Products

Abstract: Due to a limited number of monitoring stations in Western Africa, the impact of mineral dust on PM10 surface concentrations is still poorly known. We propose a new method to retrieve PM10 dust surface concentrations from sun photometer aerosol optical depth (AOD) and CALIPSO/CALIOP Level 2 aerosol layer products. The method is based on a multi linear regression model that is trained using co-located PM10, AERONET and CALIOP observations at 3 different locations in the Sahel. In addition to the sun photometer AOD, the regression model uses the CALIOP-derived base and top altitudes of the lowermost dust layer, its AOD, and the columnar total and columnar dust AOD. Due to the low revisit frequency of the CALIPSO satellite, the monthly mean annual cycles of the parameters are used as predictor variables rather than instantaneous observations. The regression model improves the correlation coefficient between monthly mean PM10 and AOD from 0.15 (AERONET AOD only) to 0.75 (AERONET AOD and CALIOP parameters). The high PM10 concentrations during the winter dry season and the low concentrations during the summer season are well reproduced. Days with surface PM10 above 100 µg/m³ are better identified when using the CALIOP parameters in the multi linear regression model. The number of true positives (actual and predicted concentrations above the threshold) is increased, which improves the classification sensitivity (recall) by a factor of 1.8. Our methodology can be extrapolated to the whole Sahel area, provided that satellite-derived AOD maps are used, in order to create a new dataset on population exposure to dust events in this area.


Introduction
Aeolian mineral dust is a major component of the aerosol load in the atmosphere of our planet. Dust influences the Earth's radiative budget and therefore the climate system [1] and weather forecasting [2,3]. Desert dust is transported over thousands of kilometers and reaches environments far from emission areas. Northern Africa is the largest source of suspended mineral dust in the world, with emissions of 400 to 2200 Tg/yr [4]. During summer, the dust is transported over long distances, covering the Tropical Atlantic Ocean and the Mediterranean Sea [5][6][7][8]. During the dry season from October to April, the Sahel is under the influence of the northeasterly Harmattan wind [9], which brings dust to populated areas of western Africa. On the one hand, dust deposition provides a significant input of nutrients for marine and terrestrial ecosystems [10][11][12]. On the other hand, dust impacts the well-being and health of human beings because of its respirable fraction [13]. Among adverse health effects, dust events are suspected to play a role in the meningitis outbreaks occurring in the so-called "Meningitis belt" in Northern Africa [14] during the dry season [15][16][17][18]. However, the scarcity of in situ observations of dust surface concentrations in Northern Africa is a limiting factor for a better assessment of the dust impact on health.
Satellite remote sensing offers a relevant alternative to ground stations for the purpose of dust atmospheric concentration monitoring [19]. The aerosol optical depth (AOD) has been widely used to infer the particulate matter surface concentration [20][21][22]. Deroubaix et al. [23] have shown that the OMI aerosol index is suitable to reflect the dust surface concentration in the Sahel during the core of the dry season (January to March). However, the authors of [24] have underlined that using columnar AOD as a proxy for dust surface concentration is one of the limiting factors in their epidemiological studies on meningitis in Burkina Faso. Indeed, AOD is a column-integrated aerosol optical property and its relationship with surface dust concentration depends on several parameters, including the vertical distribution of the dust in the atmosphere. The altitude and depth of the dust layer in Northern Africa vary seasonally. During the Harmattan weather regime, dust is advected close to the surface, while during the pre-monsoon and monsoon periods the dust can be uplifted to higher altitudes due to convective activity. This results in a seasonal cycle in the altitude of the dust transport [25,26], which has an impact on the relationship between the aerosol optical depth and the dust concentration at the surface [27].
Lidar (light detection and ranging) technology can provide the vertical distribution of the dust in the atmosphere. The spaceborne lidar CALIOP (Cloud-Aerosol LIdar with Orthogonal Polarization) aboard the CALIPSO (Cloud Aerosol Lidar and Infrared Pathfinder Satellite Observations) mission [28], launched in April 2006, provides an unprecedented set of observations on the aerosol and cloud vertical distributions from the surface layer to the top of the atmosphere. The accuracy of the top altitude of the layers derived from CALIOP has been estimated to be within ±0.1 km [29,30]. The scene classification algorithm [31] enables the identification of the mineral dust layers among the different scattering structures of the atmosphere. This capability has been widely used to analyze the altitude of dust transport at the global scale [28,32,33]. The CALIOP dust products have also been used to analyze dust activity in the vicinity of source areas. Todd and Cavazos-Guerra [34] were able to derive a map of the summertime dust emissions in the Sahara from the CALIOP aerosol products.
The objective of the study is to improve the prediction of PM10 surface concentrations from AODs in the Sahel region by using CALIOP lidar aerosol products. The hypothesis is that the characteristics of the lowest dust layer detected by the lidar can be used to constrain the retrieval of dust surface concentrations from AODs.

Data
The International Network to study Deposition and Atmospheric composition in Africa (INDAAF) is a programme dedicated to the long-term monitoring of the atmospheric composition and atmospheric deposition fluxes in Africa. Surface PM10 concentrations are measured at specific locations over a Sahelian transect (Figure 1) [35,36]. The description of the stations and the database is given by Marticorena et al. [35]. PM10 observations are available with an hourly resolution. Daily PM10 values are computed over 24 h by averaging hourly PM10 observations. Spurious variations in the daily PM10 are filtered out site by site. Outliers are defined as values lying outside 1.5 times the interquartile range of the log-transformed distribution. This filtering results in a 2% reduction in the daily PM10 dataset. The stations in Mali, Senegal and Niger are also equipped with an automatic sun photometer belonging to the Aerosol Robotic Network (AERONET). The sun photometers measure the direct sun irradiance and the sky brightness in spectral channels at 440, 675, 870 and 1020 nm [37,38]. The total aerosol optical depth (AOD) is retrieved from direct sun measurements according to the Beer-Lambert law. In this study we use the daily level 2.0 AOD delivered by version 3 of the direct sun algorithm.
The daily AOD is interpolated to a reference wavelength of 550 nm from the AOD at 440 and 675 nm, following the Ångström law.
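The interpolation to 550 nm can be illustrated with a minimal sketch. It assumes the usual power-law form of the Ångström law, AOD(λ) ∝ λ^(−α), with the exponent α estimated from the 440 and 675 nm channels; the function names are illustrative and do not come from the AERONET processing chain.

```python
import math

def angstrom_exponent(aod_1, aod_2, wl_1, wl_2):
    """Angstrom exponent from the AOD at two wavelengths (same unit for both)."""
    return -math.log(aod_1 / aod_2) / math.log(wl_1 / wl_2)

def aod_at_wavelength(aod_ref, wl_ref, wl_target, alpha):
    """Scale a reference AOD to a target wavelength with the power law."""
    return aod_ref * (wl_target / wl_ref) ** (-alpha)

def interpolate_aod_550(aod_440, aod_675):
    """Interpolate the AOD to 550 nm from the 440 and 675 nm channels."""
    alpha = angstrom_exponent(aod_440, aod_675, 440.0, 675.0)
    return aod_at_wavelength(aod_440, 440.0, 550.0, alpha)
```

For coarse dust particles the spectral dependence of the AOD is weak, so α is close to zero and the interpolated value stays close to both input channels.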
CALIPSO was launched in April 2006. The CALIOP lidar records atmospheric attenuated backscatter profiles at 532 and 1064 nm with an along-track resolution of 335 m [39] during day and night. The depolarization of the returned laser beam is measured at 532 nm. The vertical sampling resolution of CALIOP is 30 m in the lower part of the profile, from the surface to 8.2 km above mean sea level, and 60 m between 8.2 and 20.2 km [28]. CALIPSO has a sun-synchronous orbit with a period of 99 min and repeats the same ground track every 16 days. The laser beam diameter at the Earth's surface is 70 m. In this study we use the level 2 aerosol layer products (namely 05kmALay) from 2006 to 2014 [40]. These products result from a sequence of processing steps [41]. The first one is the detection of atmospheric scattering layers [31], followed by the classification of the scene [42] and the retrieval of the extinction coefficient [43]. The level 2 aerosol layer product slices the atmosphere into up to 8 layers. The bottom and top altitudes (see Vaughan et al. [31], and references therein) as well as the aerosol type and the AOD are provided for each layer.
The maximum acceptable distance between the station and the lidar track is set to 250 km. Around the station of M'Bour, only the tracks over land are selected, to avoid sampling the Saharan air layer [26,44]. The 250 km distance yields an average of 5.5 profiles per month. A threshold of −20 was set on the cloud-aerosol discrimination (CAD) score in order to select only high-confidence aerosol profiles. We have retained only the profiles for which the quality flag on the extinction retrieval (namely the "ExtinctionQC_532" parameter) indicates that the lidar ratio was unchanged or constrained during the retrieval. For the first layer detected from the ground level upward that is identified as "DUST" in the feature mask, we extract the layer AOD and the top and bottom altitudes, together with the total column AOD. The base altitude of optically thick aerosol layers can be biased toward higher altitudes due to the attenuation of the laser beam within the layer. This artifact was corrected in version 3 of the algorithm by extending the base altitude of the layer to 90 m above mean sea level.
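The selection described above can be sketched as follows. This is only an illustration under stated assumptions: the profile is represented by a hypothetical dictionary (the keys `lat`, `lon`, `layers`, `base_km`, `top_km`, `aod`, `type` and `cad` are not the names used in the CALIOP files), the distance is a great-circle distance to a single profile location, and "high confidence" is taken to mean a CAD score less than or equal to −20, since aerosol features carry negative CAD scores.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def lowermost_dust_layer(profile, station_lat, station_lon, max_km=250.0, cad_max=-20):
    """Return the lowest DUST layer of a profile, or None if nothing qualifies.

    `profile` is a hypothetical dict: {"lat", "lon", "layers": [{"base_km",
    "top_km", "aod", "type", "cad"}, ...]}. Assumption: a layer is a
    high-confidence aerosol layer when its CAD score is <= cad_max.
    """
    distance = haversine_km(profile["lat"], profile["lon"], station_lat, station_lon)
    if distance > max_km:
        return None  # profile too far from the station
    dust = [layer for layer in profile["layers"]
            if layer["type"] == "DUST" and layer["cad"] <= cad_max]
    # Lowest base altitude first; None when no dust layer passes the filters.
    return min(dust, key=lambda layer: layer["base_km"]) if dust else None
```

The land-only selection around M'Bour and the "ExtinctionQC_532" screening are left out of this sketch; they would be additional filters applied before the layer is returned.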

Method
We have made the hypothesis that the relationship between the PM10 surface concentration and the AOD is linear but influenced by the seasonal variability in the altitude of the lowermost dust layer and its AOD. In the following analysis, we test a multilinear regression model on the monthly and daily PM10 and AERONET AOD. We introduce the seasonal variability by using the monthly mean annual cycle of the AERONET AOD and CALIOP parameters. In addition to the lowermost layer properties (top and bottom altitudes and AOD), we also use the CALIOP columnar total AOD and columnar dust AOD. The columnar dust AOD differs from the lowermost dust layer AOD as it includes all the possible dust layers. When there is only one dust layer, both quantities are equal.
Log-transformed variables are used in the model and averages are computed using the geometric mean. Because of the low number of observations per month (less than 6 on average), the monthly mean annual cycle is computed from the daily observations for each site. The dependent variable [PM10] represents the monthly mean PM10 concentration measured at each station. The predictor variables include the logarithm of the monthly mean AERONET [AOD] at each station, and the logarithm of the monthly mean annual cycle of the AERONET AOD (Ĉ1), the CALIOP-derived top (Ĉ2) and bottom (Ĉ3) altitudes of the lowermost dust layer, the AOD of the lowermost dust layer (Ĉ4), and the columnar total (Ĉ5) and dust (Ĉ6) AOD. β0 is the intercept, βi are the regression coefficients and ε is the error term.
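The monthly mean annual cycle based on the geometric mean can be sketched as follows. The sketch assumes that the daily observations of one site are available as (month, value) pairs with strictly positive values, and that all years are pooled by calendar month; the function names are illustrative.

```python
import math
from collections import defaultdict

def geometric_mean(values):
    """Geometric mean, i.e., the exponential of the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def monthly_annual_cycle(records):
    """Pool daily values from all years by calendar month (1..12).

    `records` is an iterable of (month, value) pairs for one site and one
    parameter. Returns {month: geometric mean of all values in that month}.
    """
    by_month = defaultdict(list)
    for month, value in records:
        by_month[month].append(value)
    return {month: geometric_mean(values) for month, values in sorted(by_month.items())}
```

Taking the logarithm of the geometric mean is the same as averaging the log-transformed values, which is consistent with the use of log-transformed variables in the regression.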
We have applied a backward elimination based on the Akaike Information Criterion [45] to the predictor variables in Equation (1) in order to select the best-performing model. Equation (1) is applied to all data rather than to each station independently.
Using the monthly mean ). However the dataset is reduced to 1335 measurements when considering a collocation with daily CALIOP observations. The impact of collocated instantaneous CALIOP parameters is discussed in the section below. The daily AOD at 550 nm (dimensionless) ranges between 0.01 and 3.83 and follows a pattern rather similar to that of the PM10 but with a weaker amplitude. The mean AOD are similar for the 3 sites: 0.39 in Banizoumbou, 0.35 in M'Bour and in Cinzana. Monthly averages are computed for months with at least 10 daily observations, leading to 280 monthly PM10 and AOD values. The monthly mean PM10 and AOD range between 10 and 350 µg/m³ and between 0.11 and 1.26, respectively. The two stations farther inland (Banizoumbou and Cinzana) have a larger range in monthly mean PM10 than the coastal station (M'Bour). M'Bour is located downwind of the dust sources and can be viewed as a receptor area for dust outbreaks coming from the Sahara [46]. The baseline PM10 concentration in M'Bour therefore remains higher than at the stations located upwind. Local sources of sea salts or anthropogenic aerosols may also contribute to the surface concentrations, but only marginally. The anthropogenic fraction in PM10 is about 11% [47] and the relative abundance of dust in the coarse fraction at M'Bour is >80% [46]. The similar monthly patterns in AOD and PM10 at the 3 stations indicate that large-scale transport of dust is the main driver of the variability and that a regional model linking AOD to PM10 may be applied.
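The AIC-based backward elimination mentioned at the start of this section can be sketched with NumPy. The sketch assumes that Equation (1) is an ordinary least-squares fit of log[PM10] on an intercept plus the log-transformed predictors, and it uses the Gaussian form of the AIC up to an additive constant, n·ln(RSS/n) + 2k. The function names and the stopping rule (stop when no single removal lowers the AIC) are illustrative choices, not a description of the software used by the authors.

```python
import numpy as np

def ols_aic(X, y):
    """AIC (up to a constant) of a least-squares fit of y on X plus an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # regression coefficients
    rss = float(np.sum((y - A @ beta) ** 2))      # residual sum of squares
    k = A.shape[1] + 1                            # coefficients + error variance
    return n * np.log(rss / n) + 2 * k

def backward_elimination(X, y, names):
    """Drop one predictor at a time as long as the AIC decreases."""
    keep = list(range(X.shape[1]))
    best = ols_aic(X[:, keep], y)
    while len(keep) > 1:
        # AIC obtained after removing each remaining predictor in turn
        trials = [(ols_aic(X[:, [j for j in keep if j != d]], y), d) for d in keep]
        aic, drop = min(trials)
        if aic >= best:
            break  # no removal improves the criterion
        best = aic
        keep.remove(drop)
    return [names[j] for j in keep], best
```

Here `X` would hold one column per log-transformed predictor (log[AOD] and log Ĉ1 to log Ĉ6) and `y` the log-transformed monthly PM10, pooled over the three stations.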

Predictor Variables
Predictor variables are presented in Figure 3, together with the PM10 monthly mean annual cycle. From January to March, PM10 concentrations are at their highest (200 µg/m³, all-station average), while the AERONET AOD (Ĉ1) increases from its minimum value (in December-January) to its maximum value in May. All the AOD components (Ĉ1, Ĉ4, Ĉ5, and Ĉ6) follow the same annual cycle with slight differences in amplitude. Ĉ5 and Ĉ6 have almost the same annual cycle, indicating that dust is the dominant component in the atmospheric column, which is expected in the Sahel. The top (Ĉ2) and bottom (Ĉ3) altitudes of the layer present a bell shape with a maximum at about 4 km in June-July. In the first half of the year, the shift of the dust layer toward higher altitudes is associated with an increase in the AOD and a decrease in the PM10 concentrations. In the second half of the year, the AODs remain rather constant while the PM10 concentrations increase as the dust layer shifts toward lower altitudes. There is an E-W gradient (between Banizoumbou and M'Bour) in the top altitude of the dust layer, indicating that the dust layer sinks as it moves eastward.
The AOD components (Ĉ1, Ĉ4, Ĉ5, and Ĉ6) are correlated and introduce multicollinearity in the regression, i.e., the predictors are partially redundant with each other. Multicollinearity does not affect the model performance, but it affects the coefficients estimated for individual predictors. The multicollinearity is assessed using the variance inflation factor (VIF) [48]. A VIF above 5 indicates a multicollinearity problem.
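A minimal NumPy sketch of the VIF diagnostic is given below. It uses the standard definition VIF_j = 1/(1 − R_j²), where R_j² is the coefficient of determination obtained when predictor j is regressed on all the other predictors; the function name is illustrative.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X, computed as 1 / (1 - R^2_j)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept and all the other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return vifs
```

With the rule of thumb quoted above, any predictor whose VIF exceeds 5 would be flagged as largely redundant with the others.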
The backward elimination applied to Equation (1) with all the predictors removes the variable Ĉ5 (columnar total AOD) from the model. The VIF indicates that Ĉ4, Ĉ5, and Ĉ6 introduce multicollinearity in the model. When those predictors are removed, Ĉ3 is discarded by the backward elimination. The predictor coefficients are presented in Table 1 for 3 cases: the full set of predictors (AOD+CALIOP all), no collinearity (AOD+CALIOP alt) and AERONET AOD only. The coefficients given for (AOD+CALIOP all), although significant, must be taken with caution because of multicollinearity. However, the model performance is still increased when using Ĉ4 and Ĉ6.
The introduction of CALIOP parameters significantly improves the prediction of monthly mean PM10 from AOD (Figure 4). All the parameters are significant. The adjusted R² increases from 0.15 to 0.75 and the residual standard error is reduced by almost a factor of 2. The negative coefficients on the altitudes of the layer (Ĉ2 and Ĉ3) show that the shift of the dust layer toward higher altitudes during the summer period decreases the surface concentrations. The relationship to surface concentrations is positive for Ĉ4 and negative for Ĉ6. This indicates that the fraction of AOD within the dust layer is a good indicator, and that the influence of the seasonal variability in the altitude and load of the dust layer on surface concentrations can thus be approximated by the proportion of AOD within the dust layer divided by its altitude.
Figure 5 presents the scatter plot of the daily data for both the regression based on AERONET AOD only and the regression based on AERONET AOD and the additional predictors from CALIOP. The number of data points is drastically reduced when considering the collocation of daily CALIOP observations with AERONET and PM10 observations (a factor of 5 between the two datasets). Moreover, the performance of the model is only moderately improved (e.g., R² increased by 8%).
CALIPSO has 2 overpasses per day (day and night) at fixed times. The weak impact of collocated daily CALIOP observations could be explained by the lack of representativeness of the instantaneous daily lidar soundings in the very limited area of the CALIOP track. The prediction of daily observations is improved by using the monthly mean annual cycle of the CALIOP parameters. The adjusted R² increases from 0.24 to 0.61 and the residual standard error is reduced by a factor of 1.4. However, the dispersion of the data remains large. The root mean square error (RMSE) is reduced from 104 to 83 µg/m³ and the mean absolute percentage error from 80 to 50%. There is little difference from one station to the other. The RMSE gain is 26%, 22% and 12% for M'Bour, Banizoumbou and Cinzana, respectively.
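The error statistics quoted above follow standard definitions, which can be sketched as follows. The function names are illustrative, and the relative gain is assumed to be the reduction of the error expressed as a percentage of its initial value.

```python
import math

def rmse(actual, predicted):
    """Root mean square error, in the unit of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def relative_gain(before, after):
    """Reduction of an error metric, in percent of its initial value."""
    return 100.0 * (before - after) / before
```

Under that assumption, the all-station reduction of the RMSE from 104 to 83 µg/m³ corresponds to `relative_gain(104, 83)`, i.e., a gain of about 20%.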

Application to Daily Observations
The ability of the multi linear regression model to improve the detection of daily high dust events from AODs is evaluated by applying a threshold to both actual and predicted PM10 concentrations. The threshold is fixed at 100 µg/m³, corresponding to unhealthy conditions. A true positive is a day on which both the actual and the predicted concentrations are above the threshold, while a true negative is a day on which both are below the threshold.
The number of days classified as true positive is a factor of 2 higher for the AERONET+CALIOP model than for the AERONET-only model, in agreement with its ability to predict higher concentrations. The total predicted number of days exceeding 100 µg/m³ is 2039, with a precision (true positives divided by the sum of true positives and false positives) of 0.71. The recall of the model (true positives divided by the sum of true positives and false negatives) is increased from 0.37 to 0.68 (a factor of 1.8), indicating an improvement in the classification of large AODs that are associated with actual PM10 events.
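The threshold-based evaluation can be sketched as follows, using the definitions of precision and recall given above. The function names are illustrative, and the sketch assumes that a day counts as an exceedance when the concentration is strictly above 100 µg/m³.

```python
def classify_exceedances(actual, predicted, threshold=100.0):
    """Count true/false positives and negatives for days above the threshold."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a > threshold and p > threshold:
            tp += 1  # event observed and predicted
        elif a <= threshold and p > threshold:
            fp += 1  # event predicted but not observed
        elif a > threshold and p <= threshold:
            fn += 1  # event observed but missed by the model
        else:
            tn += 1  # no event observed, none predicted
    return tp, fp, fn, tn

def precision(tp, fp):
    """Fraction of predicted events that are actual events."""
    return tp / (tp + fp) if (tp + fp) else float("nan")

def recall(tp, fn):
    """Fraction of actual events that are predicted (sensitivity)."""
    return tp / (tp + fn) if (tp + fn) else float("nan")
```

For example, `classify_exceedances([150, 120, 80, 90, 200], [130, 90, 110, 70, 210])` returns `(2, 1, 1, 1)`, which gives a precision and a recall of 2/3 each.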

Conclusions
The CALIOP aerosol layer products have been used to improve the statistical relationship between AOD and surface PM10 at three stations located in the Sahel over a period of 10 years. Although there is no direct model to predict the surface concentration from the satellite observations, it is shown that the monthly mean annual cycles of the altitude and AOD of the lowermost detected dust layer are good predictors of surface concentrations in conjunction with co-located AOD observations. The decorrelation between surface PM10 concentrations and AODs during the summer period can be attributed to the uplift of the dust layer. A linear regression model that accounts for the seasonal variability in the vertical shift of the dust layer improves the estimation of surface concentrations on a monthly and daily basis. Due to the low revisit frequency of the CALIOP lidar, introducing the daily CALIOP products has a limited impact on the regression analysis. The prediction of the days with unhealthy PM10 concentrations from satellite data remains challenging; however, it is improved by using the CALIOP layer parameters, regardless of the location. The lowest performance for such prediction is observed in M'Bour, which is possibly influenced by either the marine atmosphere or anthropogenic emissions.
The proposed methodology can be applied to the whole Sahel area provided that passive imager derived AODs (e.g., MODIS) and CALIOP predictor variables are averaged on a regular geographical grid. In this paper, the impact of biased or inaccurate AODs has been limited by the use of AERONET AODs, but a further spatial extension would require an extensive validation of satellite AODs over the targeted area.