Air Pollution Measurements and Land-Use Regression in Urban Sub-Saharan Africa Using Low-Cost Sensors—Possibilities and Pitfalls

Abstract: Air pollution is recognized as the most important environmental factor that adversely affects human and societal wellbeing. Due to rapid urbanization, air pollution levels are increasing in the Sub-Saharan region, but there is a shortage of air pollution monitoring. Hence, exposure data to use as a base for exposure modelling and health effect assessments are also lacking. In this study, low-cost sensors were used to assess PM2.5 (particulate matter) levels in the city of Adama, Ethiopia. The measurements were conducted during two separate 1-week periods. The measurements were used to develop a land-use regression (LUR) model. The developed LUR model explained 33.4% of the variance in the concentrations of PM2.5. Two predictor variables were included in the final model, both of which were related to emissions from traffic sources. Some concern regarding influential observations remained in the final model. Long-term PM2.5 and wind direction data were obtained from the city’s meteorological station, which should be used to validate the representativeness of our sensor measurements. The PM2.5 long-term data were, however, not reliable. Means of obtaining good reference data combined with longer sensor measurements would be a good way forward to develop a stronger LUR model which, together with improved knowledge, can be applied towards improving the quality of health. A health impact assessment, based on the mean level of PM2.5 (23 µg/m³), presented the attributable burden of disease and showed the importance of addressing causes of these high ambient levels in the area.


Introduction
Exposure to air pollution is the most important factor adversely affecting our health. Exposure to ambient PM2.5 (particulate matter with an aerodynamic diameter smaller than 2.5 µm) contributed to This study aims to reduce this lack of air pollution data by using low-cost sensors to assess outdoor air pollution and LUR to appraise outdoor PM2.5 exposure in the Sub-Saharan city of Adama, Ethiopia.

Study Site
The measurements were performed in the city of Adama, located 99 km southeast of the Ethiopian capital, Addis Ababa. The city has commonly been described as a commercial, industrial, residential and recreational center for the last couple of decades [40]. Adama is located in the Rift Valley area, with an average altitude of 1600 m and a total area of 200 square kilometers. The weather is mostly dry except from May to October, when there is some precipitation [41]. Mean annual rainfall and temperature are 859.9 mm and 21.6 °C, respectively [42]. Urban Adama had a projected 324,000 inhabitants in 2015 (although no census has been conducted since 2007) [43]. The Pan-African highway connecting Djibouti and Addis Ababa previously ran through Adama; a new multi-lane stretch of the highway, with road tolls, has now been built circumventing the city (Figure 1). The city has a large central bus station that serves the eastern and southeastern parts of the country. Arid and semi-arid vegetation, such as acacia and shrubs, wooded grassland, thorny bushes, herbs, grasses, and farmland, is common in Adama city and its surrounding areas. A total of 144 industrial plants are scattered around the city center, with large industrial complexes clustered in the western and eastern outskirts of the city. The industries are mainly paper printing, cement manufacturing, brick manufacturing, tomato-canning plants, cottage industries, textile, food, beverage factories, metal, plastic and wood engineering, and textile manufacturing plants.
Atmosphere 2020, 11, 1357
The Adama vehicle fleet is dominated by small vehicles for public transport and motorcycles. According to the Adama City Transport Office (ACTO), 70% of the vehicles are bajaj (small three-wheeled vehicles), 23% are motorcycles, 7% are minibuses and less than 0.1% are city buses. The four main intersections in Adama have reported daily traffic volumes of 16,000-32,000 vehicles [44], which gives an indication of traffic flows in the city. The majority of the roads are gravel (255,600 km), red ash (136,300 km) and cobblestone (17,200 km). The total length of asphalt roads in the city is 94.5 km [45].
Air sampling sites were selected to represent the anticipated spatial variation of air pollution in Adama, according to the ESCAPE protocol [32]. Prior to site selection, all residential, traffic, industrial, and green areas were identified, as well as the known solid waste burning sites and land use. Based on this assessment, the city was classified into traffic, urban background, and regional background areas. According to ESCAPE, a minimum of 20 measurement sites should be selected to develop a LUR model. In this study, 20 sites (14 urban sites and 6 traffic sites) were selected during the dry season and 20 during the wet season (due to sensor malfunction, however, data could only be retrieved from 14 of them: 8 urban background and 6 traffic sites). Due to the limited number of air monitoring sensors and fear of vandalism, air pollution measurements were not conducted at any regional background site.

Measurements
Field measurements were conducted in Adama, Ethiopia, in February 2019 (the dry season) and in September 2019 (the wet season). The PM2.5 measurement sites were selected based on the ESCAPE protocol [32]. The measurement sites were classified into regional background, urban background and traffic types. A regional background site is an area that is not continuously built up and is without influence from nearby industry or traffic; air pollution is not produced in the area but transported there by winds. An urban background site is a continuously built-up area that is not heavily influenced by nearby traffic or industry; air pollution generated from sources within the area creates an urban background that is higher than at regional background sites. A traffic site is one where the pollution level is mainly influenced by vehicle emissions from a nearby road. Using a Global Positioning System (GPS) receiver (GARMIN GPSMAP64), each measurement site's location was geo-coded and the altitude was measured. During the wet period, 20 Purple Air PA-II-SD sensors were used. Due to lack of electricity, they were powered with powerbanks (Rawpower RP-PB41), which were replaced with newly charged ones after 3 days. During February, 5 Alphasense N3 sensors were used, powered with powerbanks; they sampled for 1-2 days and were then moved to new sites. After 3 moves, all 20 sites had been sampled, see Figure 2. The Alphasense measurements had a time resolution of 1 s, while the Purple Air data were averaged over 1 min. The 5 Alphasense sensors had previously been run in parallel overnight on-site (in Adama) and showed good agreement; the measured PM2.5 concentration ranged 20.6-21.8 µg/m³ (average 21.1 µg/m³, standard deviation 0.6 µg/m³; see Figure S1). The 20 Purple Air sensors had been run in parallel measuring diesel exhaust in laboratory settings, showing good agreement (Figure S2).
Both sensor types measure the amount of scattered light to estimate particle mass concentrations. That means that low-cost sensor measurements are sensitive to both alterations in particle size distribution and refractive index [46]. Another well-known issue for low-cost optical sensors is high humidity, with consequent particle growth due to water uptake [47]. Both sensor types have built-in temperature and relative humidity (RH) sensors. Relative humidity followed typical diurnal patterns with maxima of ~50% and 60-70% during nighttime for the dry and wet season campaigns, respectively. Although the highest RH peaks could affect the reported PM concentrations, especially during the wet season, no correlation between high RH and PM was seen during the spring campaign. Hence no correction was made for high RH to the data used in the LUR model. To ensure the accuracy of sensor measurements, user-defined calibration for the target PM source is recommended [28]. This, however, requires on-site access to reference mass concentration instruments, such as a tapered element oscillating microbalance (TEOM), which was not available (see Section 4.3). The Purple Air sensors can be connected to WiFi as a means of collecting data and overseeing that the sensor is functioning. For the purpose of this study, however, data were downloaded on-site due to lack of WiFi.

Geographic Predictor Variables
In order to construct a LUR model, information about geographical predictor variables was collected. In total, information about 52 geographical predictor variables was collected (for a full list of potential variables, see Table 1). Earlier studies have shown that important variables that can predict levels of PM2.5 include road variables, population density, land use and altitude [48][49][50][51].
[Figure caption fragment] For comparison, see Figure S3, in which sensors were not moved.

Table 1 (fragment; only the last row and the table notes were recovered)
Primary road distance in meters between 300 m and 500 m; m; +
* The next most important roads in a country's system. (Often link larger towns.)
** The next most important roads in a country's system. (Often link towns.)
*** The next most important roads in a country's system. (Often link smaller towns and villages.)
**** Roads which serve as an access to housing, without function of connecting settlements.
***** Access roads to, or within an industrial estate, camp site, business park, car park, alleys, etc.
****** Rural roads.
******* Bus stations and large parking lots.

Land Use
Land-use data were obtained from the Adama city urban land development and management office through a master plan conducted in 2019, aiming at describing the land use in the city for the coming 10 years. Based on the masterplan, four classes were selected as relevant in relation to PM2.5 emissions and manageable to adjust to the land use at the time the study was conducted. These classes were industrial areas, residential areas, water, and transportation facilities (see Figure 1). Satellite images retrieved via World Imagery in ArcMap, collected with WorldView-2 (resolution 0.5 m) and WorldView-4 (resolution 0.31 m) between 2016 and 2018 [52] (covering different parts of the area), were used to manually adjust the planned land use to represent conditions when the measurements were conducted. In general, most corrections were made in the outer parts of the city, where the city has plans to expand into new areas. For industrial, residential and transportation areas, the total area covered (in m²) was calculated for 100, 300, 1000 and 3000 m buffers around the measurement stations and included among the candidate predictor variables.

Industrial Areas
Industrial areas were included independent of type of industry and were considered plausible sources of particulate emissions. The distance to the closest industrial area from the measurement stations ranged from 35 m to 1487 m, with a mean distance of 490 m.

Residential Areas
Residential areas were included as a measure of human activity and population density, factors that can be expected to raise PM2.5 levels. In the master plan three types of residential area (housing, mixed residences and residences) were combined into one residential class. In the outskirts of the city, major adjustments were needed as the masterplan included several areas with newly planned settlements.

Transport Administration Areas
This class describes bus stations and major parking lots. Living close to this land use type is expected to be associated with higher levels of PM2.5 exposure due to high levels of traffic. In total, there were 16 transport administration areas.

Informal Settlements
In addition to the land-use classes obtained from the masterplan, information about informal settlement areas was collected by a local geotagger. In total, 18 areas were identified as informal settlements in the city. These areas were characterized by high population density, old houses and frequent small-scale waste burning.

Water Bodies
The water bodies are seasonal pathways or reservoirs where water flows or is collected during the rainy season. During the dry season, people dump and burn solid waste in the empty canals as well as in their vicinity. Information about the location of water bodies was obtained from the masterplan.

Road Traffic
Information about traffic density and road network distribution has been found to be an important variable in many studies [49]. In this study, information from OpenStreetMap [53] was collected to represent the geographical distribution of the road network. OpenStreetMap also contained information about the type of road, and this information was used as a proxy for traffic intensity. The original data from OpenStreetMap contained 16 different classes of road. These were aggregated and reclassified into 7 classes: Motorway, Primary, Secondary, Tertiary, Residential, Service and Other roads (Figure 3). The spatial distribution of the road network is presented in Figure 4. Geographical predictor variables with the total road length in meters within 100 m, 300 m and 500 m buffers were calculated for each road type separately, as well as combined for any type of road. Variables with the distance to the closest road of each type were also calculated.
All available predictor variables are shown in Table 1.
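The buffer-based road variables described above can be illustrated with a minimal sketch. It assumes roads are given as polylines in a projected coordinate system with units of meters, and it clips each straight segment against a circular buffer around a site. The function names and the example coordinates are hypothetical and this is not the GIS workflow used in the study.

```python
import math

def segment_length_in_circle(p, q, center, radius):
    """Length of the part of the straight segment p-q that lies inside a circle."""
    (px, py), (qx, qy), (cx, cy) = p, q, center
    dx, dy = qx - px, qy - py
    seg_len = math.hypot(dx, dy)
    if seg_len == 0.0:
        return 0.0
    # Points on the segment are p + t*(q - p) with t in [0, 1].
    # Solve |p + t*(q - p) - center|^2 = radius^2 for t.
    fx, fy = px - cx, py - cy
    a = dx * dx + dy * dy
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return 0.0  # the line misses (or only touches) the circle
    root = math.sqrt(disc)
    t_in = max(0.0, (-b - root) / (2.0 * a))
    t_out = min(1.0, (-b + root) / (2.0 * a))
    return max(0.0, t_out - t_in) * seg_len

def road_length_in_buffer(polyline, site, radius):
    """Total length (m) of a polyline inside a circular buffer around a site."""
    return sum(
        segment_length_in_circle(polyline[i], polyline[i + 1], site, radius)
        for i in range(len(polyline) - 1)
    )

# Hypothetical straight road passing through a site, with a 300 m buffer
road = [(-1000.0, 0.0), (1000.0, 0.0)]
print(road_length_in_buffer(road, (0.0, 0.0), 300.0))
```

Summing this quantity over all roads of one class gives a variable of the kind "Primary road within 300 m"; a ring variable (300 m to 500 m) is the difference between two buffer totals.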

Exposure Modelling
The ESCAPE protocol [32] was followed in order to develop a LUR-model. According to the protocol there is a recommendation to include 20 measurements when developing a LUR-model for PM. During the wet season, only 17 of the 20 sensors (Purple Air) registered data, but during the dry season all sensors (Alphasense) were functioning. The measurements from the dry season were, therefore, selected to develop a LUR-model. Correlations between measurements of the two periods were tested in order to examine if measurements from the two campaigns could be combined. The correlation between the two periods was, however, small (Pearson: p-value 0.437) and the values from the two periods were therefore not combined.
In a manual forward selection procedure, a total of 52 predictor variables were first tested univariately. Variables that were found to be statistically significant were then added to a multivariate model. The variable with the highest R² was added first, followed by the second highest, and so on. Only variables with a coefficient in the expected direction were entered; for instance, having more industrial land close to the measurement station would be expected to increase the PM2.5 concentration, and the coefficient was thereby expected to be positive. If adding a new variable changed the direction of an estimate, that variable was not allowed. New variables also had to increase the R² by at least 1% to be entered. When all statistically significant variables had been tested, coefficients that had a p-value above 0.1 in the final model were to be excluded.
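The selection rules in this paragraph can be sketched in code. The example below is a simplified illustration, not the authors' procedure: it ranks candidates by univariate R², enforces the expected coefficient signs and the 1% R² gain, but omits the p-value screening (which would need a t-distribution). All function names, variable names and data are hypothetical.

```python
def fit_ols(rows, y):
    """Least-squares fit with intercept; returns [intercept, b1, b2, ...]."""
    A = [[1.0] + list(r) for r in rows]
    p = len(A[0])
    # Augmented normal equations (A^T A | A^T y)
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * v for a, v in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def r_squared(rows, y, coefs):
    """Coefficient of determination of the fitted model."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - (coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)))) ** 2
                 for r, v in zip(rows, y))
    return 1.0 - ss_res / ss_tot

def forward_select(columns, y, expected_sign, min_gain=0.01):
    """columns: name -> values; expected_sign: name -> +1 or -1."""
    def fit(names):
        rows = list(zip(*(columns[n] for n in names)))
        coefs = fit_ols(rows, y)
        return coefs, r_squared(rows, y, coefs)

    # Candidates ordered by univariate R^2, highest first
    ranked = sorted(columns, key=lambda n: fit([n])[1], reverse=True)
    selected, best_r2 = [], 0.0
    for name in ranked:
        trial = selected + [name]
        coefs, r2 = fit(trial)
        # Every coefficient in the trial model must keep its expected sign
        signs_ok = all(b * expected_sign[n] > 0 for b, n in zip(coefs[1:], trial))
        if signs_ok and r2 - best_r2 >= min_gain:
            selected, best_r2 = trial, r2
    return selected, best_r2
```

Checking the signs of all coefficients in each trial model covers both rules at once: a new variable with the wrong sign is rejected, and so is one that flips the sign of a variable already in the model.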
In order to validate the model, a number of diagnostic tests were performed. Tests for influential observations were performed by calculating Cook's D on the final model. A value higher than 1 indicated an influential observation [54]. Multicollinearity among the variables in the final model was also tested by calculating variance inflation factors (VIF) [38]. A VIF value higher than 3 was considered to indicate collinearity [55]. Spatial autocorrelation among the residuals in the final model was tested with Moran's I [49].
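As an illustration of two of these diagnostics, the sketch below computes Cook's D from its leave-one-out definition (the squared shift in all fitted values when one site is dropped, scaled by the number of coefficients times the residual variance) and the VIF as 1/(1 − R²) from regressing one predictor on the others. It is a minimal, dependency-free example with hypothetical function names, not the software used in the study.

```python
def fit_ols(rows, y):
    """Least-squares fit with intercept; returns [intercept, b1, b2, ...]."""
    A = [[1.0] + list(r) for r in rows]
    p = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * v for a, v in zip(A, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(coefs, row):
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

def cooks_distance(rows, y):
    """Cook's D per observation, computed by refitting without each site."""
    n, p = len(y), len(rows[0]) + 1
    full = fit_ols(rows, y)
    fitted = [predict(full, r) for r in rows]
    s2 = sum((v - f) ** 2 for v, f in zip(y, fitted)) / (n - p)
    result = []
    for i in range(n):
        keep = [k for k in range(n) if k != i]
        loo = fit_ols([rows[k] for k in keep], [y[k] for k in keep])
        shift = sum((f - predict(loo, r)) ** 2 for f, r in zip(fitted, rows))
        result.append(shift / (p * s2))
    return result

def vif(rows, j):
    """Variance inflation factor of predictor j: 1 / (1 - R^2_j)."""
    target = [r[j] for r in rows]
    others = [tuple(v for k, v in enumerate(r) if k != j) for r in rows]
    coefs = fit_ols(others, target)
    mean_t = sum(target) / len(target)
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    ss_res = sum((t - predict(coefs, o)) ** 2 for t, o in zip(target, others))
    return ss_tot / ss_res
```

With the thresholds stated above, a site would be flagged when its Cook's D exceeds 1 and a predictor when its VIF exceeds 3.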
The model was internally validated by leave-one-out cross-validation [32,49,56]. New models with the variables from the final model fixed were developed with n-1 of the measurement sites included. For these models, an average adjusted R² value and a root mean square error (RMSE) value were calculated to test internal validity.
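A minimal sketch of leave-one-out cross-validation follows. It uses the common variant in which each site is predicted from a model fitted on the remaining n-1 sites and the RMSE is taken over the held-out errors; this may differ in detail from the averaging described above. For brevity it uses a single predictor with a closed-form fit, and all names and data are hypothetical.

```python
import math

def fit_line(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def loocv_rmse(x, y):
    """RMSE of predictions for each site from a model fitted without it."""
    errors = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        errors.append(y[i] - (intercept + slope * x[i]))
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical example: three sites, one predictor
print(loocv_rmse([0.0, 1.0, 2.0], [0.0, 1.0, 0.0]))
```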

Health Impact Assessment
In order to estimate the health impact based on the measured PM2.5 levels, we used the World Health Organization (WHO) AirQ+ software [57] to estimate the attributable burden of disease due to PM2.5 in the area. AirQ+ was developed with the specific purposes of (1) reflecting the current state of the science on the health effects of air pollution; (2) ensuring that researchers and governmental officials worldwide could have access to a tool to inform and ultimately support actions to improve air quality; and (3) providing a large audience with an educational tool that includes summaries of the information that needs to be gathered and organized to understand the impacts of air pollution on health. AirQ+ [58] estimates the number of avoided premature deaths and illnesses attributed to improving air quality using a simple algebraic equation often referred to as a health impact assessment (HIA):
$$\Delta Y = Y_0 \left(1 - e^{-\beta \, \Delta C}\right) \times \mathrm{Pop}$$
where $\Delta Y$ = the estimated number of premature deaths or illnesses; $\beta$ = the risk estimate (or Beta coefficient) from an epidemiologic study; $\Delta C$ = the defined change in the concentration of the air pollutant examined; $Y_0$ = the baseline rate (i.e., incidence) of deaths or illnesses; and $\mathrm{Pop}$ = the population exposed to air pollution.
Assessment of air pollution: we included only the mean measured level for the area and no spatially distributed levels, as we could not use the LUR model. We inserted the mean measured level of PM2.5 (23 µg/m³).
Assessment of baseline rates of death or disease: as we lack good data on baseline rates in this study area, we can only present data in percentage and not in numbers.
Assessment of population: these data can include demographics but were limited here to population size of 214,000 inhabitants in a 200 km² area.
Risk estimates: a concentration-response parameter based on epidemiological evidence of the effect per unit change in pollution. Here we used the Global Burden of Disease 2015/2016 integrated function [57].
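The four inputs listed above can be combined as in the following sketch, which assumes a log-linear health impact function of the form ΔY = Y0 (1 − e^(−βΔC)) × Pop. This is only an illustration: the study used the Global Burden of Disease integrated function within AirQ+, and the relative risk and counterfactual concentration shown here are hypothetical placeholders, not values from the study.

```python
import math

def attributable_fraction(beta, delta_c):
    """Share of the baseline burden attributable to a concentration change."""
    return 1.0 - math.exp(-beta * delta_c)

def attributable_cases(beta, delta_c, baseline_rate, population):
    """Delta Y = Y0 * (1 - exp(-beta * delta_C)) * Pop."""
    return baseline_rate * attributable_fraction(beta, delta_c) * population

# Hypothetical inputs: a relative risk of 1.08 per 10 ug/m3 and a
# counterfactual level of 10 ug/m3 (both assumptions for illustration only).
beta = math.log(1.08) / 10.0
delta_c = 23.0 - 10.0  # measured mean minus the assumed counterfactual
print(f"attributable fraction: {100 * attributable_fraction(beta, delta_c):.1f}%")
```

Because the attributable fraction does not depend on the baseline rate or the population, it can be reported as a percentage even when reliable baseline rates are unavailable.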

Measured Particulate Matter (PM2.5) Levels
The measured average PM2.5 concentrations, as well as the minimum and maximum levels, for the 20 sampling sites during the spring campaign are shown in Table 2. Since the measurement period was not more than two days at each site, it was not possible to look at the general variation between days in average PM levels, i.e., to estimate whether the measured periods could be considered representative for the sites. However, this could be done using data collected at one of the sites (site 4), where 5-day measurements were conducted during August 2018 using a DustTrak (DRX 8533). These 5 days had an average PM2.5 concentration of 49 µg/m³, and the standard deviation of the daily averages was 7.4 µg/m³. Also, the average standard deviation between days in PM2.5 levels measured at the 14 sites during the wet season in 2019 using Purple Air was 8.4 µg/m³ (standard deviations for each site ranging from 3.6 to 18.8 µg/m³). The variation between days was generally not higher than 20% of the average PM2.5 concentration of the measurement period: at 10 of the sites it was below 15%, and at three sites it was between 25 and 29%. Two of those cases were due to especially high morning concentrations (presumably due to high traffic density), while the third series showed elevated concentrations during one lunchtime. In fact, the one-hour moving average concentration at that site peaked at 700 µg/m³ and was consistently above 160 µg/m³ between 11:10 and 12:40 during this particular lunchtime, while the rest of the sampled period (featuring four each of morning rush, lunchtime and afternoon rush) did not feature any concentrations above 160 µg/m³ at this site (see Figure S3).
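The between-day variation discussed here is the standard deviation of the daily averages expressed as a percentage of the period mean. For the DustTrak series quoted above, this gives 7.4/49, or about 15%. A minimal sketch with hypothetical daily means:

```python
import statistics

def between_day_variation(daily_means):
    """Return (standard deviation, SD as % of the mean) of daily averages."""
    mean = statistics.mean(daily_means)
    sd = statistics.stdev(daily_means)  # sample standard deviation
    return sd, 100.0 * sd / mean

# Hypothetical daily mean PM2.5 values (ug/m3) for one site
sd, percent = between_day_variation([40.0, 50.0, 60.0])
print(sd, percent)
```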
The diurnal pattern is consistent across all days of the week (with occasional additional peaks), with elevated levels during early morning and late afternoon. These elevated levels can be explained by city rush hours, but also by the fact that heavy truck drivers, who pass on their way to the port, use these times of the day to take a break in Adama. A typical example is given in Figure 5. See Figure S3 for an overview of the recorded PM2.5.
Figure 5. A 4-day time series measured by Purple Air illustrating the typical diurnal pattern, with elevated levels during early morning and late afternoon rush hours, seen throughout the Adama measurement sites.
Relative humidity followed typical diurnal patterns with maxima of ~50% and 60-70% during nighttime for the spring and fall campaigns, respectively. Although the highest RH peaks could affect the reported PM concentrations, especially during the fall, no correlation between high RH and PM was seen during the spring campaign. Hence, no correction was made to the data used in the LUR model.

Land-Use Regression (LUR) Modelling
The univariate testing of the variables gave three statistically significant variables (p-value < 0.05): Primary road within 300 m (p-value: 0.016), Primary road within 500 m (p-value: 0.043) and Distance to closest road (p-value: 0.020). The Primary road within 500 m variable contained the Primary road within 300 m variable, and one of these variables therefore had to be excluded. Primary road within 300 m was selected to be entered into the multivariate model, as it had a higher explanatory degree (R² = 0.283) than Primary road within 500 m (R² = 0.209). In accordance with the ESCAPE protocol, an outer ring buffer with all primary roads between 300 m and 500 m distance was instead calculated and univariately tested, but was found to be statistically insignificant (p-value: 0.252). The statistically significant variables were then added with a controlled forward-selection method, where the variable with the highest R² was added first, followed by the second highest. In the final multivariate model, both variables had p-values lower than 0.1 and were therefore retained (see Table 3). The final land-use regression model was 25.855 + (0.006 × Primary road distance in meters within 300 m) + (−0.383 × Distance to nearest road). This model thereby contained only road traffic related variables. A multicollinearity test of the two variables in the final model gave a variance inflation factor of 1.145, which indicates no multicollinearity (VIF lower than 3). In the diagnostic test for influential observations, one station, with id 17, was found to have a Cook's D value above 1, indicating an influential site. A sensitivity analysis was therefore performed to investigate whether this measurement station should be omitted from the model or not. Developing a model without station 17 resulted in the alternative model 25.651 + (0.011 × Primary road within 300 m) + (−0.373 × Distance to closest road), with an R² of 0.499.
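As a worked example, the final model stated above can be written as a small function. The coefficients are those reported in the text; the function name is hypothetical, and the inputs are assumed to be in the same units as the predictor variables used to fit the model (the unit of the distance variable is not stated in this section).

```python
def predict_pm25(primary_road_300m, dist_nearest_road):
    """Final LUR model: PM2.5 (ug/m3) from the two road predictors.

    primary_road_300m: primary road length (m) within a 300 m buffer.
    dist_nearest_road: distance to the nearest road, in the unit used
    when the model was fitted (an assumption, see the text above).
    """
    return 25.855 + 0.006 * primary_road_300m - 0.383 * dist_nearest_road

# A site with no primary road within 300 m, located directly at a road,
# is predicted at the intercept value.
print(predict_pm25(0.0, 0.0))
# Hypothetical site: 500 m of primary road in the buffer, distance value 2
print(predict_pm25(500.0, 2.0))
```

Each additional 100 m of primary road within the buffer raises the prediction by 0.6 µg/m³, while a larger distance to the nearest road lowers it.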
The reason for influential sites could be extreme measurement levels or extreme values of the predictor variables, but could also be related to predictor variables for which a high number of measurement stations have the value 0 [49]. The measured PM2.5 concentration at this station, 24 µg/m³, was not considered extreme. However, one of the two predictor variables in the model, Primary road 300, had the value 0 for 17 of the measurement stations. The measurement station with id 17 was one of these stations, and it is thereby likely that the many 0 values in this variable resulted in this station being influential. Excluding the measurement station would have been a possibility, but this would have made the values in the Primary road 300 variable even more extreme. Another solution could have been to exclude the variable Primary road instead [49]. We tested developing a model in which Primary road 300 was exchanged for Primary road 500 (as this variable was also statistically significant in the univariate testing), but it was statistically insignificant in the new multivariate model (p-value: 0.170). This would have resulted in an alternative final model including only the variable distance to the nearest road, with an R² value of 0.224.
No spatial autocorrelation among the residuals of the final model was found, with an insignificant Moran's I value of 0.078 (p-value: 0.374). The average adjusted R² in the cross-validation was 0.336 and the RMSE was 4.31.
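For reference, Moran's I for a set of residuals can be computed as in the sketch below. It assumes inverse-distance spatial weights, which is one common choice and not necessarily the weighting used in the study, and it omits the significance test. The function name and example data are hypothetical.

```python
import math

def morans_i(values, coords):
    """Moran's I with inverse-distance weights w_ij = 1 / d_ij (i != j)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    weighted_sum = 0.0  # sum of w_ij * z_i * z_j over all ordered pairs
    weight_total = 0.0  # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / math.dist(coords[i], coords[j])
            weighted_sum += w * z[i] * z[j]
            weight_total += w
    return (n / weight_total) * weighted_sum / sum(v * v for v in z)

# Hypothetical residuals that alternate in sign along a line of sites
residuals = [1.0, -1.0, 1.0, -1.0]
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(morans_i(residuals, sites))
```

Values near zero, as reported above, indicate that neighbouring sites do not have systematically similar residuals; clearly positive values would suggest a spatial pattern that the predictors fail to capture.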

Discussion
Studies of exposure to air pollution and health effects are seriously lacking in the Sub-Saharan region. This study is an attempt to correlate exposure levels of PM2.5 outdoors to health outcomes by using low-cost sensors in combination with a LUR model to assess pollution levels which, in turn, can be used to allocate individual outdoor exposure to a large cohort of pregnant women.
The developed LUR model could explain 33.4% of the variance in the concentrations of PM2.5 at the measurement stations. Applying this model in epidemiological studies would, thereby, be problematic, as only one third of the outdoor levels of PM2.5 could be explained by the model.

Seasonal Variation
Two measurement campaigns were conducted, one during the wet season and one during the dry season. The goal was to measure over at least 5 consecutive days at a minimum of 20 sampling sites simultaneously (in accordance with the ESCAPE protocol [32]). In our first campaign, the ordered sensors (Purple Air) did not arrive as promised, so five Alphasense sensors were brought instead as an emergency solution. These were used as efficiently as possible: at 5 sites simultaneously for 1-2 days, then moved to 5 new sites, and so on, covering a total of 20 sites. In the autumn campaign, 20 Purple Air sensors were used, which enabled us to measure for 5 consecutive days at 20 sites simultaneously, but when the data were downloaded it was found that only 17 of them had logged data to the SD card. The choice to use only the Alphasense data for this study was based on the fact that 20 sites is an absolute minimum for building a reliable model. Attempts were made to combine the two data sets, but a sensitivity analysis showed that none of the variables univariately tested for the measurement campaign in September 2019 were statistically significant. Therefore, it was not possible to build a model based on these measurements and on the available geographical predictor variables. There is also the question of seasonal variability: if there is an inherent difference in PM2.5 levels between the wet and dry seasons due to precipitation or in-transport of particles to the Adama area, it would not be feasible to combine the two data sets. Attempts were made to assess whether there is any seasonal variability, but our amount of data was too limited to draw firm conclusions.
Due to the problems described with the PM data from the meteorological station, these data could not be used to look at seasonal variation either. Ethiopia's wet season is not a monsoon type of wet season, but rather a season when occasional rainfall occurs. Therefore, it could be assumed that wet deposition does not influence the particle concentrations very much. When comparing the wetter season sensor data with the data collected during the drier season, the general PM2.5 levels were higher during the wet measurement period than during the dry. Traffic sites showed an average PM2.5 level of 56 µg/m³ during the wetter season, compared to 24 µg/m³ during the dry season. For the urban sites, the corresponding averages were 45 µg/m³ and 25 µg/m³ for the wetter and drier seasons, respectively. There are a few possible explanations for these findings. Firstly, the measurement periods should have been longer. It is not unlikely that the measurements during either September or February were not conducted during representative time periods. Secondly, different types of sensor were used during the two periods. Li et al. [28] conducted a comparison of several different low-cost sensors and found that Alphasense showed levels about 5 times higher than Purple Air when measuring Arizona Road Dust, but levels about 1/3 of Purple Air when measuring sea salt particles and 1/4 when measuring incense particles. Arizona Road Dust is a commercial test particle type that is around 2-4 micrometers in size and quite light in color; the generated sea salt is composed of particles mainly in the nanometer range, which in general is below the detection limit of the sensors; and incense combustion generates mainly sub-micrometer particles. The outdoor sources in Adama are most likely dominated by vehicle exhaust and waste burning, indicating that the lower levels measured by Alphasense during the dry season should perhaps be scaled up.
If the TEOM data from the monitoring station in Adama had been trustworthy, they could have been used to calibrate both sensor types, and it is likely that such an operation would have resulted in less discrepancy between the measurements from the two seasons. A third explanation may be that there is more in-transport of dust from regions outside of Adama during the wetter season. There are large desert areas north (Sahara) and east of Adama, and if the winds are more northerly or easterly during the wetter season, this might, at least partially, provide an explanation. Two years (2018 and 2019) of wind direction data were obtained from the meteorological station, where the wind direction had been noted four times a day, every day. Analyses showed that during 32% of the dry season and 34% of the wet season, the wind direction had been denoted as "0". Given that 360 degrees was also a reported wind direction, we had to assume that 0 degrees was a way of stating that no measurements were taken at that time point (even though "n/a" was also reported occasionally). From the resulting wind roses (Figure 6a,b), it is obvious that the predominant wind direction during the dry season is northeasterly, while during the wet season it also blows from the west-southwest. The west-southwest winds during the wet season could, of course, contribute to in-transport of PM2.5 from those regions outside of Adama where several industries are located (steel factory, flour factory, car assembly facility and textile industry). The acquired wind data are, however, not quality checked, and there are indications (such as the disproportionately many 0 degrees) that the data are not trustworthy.
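The handling of the wind records described above can be sketched as follows: readings of 0 and "n/a" are treated as missing, 360 degrees is counted as north, and the remaining directions are counted in 16 compass sectors as the basis for a wind rose. The function name, the number of sectors and the example readings are assumptions for illustration.

```python
def wind_sector_counts(directions, n_sectors=16):
    """Count wind directions (degrees) per compass sector.

    Sector 0 is centred on north; sectors proceed clockwise.
    Readings of 0 or "n/a" (or None) are counted as missing.
    Returns (counts per sector, number of missing readings).
    """
    width = 360.0 / n_sectors
    counts = [0] * n_sectors
    missing = 0
    for d in directions:
        if d is None or d == "n/a" or float(d) == 0.0:
            missing += 1
            continue
        # Shift by half a sector so each sector is centred on its direction
        index = int(((float(d) % 360.0) + width / 2.0) // width) % n_sectors
        counts[index] += 1
    return counts, missing

# Hypothetical readings: two missing, two northerly, one NE, one WSW
counts, missing = wind_sector_counts([0, 360, 45, "n/a", 247.5, 350])
print(counts, missing)
```

Reporting the share of missing readings alongside the sector counts makes it visible how much of each season the wind rose actually represents.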

Unexplained Concentration Peaks
It should be noted that while traffic definitely contributes to PM2.5 at both urban and traffic sites, the relative importance of traffic cannot be determined from our dataset. That would require more auxiliary data on pollutant concentrations and/or detailed information on traffic volumes, and ideally comprehensive physicochemical particle characterization as well. High concentrations during morning and afternoon rush hours do not exclude high contributions from non-traffic sources. Furthermore, PM2.5 peaks occur that cannot be explained by morning and afternoon rush hours. These peaks are often very high (see Figure 2 and Figure S3 for examples) and can influence not only the daily average but also the average for longer measurement periods, e.g., five days. The peaks did not occur specifically at the traffic-related sites; rather, they occurred quite often at the urban background sites. Without proper source attribution it is difficult to state with certainty what causes these peaks. It is likely, however, that they are to some extent caused by outdoor waste burning. These burning events are of varying magnitude and occur at different places and with irregular frequency. Outdoor burning to eliminate household waste is a large contributor to environmental pollution [24], and the smoke contains several toxic substances in both the particle phase and the gas phase, such as polyaromatic hydrocarbons [59], heavy metals [60], and dioxins [61]. Hence, it is not necessarily a good idea to treat these peaks as outliers and remove them from the data, even though the data would then be more consistent and shorter measurement periods could still be used to generate data representative of a longer period. The only way, as we see it, to both include these events and generate representative data is to measure for time periods substantially longer than five days.
Additionally, at the urban background sites, of which informal settlements are part, a lot of cooking occurs outdoors. These emissions, from charcoal, wood, or dung, can affect the outdoor air pollution on a small regional scale, and might be another reason for peaks which stand out from the diurnal pattern.
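The influence of a short peak on a five-day average, discussed above, can be illustrated with simple arithmetic. The numbers below are hypothetical and chosen only for illustration: a constant background of 25 µg/m³ and a single three-hour event at 500 µg/m³.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


HOURS = 5 * 24  # a five-day measurement period with hourly values

# Hypothetical background level (µg/m³), not measured data.
baseline = [25.0] * HOURS

# Same series with one hypothetical three-hour burning event.
with_peak = list(baseline)
with_peak[40:43] = [500.0] * 3

mean_without = mean(baseline)   # 25.0
mean_with = mean(with_peak)     # (117 * 25 + 3 * 500) / 120 = 36.875
relative_increase = mean_with / mean_without - 1

print(mean_without, mean_with, f"{relative_increase:.1%}")
```

In this constructed case, three hours out of 120 raise the five-day mean by almost half, which is why the length of the measurement period matters when such events are irregular.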

Using Low-Cost Sensors
There are obvious drawbacks to using cheap sensors instead of more advanced equipment, but when many sites need to be sampled at once, there are few options. It is well known that sensor data become reliable only after being adjusted using data from a reference instrument (for PM2.5, such an instrument would be, e.g., the TEOM or the beta attenuation monitor (BAM), as described by, e.g., [26]). We had hoped that such a dataset could be retrieved from the Adama Monitoring Station, but analysis revealed obvious flaws in the data; e.g., the PM2.5 values were frequently higher than the PM10 values. A visit to the monitoring station showed that it was equipped with two new TEOMs, but neither of the inlets appeared to ensure the correct particle size cut-off. In the setting of this study, there were further issues that made the use of sensors problematic. Access to electrical power was very limited, so 40 power banks with a capacity of 26,800 mAh had to be purchased and delivered to Ethiopia (which is challenging in itself), and these power sources had to be changed after 3 days of measurement. The lack of WiFi both prevented us from monitoring the functionality of each sensor in real time and made it necessary to download data manually from each sensor. In the case of Purple Air, there is no feedback on whether the SD card has been reinserted correctly; if it has not, the sensor gives every indication that it is measuring, but no data are saved on the SD card.
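To make the idea of adjusting sensor data against a reference instrument concrete, the following sketch fits a simple linear calibration from co-located measurements. It is only one possible approach, assumed here for illustration; no such calibration could be performed in this study, and the function names and values are hypothetical.

```python
def fit_linear_calibration(sensor, reference):
    """Least-squares fit of: reference = slope * sensor + intercept."""
    n = len(sensor)
    mean_x = sum(sensor) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in sensor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sensor, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(values, slope, intercept):
    """Convert raw sensor readings to reference-equivalent values."""
    return [slope * v + intercept for v in values]


# Hypothetical co-located hourly PM2.5 values (µg/m³), not study data.
sensor_raw = [10.0, 20.0, 30.0, 40.0]
reference = [16.0, 31.0, 46.0, 61.0]  # constructed as 1.5 * sensor + 1

slope, intercept = fit_linear_calibration(sensor_raw, reference)
calibrated = apply_calibration([24.0], slope, intercept)
print(slope, intercept, calibrated)
```

In practice, calibration of optical particle sensors often also accounts for humidity and particle type, which a single straight line does not capture.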

LUR Model
A previous LUR model had used 48 h measurements of NO2 in a West African town with satisfactory results [38]. In this study we tested whether short measurement periods could also be used for PM2.5. The difficulties that we encountered indicate that short PM2.5 measurement periods do not form as good a basis for LUR modelling as short NO2 measurement periods do. The West African study was conducted in a region with low traffic density (traffic being the main source of NO2), while the sources of outdoor PM2.5 are more numerous, such as traffic, suspension of road dust, open waste burning, industries, and solid fuel burning for cooking. According to ESCAPE, short-term PM measurements can be used for LUR modelling if there is also a fixed, functioning monitoring station available that can provide reliable data from reference instruments, which was not the case in our study.
The overall explanatory power of the LUR model in this study is considered weak in comparison to PM2.5 land-use regression models developed in European settings [49]. The final model in this study had an R² of 0.33. To put our results into perspective, Gulliver et al. [62] considered a model R² of 0.47 to be acceptable when they modelled PM10 using LUR, and a study in Munich considered an R² of 0.36 for PM2.5 to be acceptable [63]. Ryan and LeMasters [48] found R² values in the range of 0.54-0.81 when reviewing 12 LUR models for primary pollutants. Far fewer models have been developed in Sub-Saharan African settings. Two previous LUR models for PM2.5 from South Africa showed R² values of 0.76 [51] and 0.29 [39]; our model performance is more in line with the latter. However, we concluded that our model performance was not good enough for epidemiological studies. In concordance with many of the LUR models developed in the ESCAPE study, the final model in this study included traffic-related variables, which are considered central predictor variables for model performance. This suggests that our lower performance might be due to limited traffic data, such as traffic intensity and vehicle speed [49]. Traffic-related variables were also included in the annual, winter and summer PM2.5 models of one of the South African studies [51]. Even so, those models also contained other predictor variables, including population density, urban area and open space. In contrast, the other South African study found no important traffic-related variables, but instead grills, waste-burning sites and population density [39].
The performance of the LUR model might have been improved by including other important geographical predictor variables, such as traffic intensity, waste burning, and a better estimate of population density. To some extent we did have a proxy for population density, as both residential areas and areas of informal settlement were included as possible predictor variables. It is well known that informal settlements are often very densely populated. In this study, however, this was not a predictor of the PM2.5 levels. A possible explanation for not finding any association between informal settlements and PM2.5 levels could be the difficulty of obtaining data for all these areas; we are aware that some of them are missing from our data. We also attempted to make traffic counts for each road type in order to evaluate how homogeneous the different road types were, and possibly to include traffic intensity by assigning traffic to all road segments based on these counts. Unfortunately, due to communication misunderstandings, we did not manage to collect enough traffic counts for each type of road. In addition, we attempted to include small local point sources such as waste-burning sites, as was previously done in a South African study [39]. However, when returning for additional measurement periods we could see that small waste-burning sites were spatially and temporally unpredictable, and therefore not feasible to include in the model.
The importance of carefully selecting the buffer radius for the predictor variables has been acknowledged earlier [48]. PM2.5 sources can be very local, and using a shorter search radius might therefore have improved the model.
The variables included in the final model were considered plausible, as road traffic is likely to be an important emission source of PM2.5. Traffic variables are also commonly found in LUR models predicting PM2.5 levels [49,56]. Even so, there were a number of expected sources, such as waste burning, cooking at residences and industrial emissions, that were not captured by the predictor variables. The variables included in the final model seemed stable, with low multicollinearity (low VIF). No spatial autocorrelation was found among the residuals of the final model, which therefore gave no indication of missing predictor variables that would be influential in limited parts of the study area.
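For a model with exactly two predictors, as in the final model here, the variance inflation factor (VIF) reduces to 1/(1 - r²), where r is the Pearson correlation between the two predictors. The sketch below illustrates this special case only; the predictor names and values are hypothetical and are not taken from the study data.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a, b):
    """VIF shared by both predictors in a two-predictor linear model."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictor values at seven sites (illustrative only).
predictor_a = [0.0, 150.0, 320.0, 80.0, 410.0, 0.0, 260.0]
predictor_b = [30.0, 10.0, 45.0, 5.0, 20.0, 50.0, 15.0]

vif = vif_two_predictors(predictor_a, predictor_b)
print(round(vif, 2))
```

A VIF close to 1 means the predictors carry largely independent information; the threshold at which multicollinearity is considered problematic varies between studies.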
Due to the limited number of measurement stations, internal cross-validation was used to validate the model instead of omitting measurements [49]. The cross-validated model had an R² of 0.336, which was very close to the original model's R² of 0.334. A model with a cross-validated R² that is close to the original is often considered to be a stable model [38].
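The comparison between a fitted R² and a cross-validated R² can be sketched as follows. The sketch assumes leave-one-out cross-validation and defines the cross-validated R² as 1 - PRESS/SST; other definitions exist (for example, the squared correlation between observed values and held-out predictions) and can give slightly different values, and the text does not state which variant was used. For brevity the example uses a single hypothetical predictor rather than the two predictors of the final model.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y, predicted):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def loocv_r_squared(x, y):
    """R² based on predictions for each site from a model fitted without it."""
    held_out = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        held_out.append(slope * x[i] + intercept)
    return r_squared(y, held_out)


# Hypothetical site data (illustrative only): road length in buffer, PM2.5.
road_length = [0.0, 120.0, 300.0, 450.0, 610.0, 800.0]
pm25 = [18.0, 22.0, 21.0, 29.0, 27.0, 35.0]

slope, intercept = fit_line(road_length, pm25)
r2_fit = r_squared(pm25, [slope * v + intercept for v in road_length])
r2_cv = loocv_r_squared(road_length, pm25)
print(round(r2_fit, 3), round(r2_cv, 3))
```

A small gap between the two values suggests that no single site dominates the fit, which is the sense in which a model is described as stable.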
It could be argued that an alternative final LUR model could have been selected. The diagnostic test for influential observations indicated that one of the stations, with id 17, was influential (Cook's D above 1). According to the ESCAPE studies, three criteria needed to be fulfilled for a measurement station to be excluded when a high Cook's D value was detected [49]: (1) the site was very influential, that is, it heavily changed the estimate, changed the direction of the estimate or turned the estimate insignificant; (2) all possible models indicated the measurement station as influential; and (3) local partners and the ESCAPE exposure group agreed retrospectively that the station was not representative. Omitting the measurement station with id 17 did give a higher estimate for the variable primary road within 300 m, but it did not change the direction of any estimate or make any of the estimates insignificant. The measurement site was also considered representative of traffic sites, and the measured value was not considered extreme (even though it could have been expected to be higher, given its proximity to the road with the most traffic in the area). Therefore, it was decided to keep measurement station 17 in the final model, as the station was not considered erroneous.
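As an illustration of the influence diagnostic mentioned above, Cook's D can be computed directly from the residuals and leverages of a least-squares fit. The sketch below is restricted to a single predictor (so the number of parameters p is 2) to keep it short; the final model in the study had two predictors, and the data here are hypothetical and constructed so that the last site has high leverage.

```python
def cooks_distance(x, y):
    """Cook's D for each observation in a simple linear regression."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance

    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        distances.append(e ** 2 / (p * s2) * leverage / (1 - leverage) ** 2)
    return distances


# Hypothetical data: the last site lies far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 10.0]

d = cooks_distance(x, y)
flagged = [i for i, value in enumerate(d) if value > 1]
print([round(value, 2) for value in d], flagged)
```

A value above 1 is a commonly used flag, as in the text; whether a flagged site is then excluded remains a judgment based on criteria such as those listed above.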
To be able to use the master plan as a source of land-use data, some manual adjustments were needed. These adjustments were based on satellite images and local knowledge of the land use. When the classes in the master plan were adjusted to the conditions at the time of the measurements, most of the corrections had to be made in the outskirts of the city. The risk of misclassification in the land-use variables is therefore likely higher in the outskirts of the city.
In summary, we set out to develop an LUR model of PM2.5 but, for the various reasons mentioned above, the performance of the model was not deemed adequate for use in epidemiological studies or health impact assessments. Instead, we used the measured levels to make a cruder estimate of health effects.

Health Effects
The PM2.5 levels in this area are always above the WHO air quality guideline for health [64] of 10 µg/m³. They are also above the United States Environmental Protection Agency (US EPA) National Ambient Air Quality Standard [65] of 12 µg/m³ and generally above the EU air quality directive value [66] of 20 µg/m³. Sometimes the levels were two orders of magnitude higher than these guidelines, especially near traffic or other local pollution sources. This shows the need to reduce the levels by adopting cleaner technology and diverting traffic away from densely populated areas.
This, of course, has implications for health, and although we could not find a suitable model of the spatial distribution of pollutants for more detailed health impact assessments and epidemiological studies, this should not detract from the finding of detrimental levels. Estimates made with a WHO AirQ tool indicate that one fifth of all mortality due to acute lower respiratory infections in children, and between one fifth and one third of mortality in adults, is attributable to the PM2.5 levels. The costs to society, families and individuals in terms of the health burden are large and should prompt mitigation measures to lower these levels. One start would be to set national Ethiopian air quality standards, which can help guide sustainable choices. The importance, but also the challenges, of working to reduce air pollution levels on this continent have been reviewed [67]. In this city there is an easy way to divert traffic away from populated areas, as a brand-new ring road has been built, albeit with a tariff hindering this transition. If health costs are included in policy choices, removing or lowering the tariff could also be a way to remove old and dirty transit trucks from the city center. Introducing waste collection would also benefit air quality, as would a switch to cleaner energy for both cooking and transport.
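The notion of an attributable share of mortality can be illustrated with the generic attributable-fraction relation AF = (RR - 1)/RR for a uniformly exposed population, combined with a log-linear concentration-response function. This is a simplified sketch and not the AirQ tool's implementation: the relative risk per 10 µg/m³ is a hypothetical placeholder, and the use of the WHO guideline value as the counterfactual level is an assumption made here. Only the mean PM2.5 level of 23 µg/m³ is taken from the study.

```python
import math


def relative_risk(concentration, counterfactual, rr_per_10):
    """Log-linear relative risk for the excess above the counterfactual."""
    excess = max(0.0, concentration - counterfactual)
    beta = math.log(rr_per_10) / 10.0  # risk coefficient per 1 µg/m³
    return math.exp(beta * excess)


def attributable_fraction(concentration, counterfactual, rr_per_10):
    """Share of cases attributable to exposure: (RR - 1) / RR."""
    rr = relative_risk(concentration, counterfactual, rr_per_10)
    return (rr - 1.0) / rr


MEAN_PM25 = 23.0       # µg/m³, mean level reported in the study
COUNTERFACTUAL = 10.0  # µg/m³, assumed counterfactual (WHO guideline value)
RR_PER_10 = 1.08       # hypothetical relative risk per 10 µg/m³

af = attributable_fraction(MEAN_PM25, COUNTERFACTUAL, RR_PER_10)
print(f"attributable fraction: {af:.1%}")
```

The result depends strongly on the chosen concentration-response function and counterfactual level, so the output of this sketch should not be compared with the AirQ estimates reported above.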

Conclusions
In this study, a land-use regression model was developed that is able to explain a third of the variance in the concentration of PM2.5 in Adama. Applying this model in epidemiological studies would be problematic, as a large proportion of the variation in PM2.5 concentration is left unexplained. The work of developing a land-use regression model in this study has highlighted some of the challenges that can arise in measuring PM2.5 and collecting data for important geographical predictor variables. Even so, the health impact assessment based on the PM2.5 measurements showed the substantial impact this exposure has on public health in Adama. There is much potential in assessing air pollution using low-cost sensors, but the lack of electricity and WiFi in the SSA region has to be taken into consideration when planning such studies. The importance of air pollution as the second leading cause of death in Ethiopia after malnutrition has been highlighted before by Global Burden of Disease studies [68]. Our air pollution measurements thus confirm the importance of addressing the causes of these high ambient levels.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4433/11/12/1357/s1, Figure S1: Simultaneous co-located overnight measurements from the five Alphasense instruments performed in Adama prior to the dry season campaign. Figure S2: The agreement between the 20 Purple Air sensors used during the wet season. Figure