Hospitalization Due to Fire-Induced Pollution in the Brazilian Legal Amazon from 2005 to 2018

: Fire is widely used in the Amazon as a ubiquitous driver of land management and land cover change. Regardless of their purpose, ﬁres release a considerable amount of pollutants into the atmosphere, with severe consequences for human health. This paper adds to the extant literature by measuring the causal effect of ﬁres on hospitalizations, using the approach of instrumental variables, whose validity is assessed with multiple statistical tests. A wide range of confounders are added as covariates, seizing on the accuracy enhancement potential of a broad and ﬁne-grained dataset that covers 14 years of the whole Amazon territory at a municipal–monthly level. The results reveal a positive effect of ﬁre on hospitalizations due to respiratory illnesses in general, and particularly in those due to asthma. A 1% increase in pollution concentration would increase hospitalizations by 0.14% at a municipality–monthly level. A total of 5% of respiratory hospitalizations were estimated to be attributable to ﬁre-induced pollution, corresponding to 822 cases per month. The analysis demonstrates that the coupling of econometrics and remote sensing data is a promising avenue towards the assessment of impacts caused by ﬁres, which may be applied to other regions of the world subjected to anthropogenic ﬁres.


Introduction
Biomass burning in agriculture is a global source of air pollution and, consequently, of morbimortality. The practice, which is more frequently reported in developing countriessuch as China, India, and Brazil [1], but also Indonesia [2]-has been evidenced as a source of respiratory illnesses and related hospitalizations [2][3][4][5][6][7]. In addition to the impairment of health and eventual premature mortality [8], already overloaded national healthcare systems are subjected to higher pressure.

Data
The variables considered may be classified into three categories: instrumental variables (IVs) and the main explanatory variable (ExpV), dependent variables (DpVs), and control variables (CrVs) ( Table 1). Such categories are suggested by the data analysis approach of two-stage least squares regressions, which estimate the effect of the main explanatory variable (pollution; ExpV) on health dependent variables (DpVs). For this, (1) other relevant predictors were added as control variables (CrVs), and (2) instrumental variables (fire and wind; IVs) were explored as a means of effect identification. Therefore, variables' categories were strictly due to their roles (or positions) in the regression models. Detailing only the less common factors, mortality of infants and mortality due to the four most deadly chronic diseases in Brazil were used to proxy populations' health levels (in accordance with Aquino et al. [26] and Harris and Kohn [27]). Health professionals, hospital beds, and facilities were sued to proxy the supply of healthcare. Other sources of pollution were proxied by urban zones, road density, and vehicle fleets.

Data
The variables considered may be classified into three categories: instrumental variables (IVs) and the main explanatory variable (ExpV), dependent variables (DpVs), and control variables (CrVs) ( Table 1). Such categories are suggested by the data analysis approach of two-stage least squares regressions, which estimate the effect of the main explanatory variable (pollution; ExpV) on health dependent variables (DpVs). For this, (1) other relevant predictors were added as control variables (CrVs), and (2) instrumental variables (fire and wind; IVs) were explored as a means of effect identification. Therefore, variables' categories were strictly due to their roles (or positions) in the regression models. Detailing only the less common factors, mortality of infants and mortality due to the four most deadly chronic diseases in Brazil were used to proxy populations' health levels (in accordance with Aquino et al. [26] and Harris and Kohn [27]). Health professionals, hospital beds, and facilities were sued to proxy the supply of healthcare. Other sources of pollution were proxied by urban zones, road density, and vehicle fleets.

Instrumental Variable Data
It is useful to clarify that an instrumental variable (IV) is understood first of all a strategy of identification-that is, a procedure that enables the estimation of a parameter capturing the specific effect of a particular explanatory variable (ExpV) on the target dependent variable (DpV) [15]. In addition, and relatedly, an IV is needed because, due to the observational nature of data, an ExpV is assigned to locations in a way that is not Remote Sens. 2022, 14, 69 4 of 19 random, but rather correlated with omitted unobservable predictors; thus, its effect is not identified and, consequently, not measurable as a standard regression coefficient. In short, an IV is any variable correlated with the ExpV and not correlated with the DpV through any other causal path besides the one mediated by the ExpV [15]. We used the MCD64A1-v6 Burned Area data product of the Moderate-Resolution Imaging Spectroradiometer (MODIS) of the Terra and Aqua satellites, collected using the R package "MODIStsp" [28]. The MCD64A1 has a monthly periodicity, spatial resolution of 500 m, and data starting from November 2000. This product uses an association of surface reflectance with active fires, along with an algorithm that checks the temporal Remote Sens. 2022, 14, 69 5 of 19 changes in the vegetation index [29]. Globally, it presents 99.7% overall accuracy, with 40.2% commission error and 72.6% omission error [30]. In the Amazon region, it shows a great similarity to regional and supervision-based products, underestimating 2.9% of the total burned area, but detecting more fires in the northern and northwestern areas than in the southwest [31].
For thermal anomalies, we used collection 6 of the product MYD14A1 from Aqua MODIS, available on the BDQueimadas website [32]. This product presents active fires with date and quality indices at a global scale, reproducing them at 1 km spatial resolution [33]. This product is considered a "reference satellite" to the Progama Queimadas of the INPE, and it is the basis for comparisons and trend analysis for the whole Brazilian territory.
It is helpful to clarify how the two fire products are used in this paper. Given the influence of cloud cover on the availability of burned area measurements [31,34,35], the robustness of results generated with such data were subjected to an assessment with thermal anomalies, which are cloud-free, in the place of burned area in the econometric analysis (as reported in Section 3.4). Nevertheless, there was no municipal-monthly observation for which missing burned area values at a finer level (for instance, at cell-grid-day) were numerous enough to prevent calculation of burned area at the observational unit level (what was due both to the monthly timescale of the dataset and to Brazilian Amazon municipalities being large, at 6300 km 2 on average).
The wind data were obtained using the ERA5 monthly aggregate product, which combines model data with observations across the world, providing a global monthly product with 0.25 degrees of spatial resolution [36]. This product provides the U and V components of the wind, which we converted into wind speed and direction using the "rWind" package [37].

Explanatory Variable
Pollutant information was introduced by the land aerosol optical depth (AOD), gathered by the MCD19A2 version 6 data product, and reported in the blue band (0.47 µm). This product is a daily MODIS Terra and Aqua combined multi-angle implementation of atmospheric correction (MAIAC) at 1 km spatial resolution [38].
The measures of AOD reflect the sum of aerosols from natural and anthropogenic sources, such as emission from industries, vehicles, water vapor, dust, and burned biomass [39]. The product presents a high overall correlation with the ground measurements ( ∼ =95%) over South America, and is better over the forest, savannah, grassland, and cropland areas [40]. Moreover, the AOD has been widely used to estimate highly dangerous pollutants for the human respiratory system, such as particulate matter (PM), with considerable accuracy [41,42].
Even with AOD being only one dimension of the atmospheric pollution process engendered by biomass burning, previous studies have demonstrated its correlation with pollutants such as PM2.5 [18], carbon monoxide [43], and sulfur dioxide [44], which also play important roles with regards to impacts on human health. We also clarify that measurements of pollutants' concentrations were not available at the whole-Amazon level for the period of analysis

Dependent Variables
Hospitalization counts were retrieved from TabNet [45], which is a governmental platform maintained by the informatics department of the Brazilian Health System ("Departamento de Informações do SUS", DataSUS). We collected three respiratory morbidities (hospitalization code following the International Classification of Disease (ICD) number 10, where diseases of the respiratory system are those between J00 and J99) that are commonly accounted for in the literature as indicators of fire pollution impact: asthma (ICD10: J45-J46), pneumonia (ICD-10: J12-J18), and bronchitis (ICD-10: J20-J21). The total hospitalization due to respiratory illnesses was also considered. This information was filtered by municipality of residence and month of attendance. Due to the higher vulnerability of children and the elderly to pollution, hospitalizations Remote Sens. 2022, 14, 69 6 of 19 of such age groups due to any respiratory illnesses were also counted separately, comprising small children (≤ 4 years old), children (between 5 and 14 years old), the elderly (≥ 65 years old), and all ages.

Control Variables
Estimated population data were collected from the DataSUS portal [45], which compiles the yearly estimates produced by IBGE, being compatible with the Brazilian legal provision. These data are available yearly for each Brazilian municipality from 1992 to 2019.
Gross domestic product (GDP) is provided by the IBGE [46], and it is released yearly by municipality, in both current and aggregated prices.
Mortality of infants and mortality due to chronic diseases were collected from the DataSUS portal, filtering by municipality of residence and month of attendance. Infant mortality refers to the death of fetuses and children under one year old. For chronic disease mortality, we considered four groups, in accordance with Malta et al. [47]: cardiovascular diseases (ICD-10: I00 to I99), cancer (ICD-10: C00 to C97), respiratory issues (ICD-10: J30 to J98), and diabetes (ICD-10: E10 to E14).
Data on health professionals, hospital beds, and facilities were also obtained from the DataSUS portal.
Weather information came from two different products: the temperature was acquired from MOD11A2, which provides an 8-day average of land surface temperature with a 1 km spatial resolution [48], while precipitation data came from CHIRPS (the Climate Hazards Group InfraRed Precipitation with Station data), which is a global rainfall dataset with 0.05 • spatial resolution [49]. This product has a good performance both globally and locally, and particularly in the BLA, where its values explain 73% of the precipitation, with a root-mean-square error (RMSE) below 15 mm [50].
Regarding the proxies for other sources of pollution, road density was generated by extracting the road component from the total road network provided by OpenStreetMap [51], while the area of the municipalities was provided by the IBGE. The fleet of cars was found on the DENATRAN website [52], which provides the number of registered cars by month and by municipality for the whole of Brazil.
Lastly, urban density (km 2 /km 2 ) was measured using the MapBiomas product. These land use and land cover data are available annually from 1985 to 2019, with 30 m spatial resolution, based on Landsat images and produced using pixel-based classification and machine learning [53,54].

Spatial Aggregation
Spatial variables (BA, TA, wind, AOD, temperature, precipitation, roads, and urban zones) were assembled with Google Earth Engine, and were clipped and aggregated at municipal-monthly level. Then, the exported tables were processed in the RStudio environment to merge all variables (spatially and non-spatially explicitly) into a single table compatible with the Stata 14.1 format.
It is worth mentioning that over 20 years of data, the municipalities' boundaries experienced several legal updates, including the creation of new municipalities. To follow those changes, we used each annual current delimitation provided by the IBGE to perform each clip and aggregation.

Empirical Analysis
First, we present the main assumption grounding our data analysis, which is suggested by atmospheric science and epidemiological literature [18,22,55]. Once burnings of considerable size and number occur in a location, pollutants are released and transported through the atmosphere, causing considerable appearance or exacerbation of respiratory illnesses. This process is a function of many variables. Fire emissions are primarily affected by fuel type, the temperature of the fire, and climatic conditions [56]. In the atmosphere, fire pollution may be mixed with pollutants of vehicular or industrial sources, and is also affected by weather conditions [57]. Finally, the number of hospitalizations is influenced by the size of population, its health level and income, and the availability of healthcare facilities and staff [6,58].
With that in mind, to evaluate the impact of fire-induced pollution on health, we followed the methodology proposed in [7,16,17,59], which used the instrumental variable (IV) technique with wind as the exogenous source of variation that identifies the effects of endogenous pollution on respiratory morbidity; He et al. [17] and Sheldon and Sankaran [7] also adopted fire as a second instrument. The IV method is based on the two-stage least squares estimator (2SLS) that corresponds to Equations (1) and (2) below, with "i" indexing the spatial unit and "t" the time period: LogAOD it is the logarithm of atmospheric optical depth; IV it is a set of instrumental variables; CrVi t is the control variable vector; LogHosp it is the logarithm of hospitalization count; Remote Sens. 2021, 13, x. https://doi.org/10.3390/xxxxx www.mdpi.com/journal/remotesensing First, we present the main assumption grounding our data analysis, which is suggested by atmospheric science and epidemiological literature [18,22,55]. Once burnings of considerable size and number occur in a location, pollutants are released and transported through the atmosphere, causing considerable appearance or exacerbation of respiratory illnesses. This process is a function of many variables. Fire emissions are primarily affected by fuel type, the temperature of the fire, and climatic conditions [56]. In the atmosphere, fire pollution may be mixed with pollutants of vehicular or industrial sources, and is also affected by weather conditions [57]. Finally, the number of hospitalizations is influenced by the size of population, its health level and income, and the availability of healthcare facilities and staff [6,58].
With that in mind, to evaluate the impact of fire-induced pollution on health, we followed the methodology proposed in [7,16,17,59], which used the instrumental variable (IV) technique with wind as the exogenous source of variation that identifies the effects of endogenous pollution on respiratory morbidity; He et al. [17] and Sheldon and Sankaran [7] also adopted fire as a second instrument. The IV method is based on the two-stage least squares estimator (2SLS) that corresponds to Equations (1) and (2) below, with "i" indexing the spatial unit and "t" the time period: is the logarithm of atmospheric optical depth; IV it is a set of instrumental variables; CrVit is the control variable vector; LogHospit is the logarithm of hospitalization count; Ɛit and uit are time-variant disturbance terms, and bi. and ci. are time-invariant disturbance terms. The Greek letters represent parameters to be estimated.
Regarding the IVit vector, three instrumental variables were employed: burned area (Ba) in logarithmic form (LogBA), wind direction (WDirec), and wind speed (WSpeed). These were combined, either in original form or as interactions (denoted with "*"), into four alternative sets of instrumental variables, all of which included LogBA separately. The first and most comprehensive set included the two interactions of burned area with the two wind measures (i.e., Ba * WSpeed, and Ba * WDirec). The second and third sets included only one of the possible fire-wind interactions (either Ba * WSpeed or Ba * WDirec). The fourth and least comprehensive set did not include any interactions (only LogBA). The last three sets were used as potential alternatives to estimation when the most comprehensive set was rejected in its validity by the overidentification test. In this way, the least comprehensive was the last possible alternative in case the two double-instrument sets were rejected. Now, regarding the CrVit vector, the control variables captured factors influencing supply and demand for healthcare belonging to the following thematic classes: These variables are crucial to estimate when they address weather and socioeconomic confounders that could bias the estimation of the causal effect of pollution on hospitalizations. If these variables were not included, they would be left to the disturbance term, thus potentially creating omitted variable bias [15].
The identification strategy was applied both to the whole count of hospitalizations and to counts associated with specific ages and illnesses. Confounders, whose influence is mitigated by the IV approach, may affect illness propensity in the form of viral, bacteriological, and environmental factors varying across illnesses [61,62]. Therefore, since it and u it are time-variant disturbance terms, and b i . and c i . are time-invariant disturbance terms. The Greek letters represent parameters to be estimated.
Regarding the IV it vector, three instrumental variables were employed: burned area (Ba) in logarithmic form (LogBA), wind direction (WDirec), and wind speed (WSpeed). These were combined, either in original form or as interactions (denoted with "*"), into four alternative sets of instrumental variables, all of which included LogBA separately. The first and most comprehensive set included the two interactions of burned area with the two wind measures (i.e., Ba * WSpeed, and Ba * WDirec). The second and third sets included only one of the possible fire-wind interactions (either Ba * WSpeed or Ba * WDirec). The fourth and least comprehensive set did not include any interactions (only LogBA). The last three sets were used as potential alternatives to estimation when the most comprehensive set was rejected in its validity by the overidentification test. In this way, the least comprehensive was the last possible alternative in case the two double-instrument sets were rejected. Now, regarding the CrV it vector, the control variables captured factors influencing supply and demand for healthcare belonging to the following thematic classes: These variables are crucial to estimate when they address weather and socioeconomic confounders that could bias the estimation of the causal effect of pollution on hospitalizations. If these variables were not included, they would be left to the disturbance term, thus potentially creating omitted variable bias [15].
The identification strategy was applied both to the whole count of hospitalizations and to counts associated with specific ages and illnesses. Confounders, whose influence is mitigated by the IV approach, may affect illness propensity in the form of viral, bacteriological, and environmental factors varying across illnesses [61,62]. Therefore, since distinct definitions of the dependent variable may be subject to distinct levels of influence from sources of illnesses not related to pollution, the performance of the identification strategy was more broadly assessed.
The estimation was carried out on the Stata program, with a balanced panel structure (at a monthly-municipal level) with fixed effects (state, year, and month) and a heteroscedasticity-autocorrelation "robust" variance-covariance matrix. To check whether the approach used was valid, the following testing sequence was adopted:

1.
Generalized Hausman test for the null of consistency of the random-effects estimator (no omitted heterogeneity bias) against the alternative of consistency of the fixedeffects estimator only. Here, the "xtoverid" command developed by Schaffer and Stillman [63] was applied; 2.
Pollution exogeneity test for the null that ordinary least squares (OLS) would yield consistent estimates over the IV estimates, i.e., pollution is exogenous. For this, the "dmexogxt" command from Baum and Stillman [64] was applied; 3.
Sargan's overidentification test for instrument validity which assumes validity under the null hypothesis that all instruments are valid, while rejection is interpreted as indicating that at least one of the instruments is not valid [65]. For this, the "xtoverid" command from Schaffer and Stillman [63] was used; 4.
Tests for instrument weakness in the first stage, which were based on post-estimation procedures with robust covariance matrix: a. Joint significance of the instruments (robust F-test), with the test statistic compared to the rule-of-thumb value of 10, as proposed by Staiger and Stock [66] and Shao and Stoecker [67]; b.
Stock and Yogo's IV weakness test, which presumes homoscedastic errors in the instruments, and its null hypothesis is that the instrument is weak. This is rejected whenever the "minimum eigenvalue statistic" [65] exceeds the "critical value" at a 10% level.
When exogeneity was not rejected in step 2, OLS estimation was applied. Conversely, IV estimation was pursued, for which all three instruments were initially used. Then, if such a set of instruments was not accepted in step 3, the list was reduced stepwise until no instrument was deemed invalid or only one instrument remained.

Simulation: Hospitalizations Attributable to Fires
The count of hospitalizations attributable to fires was estimated asĤosp it =β 1 .γ 1 .logBa it + β 1 .γ 2 .Ba it * Wspeed it +β 1 .γ 3 .Ba it * WDirec it . The first component is the product of the estimated coefficients measuring the effect of fires on pollution (α 1 ), and of pollution on total hospitalizations (β 1 ) and the observed value of fires. The two remaining components, in line with estimated econometric models, incorporate the effects of interactions between fires and wind speed and direction-the most comprehensive set of instrumental variables was used (due to the linearity of the econometric models, such coefficient products yield the same result as calculating the differences in predicted hospitalizations between the scenarios with observed fire levels and with the counterfactual null fire level. It is in this sense that hospitalizations attributable to fires are estimated). The results obtained at the level of the i-th municipality and t-th month were summed across the whole dataset and then divided by the sum of observed hospitalizations; the latter was also calculated for the whole dataset, which generated the estimate for the share of hospitalizations attributable to fires.

Data Description
Despite having available data for the last two decades (2001-2020), not all variables were available for this whole period; therefore, the time window of analysis started in August 2005 and ended in December 2018. Descriptive statistics are shown in Table 2.   Table 2. For the main variables (burned area, AOD, and hospitalizations), graphical representations are presented in both the space ( Figure 2) and time (Figure 3) dimensions.     On average, hospitalizations due to pneumonia (11.34) are more frequent than those for asthma (3.10) and bronchitis (0.73), with the illnesses' shares being 55%, 15%, and 4%, respectively. Thus, the three illnesses together represent around 74% of all hospitalizations, while the other 26% are composed of other respiratory diseases such as pharyngitis, laryngitis, and flu. The distribution of hospitalizations for respiratory diseases follows an asymmetric pattern, where the 15 municipalities with the highest values (over two deviations) are mostly in the state of Pará.
From the age group perspective, the average monthly contribution of the small children group represents approximately 39% of the total, while children and the elderly represent 12% and 18%, respectively. Thus, the contribution of small children and the elderly, together, represents more than half of those hospitalized monthly for respiratory diseases, emphasizing once again the vulnerability of these groups.
The AOD values vary between 0.03 and 3.35, with an average of 0.24 for the region; their distribution follows a normal pattern, in which extreme values are found at the border of the BLA (less than two deviations) and in an area that extends from Rondônia to the northern border of Pará with Maranhão (more than two deviations).
The variation of the main variables over time ( Figure 3) shows similar seasonal patterns for fire and AOD, but a different pattern for hospitalizations. This discrepancy is clear from the asynchronous peaks, with burned area having two distinct peaks-a smaller one at the beginning of the year (February-March) and another at the end (August-September)-while hospitalizations peak in May. On average, hospitalizations due to pneumonia (11.34) are more frequent than those for asthma (3.10) and bronchitis (0.73), with the illnesses' shares being 55%, 15%, and 4%, respectively. Thus, the three illnesses together represent around 74% of all hospitalizations, while the other 26% are composed of other respiratory diseases such as pharyngitis, laryngitis, and flu. The distribution of hospitalizations for respiratory diseases follows an asymmetric pattern, where the 15 municipalities with the highest values (over two deviations) are mostly in the state of Pará.
From the age group perspective, the average monthly contribution of the small children group represents approximately 39% of the total, while children and the elderly represent 12% and 18%, respectively. Thus, the contribution of small children and the elderly, together, represents more than half of those hospitalized monthly for respiratory diseases, emphasizing once again the vulnerability of these groups.
The AOD values vary between 0.03 and 3.35, with an average of 0.24 for the region; their distribution follows a normal pattern, in which extreme values are found at the border of the BLA (less than two deviations) and in an area that extends from Rondônia to the northern border of Pará with Maranhão (more than two deviations).
The variation of the main variables over time ( Figure 3) shows similar seasonal patterns for fire and AOD, but a different pattern for hospitalizations. This discrepancy is clear from the asynchronous peaks, with burned area having two distinct peaks-a smaller one at the beginning of the year (February-March) and another at the end (August-September)while hospitalizations peak in May. Figure 4 presents the monthly variation of the burned areas considering the wind speed and the prevailing wind direction. Between June and September-the period that covers the burning season in the region-there is the presence of winds coming from the northwest, and the average wind speed increases, reaching a maximum between August and September, at the peak of fires. Subsequently, the wind pattern returns to the southwest direction, with winds that can reach up to 8 m/s.

Econometrics Estimates
A first and descriptive assessment of a "weak instrument"-that is, of the possibility that the instrument is not significantly related to pollution-is provided by the correlation between the instruments and pollution, as shown in Table 3. Only the logarithm of burned area (LogBA) did not exhibit a significant correlation with pollution, violating the instrument relevance requirement. Nevertheless, this is not conclusive evidence, as it is biased by socioeconomic and weather confounders that are addressed by the multivariate approach presented subsequently (see "weak instrument test" in Table 4), as attested by the positive and significant coefficients estimated in the first-stage regressions for all age groups and illnesses.  Figure 4 presents the monthly variation of the burned areas considering the wind speed and the prevailing wind direction. Between June and September-the period that covers the burning season in the region-there is the presence of winds coming from the northwest, and the average wind speed increases, reaching a maximum between August and September, at the peak of fires. Subsequently, the wind pattern returns to the southwest direction, with winds that can reach up to 8 m/s.

Econometrics Estimates
A first and descriptive assessment of a "weak instrument"-that is, of the possibility that the instrument is not significantly related to pollution-is provided by the correlation between the instruments and pollution, as shown in Table 3. Only the logarithm of burned area (LogBA) did not exhibit a significant correlation with pollution, violating the instrument relevance requirement. Nevertheless, this is not conclusive evidence, as it is biased by socioeconomic and weather confounders that are addressed by the multivariate approach presented subsequently (see "weak instrument test" in Table 4), as attested by the positive and significant coefficients estimated in the first-stage regressions for all age groups and illnesses.

LogBA
BaWSpeed BaWDirec AOD 0.0023 ns −0.0672 *** −0.0644 *** Significance levels: *** p < 0.001, ns not significant. Table 4 shows the results obtained with the estimator that was indicated by the four-step test procedure for each dependent variable. Detailed estimation results, also containing control variables and the complete set of instruments (LogBA, BAWSpeed, and BAWDirec) and estimators (OLS, 2SLS, fixed effects, and random effects), are presented in the Supplementary Materials.  Table 3. Correlation between the instrumental variables and the explanatory variable.

LogBA
BaWSpeed BaWDirec AOD 0.0023 ns −0.0672 *** −0.0644 *** Significance levels: *** p < 0.001, ns not significant. Table 4 shows the results obtained with the estimator that was indicated by the four-step test procedure for each dependent variable. Detailed estimation results, also containing control variables and the complete set of instruments (LogBA, BAWSpeed, and BAWDirec) and estimators (OLS, 2SLS, fixed effects, and random effects), are presented in the Supplementary Materials.
It should be noted that for all models tested, the value of the Hausman test indicated a significant difference between the coefficients generated by the fixed-and random-effects estimators, indicating consistency only of the former.
Among the four IV sets, the IVc01 (LogBa + BaWSpeed + BaWDirec) and IVc02 (LogBa + BaWSpeed) were the only sets that met all criteria in the four-step testing for a valid instrument (Section 2.4) for the total (respiratory) hospitalization dependent variable. However, according to the predefined testing order, the larger set (IVc01) was the one that grounded estimation of the effect of pollution from fires on total hospitalizations for respiratory diseases in the BLA. The size of the estimate was such that if AOD increased by 1%, total hospitalizations would increase by 0.1383%, in the average municipalitymonth. An increase of one standard deviation (0.17) would imply an increase of 2.35% in hospitalizations.
A positive effect was also estimated for asthma, bronchitis, and for the elderly. In the former, the IVc04 instrument set was applied, revealing that hospitalizations would increase by approximately 0.08% for each 1% rise in the AOD level. For bronchitis and the elderly, pollution was exogenous and, thus, OLS estimation was implemented, resulting in 0.002% (p < 0.001) and 0.0103% (p < 0.01) effects on hospitalization, respectively.  Counterintuitively, a negative coefficient was estimated for small children and children. For these two hospitalization counts, the IVc04 set (LogBa) passed the four-step test for a valid instrument, indicating that each increase of 1% in AOD levels decreases the hospitalization of small children by 0.34% (p < 0.001), and of children by 0.18% (p < 0.01). Hospitalization due to pneumonia was not significantly impacted by AOD.

Hospitalization Attributable to Fires
It was estimated that 5% of respiratory hospitalizations observed in the horizon of analysis were attributable to fires. The monthly trend is shown in Figure 5 and, in absolute terms,~822 people were estimated to be hospitalized per month due to fire-induced pollution, with a slight increase during the fire season.

Hospitalization Attributable to Fires
It was estimated that 5% of respiratory hospitalizations observed in the horizon of analysis were attributable to fires. The monthly trend is shown in Figure 5 and, in absolute terms, ~822 people were estimated to be hospitalized per month due to fire-induced pollution, with a slight increase during the fire season.

Robustness Check
As an assessment of the robustness of the results obtained, burned area was replaced by point fire detections based on thermal anomalies, which are observed even under weather conditions (e.g., severe cloud coverage) that prevent detection of burned areas. The analysis was re-conducted as shown in Table 5, where all of the new instruments presented positive and significant values.

Robustness Check
As an assessment of the robustness of the results obtained, burned area was replaced by point fire detections based on thermal anomalies, which are observed even under weather conditions (e.g., severe cloud coverage) that prevent detection of burned areas. The analysis was re-conducted as shown in Table 5, where all of the new instruments presented positive and significant values. Table 5. Correlation among the instrumental variables and AOD.
The variation over the years, and its average by month (Figure 6), show a similar seasonality to that presented by the burned area, wherein the count increases from May onwards, reaching its peak in August-September; however, the low peak of the beginning of the year (February-March)-as observed for the burned area-is not visible.

Robustness Check
As an assessment of the robustness of the results obtained, burned area was replaced by point fire detections based on thermal anomalies, which are observed even under weather conditions (e.g., severe cloud coverage) that prevent detection of burned areas. The analysis was re-conducted as shown in Table 5, where all of the new instruments presented positive and significant values. Table 5. Correlation among the instrumental variables and AOD.
The variation over the years, and its average by month (Figure 6), show a similar seasonality to that presented by the burned area, wherein the count increases from May onwards, reaching its peak in August-September; however, the low peak of the beginning of the year (February-March)-as observed for the burned area-is not visible.  Table 6 presents the results based on specification selected by the four-stage testing procedure. For each of the seven hospitalization counts, thermal anomaly count was employed as an instrumental variable that interacted with wind direction and wind speed. The use of this product returned negative effects of pollution for all models in which the IV estimator was appropriate (i.e., total, small children, children, the elderly,  Table 6 presents the results based on specification selected by the four-stage testing procedure. For each of the seven hospitalization counts, thermal anomaly count was employed as an instrumental variable that interacted with wind direction and wind speed. The use of this product returned negative effects of pollution for all models in which the IV estimator was appropriate (i.e., total, small children, children, the elderly, pneumonia, and bronchitis), whereas in the only model estimated with OLS (asthma), the effect was positive.
Such results suggest that, aside from the substantial correlation between point and areal fire detections (0.95; Figure 7), their statistical behavior is not similar enough regarding the relationship between pollution and hospitalizations. Even with the higher frequency of point detections by the TA product [68], the correlation with BA at low levels (less than 5000 km 2 )-especially between February and April ( Figure 7)-led to a mismatch in the early peak of burning that occurs when hospitalizations start to increase (Figures 3 and 6). This may have favored the appearance of negative estimates in the robustness check, due to the appearance of a well-defined valley.

Discussion
The coefficient (0.1383%) here estimated for total hospitalization is tenfold larger than the one calculated by Rocha and Sant'anna [59] for the BLA in which, using the interaction between the wind direction and pollutants from neighboring municipalities as an instrumental variable, the authors found that an increase of 17.3 μ/m 3 (one standard deviation, or a 107% increase compared to the average) in PM2.5 levels causes an average increase of ~1.5% (0.95 hospitalizations per 100,000 inhabitants) in total hospitalizations. It should be emphasized that the authors did not attempt to measure the effects of fire or of fire-induced pollution, but of pollution in general. Nevertheless, larger coefficients were estimated in other papers, such those by Mendonça et al. [21] and Mendonça et al. [69], suggesting that our estimates are not outside the range of the literature.

Discussion
The coefficient (0.1383%) here estimated for total hospitalization is tenfold larger than the one calculated by Rocha and Sant'anna [59] for the BLA in which, using the interaction between the wind direction and pollutants from neighboring municipalities as an instrumental variable, the authors found that an increase of 17.3 µ/m 3 (one standard deviation, or a 107% increase compared to the average) in PM2.5 levels causes an average increase of~1.5% (0.95 hospitalizations per 100,000 inhabitants) in total hospitalizations. It should be emphasized that the authors did not attempt to measure the effects of fire or of fire-induced pollution, but of pollution in general. Nevertheless, larger coefficients were estimated in other papers, such those by Mendonça et al. [21] and Mendonça et al. [69], suggesting that our estimates are not outside the range of the literature.
Additionally, the negative coefficients obtained in this paper for Small Children and Children are paralleled by the estimates by Smith et al. [19] along the eastern and southern edges of the BLA region, using geographically weighted Poisson regression for AOD and respiratory disease in children during the 2005 and 2010 droughts. This relationship opposes results previously found for the BLA by Carmo et al. [22], Jacobson et al. [23], and Rocha and Sant'anna [59]. This may be due to the larger hospitalization peak not occurring in the dry season, when the fire level peaks, but towards the end of the wet season, when humidity is still high and temperature low. This fact was not highlighted by the cited studies, but has been observed by others [19,20,70,71]. Particularly, Silva et al. [70] and Andrade Filho et al. [71] argued that the colder and more humid wet season weather may drive hospitalization of children due to respiratory illnesses, with the former authors having found such correlations in BLA data, and the latter authors arguing that such a factor could exert stronger influence than aerosol pollution.
The wind analysis follows the patterns presented by Gatti et al. [72] and Cassol et al. [73] for the Amazon region, including the dispersion of aerosols, in which there is a change in the predominant wind direction at the beginning of the fire season. This factor can spread smoke plumes to neighboring municipalities, states, and even other Amazonian countries, representing a transboundary issue exemplified by the study of Sheldon and Sankaran 2017 [7].
The main limitations of this study are worth detailing. Related to the statistical approach, the difficulty of establishing the causal effect of fire pollutants on hospitalizations is also revealed in the data description (Section 3.1). At first, when looking strictly at spatial variation, simultaneously high values of the three main variables (burned area, AOD, and hospitalizations for respiratory diseases) were not geographically widespread. This demonstrates the need for going beyond descriptive patterns and bivariate correlation coefficients, and towards an identification strategy that also accounts for a range of potential predictors as covariates. Such an approach to the estimation of pollution's effect on health proved successful in this paper, leading to results that are consistent with those of previous studies. In particular, Souza et al. [74], analyzing the relationship between deforestation, hot spots, pollutants (PM2.5), and hospitalizations of indigenous people from the BLA, also found a positive relationship.
Finally, the outcome of the robustness test should be interpreted as a lack of robustness of estimates specifically with regard to alternative measures of fire. Nevertheless, with regard to statistical accuracy and precision, the effects estimated were robust with fire measured as burned area.
Hospitalization data are also limited, including only patients who remain at the hospital under supervision for a period equal to or longer than 24 h [75], which excludes inpatients with mild symptoms, as well as outpatients. In other words, the level of pollution exposure required for a person to be hospitalized is above that which triggers mild symptoms and outpatient visits; is the former is less likely and, thus, generally less prominent in the data.
Other dimensions of the fire phenomenon beyond the spatial domain embraced and the count of active events-such as fire radiative power and combustion efficiency-may also influence the amount of pollution released and the consequent health impact [56,76]. These could be considered in future analysis as control variables in order to further isolate the pollution and health effects of the dimensions considered in this text from the influence of other dimensions.
It is helpful for future research on this topic to emphasize the need for careful analysis of the relationship between fires, pollution, and hospitalizations. This paper is proof that in this matter, a descriptive analysis seeking patterns visible to the naked eye may be misleading, as attested by the contrast between Section 3.1, in which such an endeavor was pursued, and Section 3.2, in which a proper statistical analysis was conducted. More precisely, the data description revealed no clear and directly proportional relationship between fires, pollution, and hospitalizations. Nevertheless, such a relationship was indeed present in the data, as unveiled by a more detailed and statistically sound multivariate analysis, which filtered out the influence of other variables that could act as drivers of pollution and hospitalizations. Such filtering thus proved to be critical for observing how the three main variables were actually related in the dataset. In this regard, we advise future researchers both to not take visual patterns for granted-which can serve as refutations or corroborations of hypotheses under test-as well as to guard against spurious correlations by resorting to a multivariate analysis that includes other factors suggested by the available knowledge, beyond the main variables of fires, pollution, and measure of health impact. These recommendations are aligned with the state of the art of the emerging literature on the econometrics of pollution, which also emphasizes the need to resort to causal inference methods for accurately measuring the effect of fire-induced pollution on health (Deryugina et al. [16], Rangel and Volg, [77], and He et al. [17]).

Conclusions
In this article, we estimated the hospitalization attributable to fire-induced pollution in the Brazilian Legal Amazon (BLA), using a balanced panel at a monthly-municipal level, with available data from the past 20 years (2005-2018). The impact of fire-induced pollution on age groups and specific illnesses was detailed. For this, we applied the instrumental variable approach by exploring the speed and direction of wind, along with fires, as exogenous sources of variation. Therefore, the evidence on hospitalizations was expanded by embracing the whole BLA with a 20-year time series. As a result, of the seven measures of hospitalization frequency considered, four were significantly increased by fire-induced pollution, indicating a considerable detrimental health impact on regional inhabitants. Methodologically, the potential of econometric analysis of remote sensing and georeferenced data to shed light on the impacts of anthropogenic burnings was demonstrated for the specific case of the Brazilian Amazon.
To overcome the limitations that may prevent the detection of a positive relationship between the key variables-the bias of hospitalization records towards severe symptoms, and lack of robustness to alternative fire measures-alternative analytical strategies could be pursued in future studies. These could include other identification strategies (such as the one proposed by Rangel and Volg [77]), reliance on time-series methods able to better model seasonality, consideration of other pollution measures, and even downscaling the time aggregation to capture intra-monthly peaks.