Artificial Neural Networks to Estimate the Influence of Vehicular Emission Variables on Morbidity and Mortality in the Largest Metropolis in South America

Abstract: The emission of pollutants from vehicles is a prime factor in the deterioration of air quality. Thus, public policies that encourage the development and use of more sustainable vehicles are paramount to preserving the population's health. Understanding the health risks caused by air pollution, and specifically by mobile sources, raises the question of which input variables should be considered. Therefore, this research aims to estimate the impacts on the population's health related to road transport variables for São Paulo, Brazil, the largest metropolis in South America. We used three Artificial Neural Networks (ANN), namely Multilayer Perceptron (MLP), Extreme Learning Machines (ELM), and Echo State Networks (ESN), to estimate the impacts of carbon monoxide, nitrogen oxides, ozone, sulfur dioxide, and particulate matter on outcomes for respiratory diseases (morbidity, measured as hospital admissions, and mortality). We also used unusual inputs, such as the road vehicle fleet, the amount of fuel distributed and sold, and the vehicle average mileage. In addition, we applied deseasonalization and two Variable Selection Methods (VSM): the Mutual Information Filter and the Wrapper. The results showed that the VSM excluded some variables, but the best performances were reached considering all of them. The ELM achieved the best overall results for morbidity, and the ESN for mortality, both using deseasonalization. Our study makes an important contribution to the following United Nations Sustainable Development Goals: 3 (good health and well-being), 7 (affordable and clean energy), and 11 (sustainable cities and communities). These research findings will guide the government regarding future legislation and public policies aimed at safeguarding and improving the health system.

software, H.S. and J.T.B.; formal analysis, Y.d.S.T. and T.A.A.; investigation, H.S., J.T.B., and Y.d.S.T.; resources, H.S.; data curation, Y.K. and D.M.d.G.C.; writing-original draft preparation, H.S., Y.K., and D.M.d.G.C.; writing-review and editing, Y.d.S.T. and T.A.A.; visualization, T.A.A.; supervision, H.S. and Y.d.S.T.; project administration, H.S.; funding acquisition,


Introduction
According to the Lancet Commission on pollution and health [1], 9 million premature deaths were attributable to air pollution in 2015 (16% of all deaths in the world), greater than deaths by [...]
In the air pollution field, Artificial Neural Networks have been widely used for air pollutant forecasting [27][28][29][30][31][32]. For example, as early as 1998, Gardner and Dorling [33] concluded that ANN are a successful methodology for predicting air pollutant concentrations and that they provide better results than linear statistical methods because of the nonlinear behavior of the data. Nagendra and Khare [34] used ANN to explain traffic effects on the concentration of PM10. Zolghadri and Cazaurand [35] also predicted PM10 concentrations using ANN and concluded that including traffic emissions data improved the results. Fernando et al. [36] predicted PM10 concentrations in metropolitan areas, showing that ANN are easier, faster, and more economical without compromising prediction precision. Additionally, Cabaneros et al. [30] reviewed studies focused on air pollution forecasting using ANN from 2001 to February 2019, covering 139 peer-reviewed articles.
However, ANN have been used less often to estimate the human health impacts of air pollution. Wang et al. [37] studied the impact of nitrogen oxides (NOx), sulfur dioxide (SO2), carbon monoxide (CO), total suspended particles (TSP), and PM10 on mortality from respiratory diseases in Beijing (China) using ANN. Kassomenos et al. [16] predicted the daily number of hospital admissions for cardiorespiratory diseases due to CO, NO2, SO2, ozone (O3), black smoke concentrations, and meteorological data (temperature, relative humidity, and wind speed and direction) in Athens, Greece. The authors applied Generalized Linear Models (GLM) and ANN, showing the superior performance of the latter. Fontes et al. [38] developed ANN to predict the impact of traffic emissions on human health, using traffic and meteorological data as inputs. Sundaram et al. [39] applied ANN to predict respiratory, cardiovascular, and total mortality due to temperature, relative humidity, and CO, SO2, NOx, hydrocarbon, O3, and particulate concentrations. Tadano et al. [40] assessed the impact of PM10, temperature, and relative humidity on hospital admissions for respiratory diseases in the city of Campinas, Brazil. Polezer et al. [20] showed that ANN can be applied to forecast the impact of air pollution on human health even when the dataset is short or has many gaps (a reality of Brazilian data). Araujo et al. [41] used ten distinct ANN, four ensembles, and Generalized Linear Models (GLM) to estimate the impact of PM10 on hospital admissions for respiratory diseases in the cities of Campinas and São Paulo, Brazil, concluding that ANN and ensembles performed better than GLM.
To estimate the health effects related to road transport variables, we considered monthly input data from 2003 to 2017 for: the road vehicle fleet; distributed and sold vehicles by fuel type (gasoline, ethanol, diesel, and hybrid/electric); the vehicle average mileage; the concentration of gases (carbon monoxide, CO; nitrogen oxides, NOx; ozone, O3; and sulfur dioxide, SO2); and particulate matter less than 10 µm in aerodynamic diameter (PM10). As outputs, we considered morbidity and mortality from respiratory diseases (International Classification of Diseases ICD-10/J00-J99). As mentioned, São Paulo was the target of our research.
To do so, we used two Variable Selection Methods (the Mutual Information Filter and the Wrapper) together with three Artificial Neural Network (ANN) architectures: Multilayer Perceptron (MLP), Extreme Learning Machines (ELM), and Echo State Networks (ESN).
This research provides useful information for the development of new technologies and for more consistent public policies to improve the population's quality of life in areas such as air quality, health, and sustainable urban mobility.

Deseasonalization and Stationarization
Many real-world time series present seasonal behavior, mainly due to variations in the weather during a time window [42]. For example, the temperature changes according to the season, being higher in the summer and lower in the winter, and rainfall periods differ in each season depending on the location [25,42].
These cases show that seasonal patterns are present in the real world and may impair the prediction capability of the models. In this sense, it can be useful to apply a mathematical preprocessing technique named deseasonalization before the data are fed to the models for prediction [26].
Siqueira et al. [26] presented a procedure to deseasonalize monthly seasonal streamflow series. This method produces a new series with zero mean and unit variance, which is approximately stationary, and is given by Equation (1):

$$Z_{i,m} = \frac{S_{i,m} - \mu_m}{\sigma_m} \tag{1}$$

where $S_{i,m}$ are the samples of the original series, which is transformed into a new series $Z_{i,m}$; $\mu_m$ and $\sigma_m$ are the mean and the standard deviation of each month $m$, respectively, estimated using Equations (2) and (3):

$$\mu_m = \frac{1}{N_y}\sum_{i=1}^{N_y} S_{i,m} \tag{2}$$

$$\sigma_m = \sqrt{\frac{1}{N_y}\sum_{i=1}^{N_y}\left(S_{i,m} - \mu_m\right)^2} \tag{3}$$

where $i = 1, 2, \ldots, N_y$ indexes the year and $m = 1, 2, \ldots, 12$ indexes the month.
In addition to deseasonalization, data stationarization is a good way to transform a series into a stationary one, even if it does not have a seasonal component [43,44]. In this case, the mean and standard deviation are calculated over the whole series, using Equations (4) and (5):

$$\mu = \frac{1}{N}\sum_{t=1}^{N} S_t \tag{4}$$

$$\sigma = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(S_t - \mu\right)^2} \tag{5}$$

where $N$ is the total number of samples available. Then, the new stationary series is given by Equation (6) [42]:

$$Z_t = \frac{S_t - \mu}{\sigma} \tag{6}$$

It is important to mention that the seasonal component is reinserted after the predictions and before the calculation of the performance metrics.
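The monthly standardization and its inversion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `deseasonalize` and `reseasonalize` are hypothetical, the series is assumed to start at the first month of a cycle, the population form of the standard deviation is assumed, and the toy data use a period of 4 instead of 12 to stay short.

```python
from statistics import fmean, pstdev

def deseasonalize(series, period=12):
    """Standardize each seasonal position; return (z, means, stds)."""
    means, stds = [], []
    for m in range(period):
        month_values = series[m::period]      # all samples of seasonal position m
        means.append(fmean(month_values))
        stds.append(pstdev(month_values))     # population std (assumed nonzero)
    z = [(s - means[t % period]) / stds[t % period]
         for t, s in enumerate(series)]
    return z, means, stds

def reseasonalize(z, means, stds, period=12):
    """Reinsert the seasonal mean and scale, e.g. after prediction."""
    return [v * stds[t % period] + means[t % period]
            for t, v in enumerate(z)]

# Toy data (hypothetical): three cycles of a period-4 series.
toy = [10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 33.0, 44.0, 14.0, 24.0, 36.0, 48.0]
z, mu, sigma = deseasonalize(toy, period=4)
restored = reseasonalize(z, mu, sigma, period=4)
```

In a forecasting pipeline, `reseasonalize` would be applied to the model outputs before computing error metrics, which matches the order described in the text.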

Input Selection
Variable selection is an approach aiming to identify the best set of inputs that increase the approximation capability of a predictor. This process is very important, as it can reduce the number of parameters to be adjusted by the model, which leads to computational effort reduction and helps understand the data [45]. We used Mutual Information and Wrappers as variable selection techniques.

Mutual Information
The Mutual Information (MI) method is a type of filter approach to subset selection. Filters are characterized by data-based methods, as the subset is chosen according to some correlation between the input candidate and the target [45,46].
The MI comes from information theory and measures the nonlinear dependence between variables [26,47]. The mutual information between two variables ($x$ and $y$) is given by Equation (7):

$$MI(x, y) = \frac{1}{N}\sum_{i=1}^{N} \log\left[\frac{f_{xy}(x_i, y_i)}{f_x(x_i)\, f_y(y_i)}\right] \tag{7}$$

where $f_x(x_i)$ and $f_y(y_i)$ are the respective marginal distributions of $x_i$ and $y_i$, $f_{xy}(x_i, y_i)$ is the joint probability density function, and $(x_i, y_i)$ is the $i$-th pair of samples, with $i = 1, 2, \ldots, N$.
The densities in Equation (7) are estimated with city-block kernel functions, which do not assume any distribution a priori. In addition, the bootstrap method is used to determine the confidence level [43].
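To make the idea of an MI filter concrete, the sketch below estimates mutual information between two samples. Note that it swaps in a simple histogram (plug-in) density estimate rather than the city-block kernel estimator and bootstrap threshold used in this work; the function name `mutual_information`, the bin count, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def mutual_information(x, y, bins=4):
    """Histogram estimate of MI (in nats) between two equal-length samples."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0       # guard against constant series
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    count_x, count_y = Counter(bx), Counter(by)
    count_xy = Counter(zip(bx, by))
    mi = 0.0
    for (i, j), c in count_xy.items():
        # p_xy * log(p_xy / (p_x * p_y)), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
    return mi

# Toy data (hypothetical): one dependent and one unrelated candidate input.
x = [float(v) for v in range(16)]
y_dependent = [2.0 * v + 1.0 for v in x]
y_unrelated = [float(int(v) % 4) for v in x]
mi_dep = mutual_information(x, y_dependent)
mi_ind = mutual_information(x, y_unrelated)
```

A filter would keep the candidates whose MI with the target exceeds a threshold; here the dependent series scores high and the unrelated one scores zero.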

Wrappers
The wrapper methodology for subset selection differs from the filters in that it depends on the predictor [45]. Given N, the total number of inputs, candidate subsets are created, the predictor is adjusted and validated for each subset, and the achieved performances are compared. The selected subset is the one that leads to the best results.
The wrapper's main advantage is that the model itself is taken into account when analyzing the influence of each input. However, because it is model-dependent, its computational effort tends to be high, as the model has to be adjusted for each candidate subset [46].
In this sense, forward selection is often used to reduce the number of adjusted models. Initially, an empty subset of inputs is considered. Then the model is adjusted with each single input variable, and the one that leads to the best performance is added to the subset. After that, the model is trained with two inputs: the one selected in the previous step and each of the remaining (N-1) variables in turn. The second variable selected is the one that, along with the first, leads to the smallest error. These steps are repeated until the size of the subset is N, which means that all of the variables are in it. The subset that leads to the best error metrics is chosen [26].
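The forward selection loop described above can be sketched as follows. The function `forward_selection` and the scoring function `toy_error` are hypothetical: in practice the scoring function would train a neural model on the given subset and return its validation error, whereas here a made-up formula stands in for that step.

```python
def forward_selection(candidates, evaluate):
    """Greedy forward selection; evaluate(subset) returns an error to minimize."""
    selected, remaining = [], list(candidates)
    best_subset, best_error = (), float("inf")
    while remaining:
        # Adjust one model per remaining candidate added to the current subset.
        trials = [(evaluate(tuple(selected + [c])), c) for c in remaining]
        error, chosen = min(trials)
        selected.append(chosen)
        remaining.remove(chosen)
        if error < best_error:                # remember the best subset seen so far
            best_subset, best_error = tuple(selected), error
    return best_subset, best_error

# Hypothetical stand-in for "train the model and return the validation error".
gains = {"fleet": 5.0, "PM10": 3.0}
calls = []

def toy_error(subset):
    calls.append(subset)
    noise = sum(1 for name in subset if name not in gains)
    return 10.0 - sum(gains.get(name, 0.0) for name in subset) + 0.5 * noise

subset, error = forward_selection(["fleet", "CO", "PM10", "SO2"], toy_error)
```

With four candidates the loop evaluates 4 + 3 + 2 + 1 = 10 subsets instead of the 15 non-empty subsets an exhaustive wrapper would need, which is the saving the text refers to.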

Artificial Neural Networks
This research used three architectures of Artificial Neural Networks (ANN): Multilayer Perceptron (MLP), Extreme Learning Machines (ELM), and Echo State Networks (ESN), to estimate road traffic variables' impact on human health. Due to the similarities in their structure, the ELM and ESN are collectively known as Unorganized Machines (UMs) [25].

Multilayer Perceptron
The Multilayer Perceptron (MLP) is the best-known and most widely used Artificial Neural Network (ANN). It is a universal approximator, since it is capable of approximating any nonlinear, continuous, bounded, and differentiable function [24]. The MLP is composed of groups of artificial neurons called layers: the first layer is responsible for transmitting the input signal to the hidden layers; these produce a nonlinear mapping, transforming the input signal into another space; and the last one is the output layer, which receives the transformed signal and, by means of linear combinations, generates the network output [24]. In the MLP architecture, neurons in the same layer are not connected and do not exchange information, while neurons in different layers are connected. This is a characteristic of feedforward networks, which do not have recurrence or feedback loops [48].
The information processing of an MLP can be expressed as follows. Initially, consider $\mathbf{u}$ as the vector containing the network inputs, $\mathbf{b}$ as the polarization vector (bias), and $w^{i}_{kn}$ as the weights of the intermediate layer, with $n = 1, \ldots, N$ being the index of each input, $k = 1, \ldots, K$ indicating the neuron, and $w^{o}_{1k}$ the connections of the output neuron. Therefore, the output of the network is given by Equation (8):

$$y = f_s\left(\sum_{k=1}^{K} w^{o}_{1k}\, f\left(\sum_{n=1}^{N} w^{i}_{kn} u_n + b_k\right)\right) \tag{8}$$

where $f$ is the activation function of the hidden neurons and $f_s$ is a linear activation function of the output neurons.
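The forward computation of a single-hidden-layer MLP can be illustrated with a minimal sketch. The hyperbolic tangent as hidden activation, the identity as output activation, the absence of an output bias, and the name `mlp_output` are assumptions made for this example, and the weights are hand-picked rather than trained.

```python
import math

def mlp_output(u, W_i, b, w_o):
    """One hidden layer with tanh units and a linear (identity) output neuron."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, u)) + b_k)
        for row, b_k in zip(W_i, b)           # one row of weights per hidden neuron
    ]
    return sum(w * h for w, h in zip(w_o, hidden))

# Hand-picked weights (hypothetical): K = 2 hidden neurons, N = 2 inputs.
W_i = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
w_o = [1.0, -1.0]
y_equal = mlp_output([0.5, 0.5], W_i, b, w_o)   # the two hidden units cancel out
y_first = mlp_output([1.0, 0.0], W_i, b, w_o)   # only the first hidden unit is active
```

Training would then adjust `W_i`, `b`, and `w_o` iteratively, which is the step the unorganized machines described later avoid for the hidden layer.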
The training process of a neural network consists of adjusting the synaptic weights of the artificial neurons [24]. In the MLP, an iterative process is applied, usually by means of an unconstrained nonlinear optimization algorithm [49]. To do so, the gradient of the error between the network output and the desired response is calculated using the backpropagation algorithm. Steepest descent is the algorithm most often applied to reduce this error [24].
It is noted that the user has to define the stop criterion. Often, the maximum number of iterations is adopted. It is important to mention that the cross-validation method must be applied to define the most appropriate topology and to avoid overfitting [49].

Extreme Learning Machines
Extreme Learning Machines (ELM) are feedforward neural networks with a single hidden layer, introduced by Huang et al. [50]. In this proposal, the arrangement of the neurons is quite similar to the traditional MLP architecture. However, ELM differ markedly in the training process [48].
The proposers proved that the randomly generated weights of the hidden layer can remain untuned. The only prerequisite is that the activation function of the intermediate neurons be continuously differentiable [48]. Additionally, they demonstrated that ELM are universal approximators, as the insertion of new neurons in the intermediate layer leads to a decrease in the output error [26]. The training process of an ELM therefore becomes simple and solely involves finding the best set of neuron weights in the output layer, which can be done by solving a linear regression problem. Huang et al. [48] suggest using the Moore-Penrose generalized inverse for this task, as this technique simultaneously minimizes the norm of the output weight vector and the mean square error between the network output and the desired signal.
In this work, considering the set of inputs $\mathbf{u}_n = [u_n, u_{n+1}, \ldots, u_N]^T$, with $n = 1, \ldots, N$ being the corresponding input data, the activation $\mathbf{x}^h_n$ of the neurons in the intermediate layer (the output of this layer) is given by Equation (9):

$$\mathbf{x}^h_n = \mathbf{f}^h\left(\mathbf{W}^h \mathbf{u}_n + \mathbf{b}\right) \tag{9}$$

where $\mathbf{W}^h$ is the matrix containing the hidden layer weights, $\mathbf{b}$ is a column vector of biases, and $\mathbf{f}^h(\cdot)$ specifies the neuron activation functions. The network output signal $\mathbf{y}$ is generated according to Equation (10):

$$\mathbf{y}_n = \mathbf{W}^{out} \mathbf{x}^h_n \tag{10}$$

where $\mathbf{W}^{out}$ contains the output layer weights. Finally, the training process, as mentioned, is performed by applying the Moore-Penrose inverse, according to Equation (11):

$$\mathbf{W}^{out} = \left(\mathbf{X}_h^T \mathbf{X}_h\right)^{-1} \mathbf{X}_h^T \mathbf{d} \tag{11}$$

where $\mathbf{X}_h$ is the matrix containing the hidden layer outputs, $\left(\mathbf{X}_h^T \mathbf{X}_h\right)^{-1} \mathbf{X}_h^T$ is the pseudo-inverse matrix, and $\mathbf{d}$ is the vector with the desired outputs (target).
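A compact way to see how little training an ELM needs is the sketch below, which assumes NumPy is available. The helper names `train_elm` and `predict_elm`, the tanh activation, the uniform weight range, and the synthetic target are all assumptions for illustration; `numpy.linalg.pinv` is used as the Moore-Penrose inverse.

```python
import numpy as np

def train_elm(U, d, n_hidden, rng):
    """U: (samples, inputs); d: (samples,). Returns (W_h, b, W_out)."""
    W_h = rng.uniform(-1.0, 1.0, size=(n_hidden, U.shape[1]))  # random, never tuned
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    X_h = np.tanh(U @ W_h.T + b)          # hidden-layer outputs, one row per sample
    W_out = np.linalg.pinv(X_h) @ d       # least-squares output weights
    return W_h, b, W_out

def predict_elm(U, W_h, b, W_out):
    return np.tanh(U @ W_h.T + b) @ W_out

# Synthetic nonlinear target (hypothetical data, not the São Paulo series).
rng = np.random.default_rng(0)
U = rng.uniform(-1.0, 1.0, size=(200, 2))
d = np.sin(3.0 * U[:, 0]) + U[:, 1] ** 2
W_h, b, W_out = train_elm(U, d, n_hidden=40, rng=rng)
mse = float(np.mean((predict_elm(U, W_h, b, W_out) - d) ** 2))
```

The only fitted quantity is `W_out`, obtained in a single linear-algebra step; no iterative gradient descent is involved.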

Echo State Networks
Echo State Networks (ESN) were proposed by Jaeger [51]. They present similarities with ELM in terms of their simplicity in the training process. The main difference is that ESN are Recurrent Neural Networks (RNN), presenting feedback loops of information, with some outputs reinserted as inputs [52].
Jaeger [51] proved that, under specific conditions, the state of the dynamic reservoir, governed by the matrix W, is a nonlinear transformation directly influenced by the recent history of the input signal (hence the term echo). This allows the intermediate layer to be set in advance and kept unchanged during training. Therefore, only the output layer has to be adjusted, by solving a least squares problem. The author named this condition the "echo state property" [52].
ESN present recurrent connections within the intermediate layer, called the dynamic reservoir. The activation of the neurons in the reservoir (the output of this layer) is influenced by the current input and the previous state, according to Equation (12):

$$\mathbf{x}_{n+1} = \mathbf{f}\left(\mathbf{W}^{in} \mathbf{u}_{n+1} + \mathbf{W} \mathbf{x}_n\right) \tag{12}$$

where $\mathbf{u}_n = [u_n, u_{n+1}, \ldots, u_{n-K-1}]^T$ is the input signal, $\mathbf{W}^{in}$ is the input layer, $\mathbf{W}$ is the reservoir, which holds the synaptic weights of the feedback connections, and $\mathbf{f}(\cdot)$ denotes the activation functions of the reservoir neurons. The ESN output signal is calculated by Equation (13):

$$\mathbf{y}_{n+1} = \mathbf{W}^{out} \mathbf{x}_{n+1} \tag{13}$$

Jaeger [51] suggests a way to generate a reservoir that respects the echo state property. In this proposal, W is created as a sparse matrix whose elements (synaptic weights) are given by Equation (14). Finally, as in ELM, the training process is performed by applying the Moore-Penrose inverse, as in Equation (15):

$$\mathbf{W}^{out} = \left(\mathbf{X}^T \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{d} \tag{15}$$

where $\mathbf{X}$ is the matrix containing the reservoir outputs, $\left(\mathbf{X}^T \mathbf{X}\right)^{-1} \mathbf{X}^T$ is its pseudo-inverse, and $\mathbf{d}$ is the desired response.
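The reservoir update and the one-step output training can be sketched as below, assuming NumPy is available. This is not the reservoir design referenced in the text: the sketch instead draws a sparse uniform random matrix and rescales it to a spectral radius of 0.9, a commonly used heuristic. The names `make_reservoir` and `run_reservoir`, the tanh activation, the washout length, and the sine-wave task are assumptions for illustration.

```python
import numpy as np

def make_reservoir(n_res, rng, density=0.1, spectral_radius=0.9):
    """Sparse random reservoir rescaled to a chosen spectral radius (assumption)."""
    W = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
    W[rng.random((n_res, n_res)) > density] = 0.0   # zero out most connections
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (spectral_radius / radius)

def run_reservoir(U, W_in, W):
    """Collect states x = tanh(W_in u + W x_previous), one row per time step."""
    x = np.zeros(W.shape[0])
    states = []
    for u in U:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.array(states)

# Hypothetical task: predict the next sample of a sine wave.
rng = np.random.default_rng(1)
signal = np.sin(0.2 * np.arange(301))
U, d = signal[:-1, None], signal[1:]
W_in = rng.uniform(-0.5, 0.5, size=(50, 1))
W = make_reservoir(50, rng)
X = run_reservoir(U, W_in, W)[20:]            # discard a 20-step washout
W_out = np.linalg.pinv(X) @ d[20:]            # only the output layer is trained
mse = float(np.mean((X @ W_out - d[20:]) ** 2))
```

As with the ELM, `W_in` and `W` stay fixed after generation; the recurrence in `run_reservoir` is what gives the network memory of recent inputs.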

Case Study
The goal of this study is to forecast mortality and morbidity from respiratory diseases, considering variables related to road vehicle transport, in the city of São Paulo, Brazil. The input variables are categorized as: road vehicle fleet; amount of fuel distributed and sold (gasoline, ethanol, diesel, and hybrid/electric); vehicle average mileage; gases (CO, NOx, O3, and SO2); and PM10 concentrations. The considered outputs were morbidity and mortality from respiratory diseases. Monthly data for a long period (from January 2003 to December 2017) were used instead of a shorter daily series because of the lack of daily information about road transport variables, such as the vehicle fleet and fuel distribution.
São Paulo is the most populous city in South America and the seventh most populous city on Earth; it has approximately 12,038,175 inhabitants in an area of 1521 km² and has 8,036,824 motor vehicles circulating every day. It is also an economic hub of Latin America and has a Human Development Index (HDI) of 0.805. The city presents air pollutant concentrations around twice the Air Quality Guidelines established by the World Health Organization [9,53,54].
The time series of the number of motor vehicles in the city of São Paulo was obtained from the statistical series of the National Department of Transit [53]. The National Agency of Petroleum, Natural Gas and Biofuels [55] provided the distribution of fuels to retailers (gasoline, ethanol, diesel, and hybrid/electric). The vehicle average mileage and the PM10, NOx, O3, SO2 [µg/m³], and CO (ppm) concentrations were provided by the Companhia Ambiental do Estado de São Paulo (CETESB, São Paulo State Environmental Protection Agency) [54]. The pollutant concentrations were collected from ten stations in the city of São Paulo operated by CETESB: Capão Redondo (PM10 [...] NOx, O3, SO2, CO); and Nossa Senhora do Ó (PM10, O3) [54]. As the air quality database is daily, we calculated the monthly average. Finally, the morbidity and mortality rates for respiratory health problems (Codes J00-J99 from the International Classification of Diseases, ICD-10) were obtained from the Informatics Department of the Unified Health System [56]. It is necessary to highlight that this information covers only the public health system, disregarding data from health insurance plans and private health care.
These targets were considered because they are the main health outcomes related to air pollution. In the studied period, February 2011 was the month with the highest morbidity (1461), and May 2006 had the highest mortality (29). The minimums occurred in January 2008 (169 for morbidity) and in March 2008 (2 for mortality). Figures 1 and 2 show the behavior of the morbidity and mortality databases, respectively. The seasonal behavior of the morbidity data can be observed.
The computational step involved the 11 aforementioned input variables, and the desired signals (targets) were mortality or morbidity. We applied the variable selection methods (Mutual Information Filter and Wrapper) together with the three neural models previously described: MLP, ELM, and ESN. Additionally, we evaluated the performance considering all inputs at the same time. The adopted performance metrics were the Mean Square Error (MSE), the Mean Absolute Error (MAE), and the Mean Absolute Percentage Error (MAPE) [42].
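For reference, the three error metrics can be computed as in the short sketch below. The function names and the toy values are hypothetical, and expressing MAPE as a percentage is an assumption about the reporting convention; none of the numbers come from the study.

```python
def mse(observed, estimated):
    """Mean Square Error."""
    return sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)

def mae(observed, estimated):
    """Mean Absolute Error."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)

def mape(observed, estimated):
    """Mean Absolute Percentage Error, in percent (observed values must be nonzero)."""
    return 100.0 * sum(abs((o - e) / o)
                       for o, e in zip(observed, estimated)) / len(observed)

# Toy monthly counts (hypothetical).
observed = [100.0, 200.0, 400.0]
estimated = [110.0, 180.0, 400.0]
scores = (mse(observed, estimated), mae(observed, estimated), mape(observed, estimated))
```

Because MSE squares the residuals, it penalizes large misses more heavily than MAE or MAPE, which is one reason the three metrics can rank models differently.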
The data were divided into three sets.
The deseasonalization procedure was applied to the following variables (using Equation (1)) because of their monthly seasonal behavior: morbidity, gasoline consumption, diesel consumption, and PM10, NOx, O3, SO2, and CO concentrations. The other variables (mortality, ethanol consumption, number of electric/hybrid vehicles, and average mileage) were only stationarized, using Equation (6).
The models' performance is summarized in Table 1, where "WRP" stands for Wrapper, "MI" for Mutual Information, "All" for the use of all inputs, and "NN" for the number of neurons in the hidden layer of the neural models. The reported values are the mean of 30 independent simulations, and the best performances are highlighted in bold. The number of neurons was defined empirically in preliminary tests, starting with three, then five, and then increasing in increments of five until reaching 200 units. In addition, all the ANN have only one hidden layer [57].
Finally, we applied Friedman's test at a 95% confidence level. All cases achieved p-values very close to zero (below 0.01). This indicates that the obtained results differ significantly when the neural model is changed.
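A minimal sketch of how such a comparison could be computed for three models is given below. It assumes that each of the 30 simulations is treated as a block, that there are no ties within a block, and that the chi-square approximation with two degrees of freedom is acceptable (for which the survival function is exactly exp(-x/2)). The function name `friedman_three_models` and the synthetic errors are hypothetical.

```python
import math

def friedman_three_models(errors):
    """errors: one (model_1, model_2, model_3) error tuple per run, no ties."""
    n, k = len(errors), 3
    rank_sums = [0.0] * k
    for row in errors:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):   # rank 1 = smallest error
            rank_sums[j] += rank
    stat = (12.0 / (n * k * (k + 1))) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)
    p_value = math.exp(-stat / 2.0)   # chi-square survival function, 2 d.o.f.
    return stat, p_value

# Synthetic errors (hypothetical): the same model ordering in all 30 runs.
runs = [(1.0 + 0.01 * i, 2.0 + 0.01 * i, 3.0 + 0.01 * i) for i in range(30)]
stat, p_value = friedman_three_models(runs)
```

A p-value below the chosen significance level (0.05 for a 95% confidence level) would lead to rejecting the hypothesis that the three models perform equally.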
To verify that using ANN is the best choice, we applied a Generalized Linear Model (GLM) with Poisson regression to the morbidity case. As expected, the achieved errors were higher than those of all neural models (MSE = 170,424.47, MAE = 346.87, and MAPE = 57.69).
The computational results allow analyzing many aspects of the neural networks application to this problem.
In all cases, deseasonalization improved the neural models' performance, which indicates the relevance of removing the seasonal component from the data. As discussed, this work performed a nonlinear mapping, since we used exogenous variables to estimate mortality and morbidity. However, studies on time series forecasting that use only endogenous inputs have adopted this approach many times, since some linear models, such as the AutoRegressive Integrated Moving Average (ARIMA), cannot work with seasonal variables [42].
Regarding the variable selection methods, it is important to mention that mutual information removed the following variables from the subset: number of electric and hybrid cars, CO, and SO2. However, this did not bring a performance gain, since the results were worse in all scenarios. As for the wrapper methodology, although we do not show it numerically, the computational effort and the time spent to finish the simulations were many times higher than in the other cases. Additionally, we need to highlight that, although the variable selection methods excluded the small fleet of electric and hybrid cars (roughly 0.03% of the total) from the analysis, this input variable should be considered in future studies, as the production and use of these cars are being encouraged.
Because the distinct error metrics gave conflicting results, we adopted the MSE as the most important one, since ANN training seeks to reduce the MSE [24].
With that in mind, the Extreme Learning Machines (ELM) achieved the best overall results for morbidity, with a Mean Square Error (MSE) of 25,684.56, and the Echo State Network (ESN) reached the best performance for mortality (MSE of 22.80), both using deseasonalization and all inputs. These are important results because those architectures (the Unorganized Machines) are relatively new and the adjustment of their free parameters is simple and fast, requiring only a short time to be trained. As discussed in Section 2, such architectures do not perform an iterative process during training tasks, unlike the classic MLP.
In the same way, withdrawing any variables is not an advantage. Therefore, we can state that all of them are important and must be considered.
Unfortunately, there is no consensus on the best approach to estimate air pollution health risks [16,20,25,26,41], as it depends on the behavior of the database. Even for conventional regressions, such as the Generalized Linear Model, the best distribution may depend on the dataset, as reported by Ardiles et al. [58]. Therefore, it was expected that the best results for mortality and morbidity would be achieved by different models. Figure 3 presents the boxplot of the MSE for the three neural models, considering deseasonalization and all inputs, for mortality (number of deaths) (Figure 3a) and morbidity (number of hospital admissions) (Figure 3b). In both cases, the neural model achieving the best results also presents the smallest dispersion. This is another indication that the models are suitable for these problems.
Sustainability 2020, 12, x FOR PEER REVIEW 11 of 15
Figure 4 shows the best results achieved for the addressed data, comprising all inputs with deseasonalization. Figure 4a shows the observed and estimated results for mortality (number of deaths) in the test set, and Figure 4b shows the same outputs for morbidity (number of hospital admissions) in the test set, demonstrating that the models are suitable for such problems. It is clear that the models predicted morbidity better than mortality, as the latter is a result of long-term exposure and depends more on numerous factors, such as people's lifestyle and genetics. Another reason may be the patterned behavior of the morbidity series.

Conclusions
Decision-makers should base their future actions on analyses that are able to relate a set of variables that may influence critical issues. In this context, we used Artificial Neural Networks (ANN) as a tool that performed well in forecasting the impacts of road transport variables, including air pollutant concentrations and fuel consumption, on the health of São Paulo's population. The results of our work will guide the government regarding future legislation and public policies aimed at safeguarding and improving the health system.

In spite of the low relation between some inputs and the outputs, shown by the variable selection methods, the results demonstrated that withdrawing any variable is disadvantageous. Therefore, we may assume that all considered variables are important and must be taken into account. Even electric vehicles, accounting for only 0.03% of the vehicle fleet in the city of São Paulo, cannot be discarded, showing the importance of public policies in favor of electric cars.
The analysis showed that the Extreme Learning Machines (ELM) achieved the best overall results for morbidity, and the Echo State Network (ESN) reached the best performance for mortality, both using deseasonalization and all inputs. These are important results because the considered architectures are relatively new and the adjustment of their free parameters is simple and efficient, requiring only a short time to be trained.
In future research, it is important to develop easy-to-use software that includes ELM, ESN, and other neural models, and possibly regression models, to facilitate their application by decision makers.
Considering the United Nations' Sustainable Development Goals (SDG) [59], which are actions needed in order to change the world, our study makes an important contribution, as it shows some relevant aspects that, if controlled or reduced, may decrease mortality and respiratory diseases related to air pollution (SDG 3, good health and well-being). It also mentions the importance of encouraging the use of electric cars and renewable fuels through tax incentive laws, increasing the participation of Brazil in the global energy matrix (SDG 7, affordable and clean energy). In this way, we contribute to reaching the goal of making cities and human settlements inclusive, safe, resilient, and sustainable (SDG 11, sustainable cities and communities).