1. Introduction
This study employed a novel approach to gauge the levels of pollutants in inhaled air from autonomic human responses measured by a suite of biometric sensors. The environmental and social context has a significant impact on human well-being. Air pollution is of particular concern: the World Health Organization reports that outdoor and indoor pollution together contribute to more than 7 million premature deaths each year [1]. Air pollution can come from various sources, including natural events such as wildfires and volcanic eruptions, as well as human activities such as vehicle emissions, industrial processes, and the operation of coal-fueled power plants.
The air quality standards established by the U.S. Environmental Protection Agency under the Clean Air Act cover six pollutants: particulate matter (PM), carbon monoxide (CO), ground-level ozone, nitrogen dioxide (NO2), sulfur dioxide (SO2), and lead [2]. Other pollutants include carbon dioxide (CO2) and volatile organic compounds. Particulate matter refers to minuscule solid or liquid particles that are present in the air and are categorized on the basis of their aerodynamic diameter. They include PM1.0, PM2.5, and PM10, with aerodynamic diameters of less than 1 μm, 2.5 μm, and 10 μm, respectively. Because of their small size, PM2.5 particulates can penetrate deeply into the lungs and bloodstream, creating adverse health effects related to the respiratory system [3], increased mortality [4], heart disease [5], inflammatory responses, and adverse birth-related effects [6]. The pollutants we considered are exemplars of the wider human exposome [7,8,9], which refers to the comprehensive accumulation of all environmental exposures that an individual encounters throughout their lifetime, including chemicals and biological agents. The exposome encompasses exposures to both gases and particulates, and appropriate care should be taken to include the often ignored ultrafine particulates [10].
Guidelines on the recommended levels of exposure to pollutants provided by the World Health Organization (WHO) [11] and the Environmental Protection Agency (EPA) [12] contain only two designations: short-term exposures (an average over 24 h) and long-term exposures (a 1-year average). Brief daily encounters, such as passing a construction site, walking on a busy road, or even working in poorly ventilated indoor spaces, can expose individuals to levels higher than the recommended guidelines. The size of airborne PM has a major influence on how far it can penetrate the lungs, which in turn affects human health. The WHO acknowledges that PM with diameters below 2.5 μm (PM2.5) imposes a significant burden of disease [11,13], while larger particles, although less likely to reach the alveoli, can still cause health problems by irritating the eyes, nose, and throat [11]. Therefore, research efforts focused on prolonged exposure to poor air quality, including airborne particles of varying sizes, are of particular importance when considering long-term health.
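The gap between a 24 h designation and a brief daily encounter can be illustrated with a short worked example. The sketch below uses an entirely hypothetical minute-resolution PM2.5 series and an illustrative threshold (neither is taken from the guidelines cited above) to show how a 30-minute spike can leave the daily average low while the instantaneous exposure is far higher.

```python
from statistics import mean

# Hypothetical minute-resolution PM2.5 readings (ug/m^3) for one day:
# a clean background of 5 with a single 30-minute spike of 150.
MINUTES_PER_DAY = 24 * 60
pm25 = [5.0] * MINUTES_PER_DAY
for minute in range(600, 630):
    pm25[minute] = 150.0

# Illustrative threshold only; not a value taken from the text.
THRESHOLD = 15.0

daily_average = mean(pm25)   # what a 24 h average reports
peak = max(pm25)             # highest level encountered during the day
minutes_above = sum(1 for value in pm25 if value > THRESHOLD)

print(f"24 h average: {daily_average:.2f} ug/m^3")
print(f"peak: {peak:.1f} ug/m^3, minutes above threshold: {minutes_above}")
```

In this toy series the daily average stays below the illustrative threshold even though the person spends 30 minutes at ten times that level, which is the kind of exposure that measurements on short temporal scales are meant to capture.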
The area of respiratory health receives significant attention due to the high incidence of poor air quality caused by factors such as smoke, vehicle emissions, and dust. Prolonged exposure to these sources, all of which produce PM of varying sizes, can affect long-term health, including physiological, psychological, and neurological functioning. For example, consider the following.
Inflammation: Exposure to air pollution can cause inflammation in the brain, which can cause cognitive impairment [14,15].
Oxidative stress: Exposure to air pollution can increase oxidative stress, leading to cell damage and cognitive impairment [14,16].
Reduced oxygen supply: Air pollution can reduce the amount of oxygen available to the body, which can lead to fatigue, decreased endurance, and impaired cognitive function [17,18,19,20].
Increased respiratory effort: Air pollution can increase the effort required to breathe, leading to reduced exercise capacity and decreased performance [18,21,22].
Neurotransmitter disruption: Exposure to environmental pollutants such as lead, mercury, and polychlorinated biphenyls (PCBs) can alter neurotransmitter function and cause cognitive problems [23,24].
Epigenetic modifications: Exposure to environmental pollutants can lead to changes in DNA methylation and other epigenetic changes, which can contribute to cognitive problems [25,26,27,28].
The breakdown of the blood–brain barrier: Exposure to air pollution can disrupt the blood–brain barrier, allowing pollutants to enter the brain and cause neurological damage [16].
Neurotoxicity: Exposure to certain environmental pollutants, such as lead, mercury, and polychlorinated biphenyls (PCBs), can be neurotoxic and affect the nervous system [24,29].
CO2 exposure has been associated with cognitive problems [30,31,32] and physiological changes in lung and cardiovascular function [33]. Long-term exposure to NO2, a gaseous pollutant, has been associated with cardiovascular disease, lung cancer, and respiratory problems, including changes in the severity of asthma [34,35,36,37]. The inhalation of regulated amounts of nitric oxide (NO) under controlled conditions, as well as medications that produce nitric oxide, have a wide range of therapeutic uses, such as the treatment of cardiopulmonary conditions [38,39]. On the other hand, NO inhaled in excess amounts can react with oxygen to form NO2 in the lungs, creating lung problems [39,40]. Higher concentrations of NO are considered toxic, although few studies have examined the direct effects of NO inhalation.
In this study, we combined data sets obtained from two different experimental paradigms and provide an overview of our previous work, in which biometric data from participants were used with machine learning models to estimate and understand the effects of inhaled ambient PM2.5 [41], CO2 [42], and NO2 [43] on the human body; here we extend that work to include PM1 and NO. While long-term exposure to air pollution can result in many health effects, as mentioned above, short-term exposure also produces immediate physiological changes. We examined the autonomic responses to the five pollutants on small temporal (∼2 s) and spatial (∼2 m) scales within microenvironments. To capture the cognitive and physiological changes brought about by air pollution as comprehensively as possible, we used several sensors to record skin temperature, respiration rate, blood oxygen saturation (SpO2), heart rate, the galvanic skin response (GSR), the pupil diameters of the left and right eyes, the distance between the pupils, and the electrical activity of the brain and heart using electroencephalography (EEG) and electrocardiography (ECG), respectively.
Since the relations between and among variables are not always linear or functional, we used machine learning algorithms to perform regression on nonlinear, non-parametric, multidimensional data. Machine learning models have been shown to estimate ambient PM, especially PM2.5, with a high degree of precision [44,45,46]. By simultaneously measuring biological parameters and air quality components, we examined the interaction between the body and the environment while also testing the accuracy of estimating pollutants using machine learning techniques.
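As a minimal illustration of what non-parametric regression means in this setting, the sketch below implements a k-nearest-neighbour regressor from scratch. This is an assumption chosen for brevity, not the model used in the study; the function name `knn_predict` and the toy data are hypothetical. The point is only that the estimate is built from nearby training records rather than from a fixed functional form.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    by_distance = sorted(
        (math.dist(row, query), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Toy data (hypothetical): one input feature with a nonlinear target.
train_X = [[float(x)] for x in range(11)]   # feature values 0..10
train_y = [row[0] ** 2 for row in train_X]  # target = feature squared

# The two nearest neighbours of 4.5 are 4 and 5, with targets 16 and 25.
estimate = knn_predict(train_X, train_y, [4.5], k=2)
print(estimate)  # 20.5, close to the true value 4.5**2 = 20.25
```

The same function accepts rows with several features, although in practice features measured on different scales (for example, heart rate and pupil diameter) would need to be standardized before distances are computed.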
4. Discussion
The human body is itself a sensing system: it reacts to environmental variables such as temperature, humidity, and air quality, and to changes in them. A previous study limited to a single participant showed that the autonomic physiological and cognitive responses to inhaled particulate matter on small temporal and spatial scales can be used to estimate PM1 and PM2.5 with very high accuracy using machine learning models [46]. The inclusion of multiple participants in the static bike ride paradigm, in which PM1 and PM2.5 were measured, shows that the methodology implemented for a single participant can be extended to multiple participants, producing even better results for PM1 and PM2.5, with an R² value of nearly 1 and a very low RMSE, as shown in Table 9. In fact, the results show that a small number of biometric variables is sufficient to estimate PM1 and PM2.5 with similar accuracy.
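For readers less familiar with the two metrics quoted here, the following sketch computes the RMSE and a coefficient of determination from paired true and estimated values. It assumes the common definition R² = 1 − SS_res/SS_tot; the source does not state whether it uses this form or the squared Pearson correlation, and the numbers below are made up for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between true and estimated values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed form)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical true and estimated concentrations (arbitrary units).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.0, 4.2, 4.8]

print(f"RMSE = {rmse(y_true, y_pred):.3f}")
print(f"R^2  = {r_squared(y_true, y_pred):.3f}")
```

An R² near 1 together with a small RMSE, as reported for PM1 and PM2.5, means the estimates track both the variation and the absolute level of the measured values.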
The time series plots of PM1 and PM2.5 in Figure 17a,b show that the true values are very close to the estimated values for the majority of the data set, without any significant differences, which is consistent with these two pollutants having the smallest RMSE of all. This supports the conclusion made previously [46] that two possible reasons why these estimates are highly accurate and precise are (a) that these particulates are abundant and mix well with the ambient environment, thus having a higher probability of both being inhaled by the participant and entering the sensors placed nearby, and (b) that, given the minute size of PM2.5, these particulates, when inhaled, can reach deep into the lungs and bloodstream, creating many negative health effects [3,5,6] and thereby impacting the human body to a large extent.
Air quality components include not only particulate matter but also gaseous pollutants such as CO2, NO2, and NO, which were included in this study. The methodology implemented to estimate and understand autonomic responses in the human body can also be used for gaseous pollutants such as CO2. This claim is supported by the R² value of nearly 1 between the true and estimated values of CO2 in the test set, obtained using a small number of biometrics, as shown in Table 9. Simplifying the model by considering a small number of biometrics also appears to have reduced the RMSE between the true and estimated values of CO2, which can be seen clearly by comparing the time series in Figure 12a and Figure 16a. Given the several physiological changes brought about by inhaling CO2, such as changes in lung and cardiovascular function [33], cognitive issues [54], sweating [55], and the inflammation of airways, these autonomic responses can indeed be used to predict the concentration of CO2 with high accuracy.
The estimates of NO2 and NO over the entire range of data were not very accurate, as indicated by the R² and RMSE values between the true and estimated values of each gas shown in Table 9. However, the scatter diagrams of these two gases in Figure 14e,h and their quantile–quantile plots in Figure 14f,i indicate that the prediction is reliable to some extent at lower concentrations, where the data are dense and the points lie close to the corresponding 1:1 line. As the number of data points decreases at higher concentrations of these two gases, the points in the scatter and quantile–quantile plots deviate from the 1:1 line, one possible reason being the very small number of data points available for the machine learning model to learn from in this region. Moreover, Pearson's correlation coefficient (R) is highly susceptible to outliers: a few data points that deviate from the 1:1 line can strongly affect the value of R. This may have reduced the precision when the entire data set was considered. This claim is supported by the scatter diagram in Figure 14b and the quantile–quantile plot in Figure 14c for CO2, where the data points deviate from the 1:1 line between 700 ppm and 800 ppm, one possible reason again being the scarcity of data points in that region for the model to learn from before being tested on an independent test set. Future work could improve on these results either through more expansive data collection or through machine learning models that learn better from limited data, with the error again evaluated on an independent test set.
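The sensitivity of Pearson's correlation coefficient to a few poorly estimated high-concentration points can be reproduced with a small synthetic example. The values below are hypothetical and are not drawn from the study; they simply mimic a dense, well-estimated low range plus two sparse high values that the model under-predicts.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Dense low range: 20 points lying exactly on the 1:1 line.
true_low = [float(v) for v in range(10, 30)]
est_low = list(true_low)

# Two sparse high-concentration points that are badly under-predicted.
true_all = true_low + [80.0, 90.0]
est_all = est_low + [25.0, 20.0]

r_clean = pearson_r(true_low, est_low)
r_with_outliers = pearson_r(true_all, est_all)

print(f"R without the high points: {r_clean:.2f}")
print(f"R with two high points:    {r_with_outliers:.2f}")
```

Two points out of 22 are enough to pull R from 1 to roughly 0.41 in this toy case, even though the estimates in the dense region are unchanged.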
Another possible reason why the results for NO and NO2 were not highly accurate is that the autonomic responses during data collection were dominated by PM1 particles. As shown in the time series in Figure 16b,c, the concentrations of NO2 and NO were under 80 ppb and 90 ppb, respectively, with occasional higher values. The concentration of PM1 particles during these trials was between 0.708 and 7.655 μg/m³. As mentioned before, since these minute particles, when inhaled, can pass through the nose and reach deep into the lungs and bloodstream, the immediate changes in the body were most likely dominated by PM1, for which the estimation from biometric variables was highly accurate, with an R² of 0.91 between the true and estimated values [46].
The results for all these air quality components show that a small number of biometric variables provides similar and, in some cases, better estimates of these pollutants. In fact, the results are significantly better for NO and NO2. Reducing the number of dimensions in a small data set thus seems to predict the concentration more effectively than using a large number of input features. This aligns with Occam's razor, which suggests that a simpler model usually generalizes better. Moreover, reducing the number of variables, that is, the number of dimensions, was a necessity, given the small amount of data relative to the large number of biometric variables for which data were collected.
There were a few limitations to this study that can possibly be addressed in future work. One of them was the collection of data from a single participant for CO2, NO2, and NO. Multiple trials were conducted to mitigate this issue, and future work can include more extensive data collection from multiple participants to provide further confirmation. However, the data collection in this experimental paradigm was performed on multiple days, with multiple trials under different environmental conditions; the results are, thus, likely to hold in a variety of environmental situations, probably except for situations with extreme weather. Moreover, because the static bike ride paradigm, in which PM1 and PM2.5 were measured, involved multiple participants, those results are also likely to hold over a variety of populations. The other limitation involved readings from some of the electrodes in the EEG headset, which could be distorted by activities such as blinking, head movement, swallowing, jaw clenching, neck movement, and tongue movement, all of which are frequent when a participant is cycling. The resulting noise can be removed, but because these activities are frequent, doing so can significantly reduce the number of data records. However, the results show that models built without EEG data as biometric variables yield similar results.
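The trade-off between removing EEG artifacts and retaining enough data records can be made concrete with a simple amplitude-threshold filter. This is only a sketch under stated assumptions: the source does not describe its cleaning procedure, and the 100 µV threshold, the function name `reject_artifacts`, and the records below are all hypothetical.

```python
def reject_artifacts(records, threshold_uv=100.0):
    """Keep only records whose peak absolute amplitude is within threshold.

    `records` is a list of per-record EEG samples in microvolts.
    The threshold is an assumed value chosen for illustration.
    """
    return [
        record for record in records
        if max(abs(sample) for sample in record) <= threshold_uv
    ]

# Hypothetical records; two contain large deflections such as those
# produced by blinking or jaw clenching.
records = [
    [10.0, -20.0, 15.0],
    [5.0, 250.0, -8.0],
    [30.0, -40.0, 35.0],
    [-300.0, 12.0, 9.0],
]

clean = reject_artifacts(records)
retained_fraction = len(clean) / len(records)
print(f"retained {len(clean)} of {len(records)} records")
```

When artifacts are frequent, as during cycling, a filter of this kind discards a large share of the records, which is one reason a model that omits EEG inputs can be attractive.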
The methodology used in this study presents a unique application of machine learning. Using biological measurements as input features, machine learning models can predict the concentration of air quality components such as PM1, PM2.5, and CO2 with a high degree of accuracy. We can, thus, estimate the quality of air in microenvironments using only a small set of biological measurements. Furthermore, with the use of predictor ranking, we can observe which biological parameter is most affected by these air quality components. Since the study was conducted outdoors, the participants were inhaling a mixture of varying pollutants. In order to study the direct effects of individual pollutants, participants could be placed in a closed chamber and their autonomic responses examined while just one of the pollutants is varied artificially. A study could also be conducted using just an EEG headset to observe how different areas of the brain are affected when various components of air quality are inhaled. Future work can also address other pollutants, such as lead, carbon monoxide, and volatile organic compounds.
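One common way to rank predictors is permutation importance: scramble one input column at a time and record how much the model error grows. The source does not say which ranking method was used, so the sketch below is an assumed stand-in. The feature names, the toy model, and the data are hypothetical, and a fixed cyclic shift replaces the usual random shuffle so that the example is reproducible.

```python
def mse(model, rows, targets):
    """Mean squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, column):
    """Increase in MSE after scrambling one feature column.

    A cyclic shift by one row is used instead of a random shuffle.
    """
    values = [row[column] for row in rows]
    shifted = values[-1:] + values[:-1]
    permuted = [
        row[:column] + [value] + row[column + 1:]
        for row, value in zip(rows, shifted)
    ]
    return mse(model, permuted, targets) - mse(model, rows, targets)

# Hypothetical fitted model that ignores the third feature entirely.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]

feature_names = ["heart_rate", "gsr", "skin_temp"]  # illustrative labels
rows = [
    [1.0, 4.0, 7.0], [2.0, 1.0, 3.0], [3.0, 5.0, 9.0],
    [4.0, 2.0, 1.0], [5.0, 6.0, 4.0], [6.0, 3.0, 8.0],
]
targets = [toy_model(row) for row in rows]

importances = [
    permutation_importance(toy_model, rows, targets, col)
    for col in range(len(feature_names))
]
ranking = sorted(zip(feature_names, importances), key=lambda p: -p[1])
print(ranking)
```

The feature whose scrambling hurts the model most is ranked first, and a feature the model never uses receives an importance of zero.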
Since this study was conducted on different days under different environmental conditions, confounding variables in the experimental setup were expected. For example, the ambient temperature can affect both skin temperature and the GSR sensor. Future work can measure these environmental variables and identify confounders via causal analysis.
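A first, simple check for a confounder of this kind is a partial correlation: regress both the biometric and the pollutant on the suspected confounder and correlate what is left over. This is a much weaker tool than a full causal analysis and is offered only as a sketch. All values below are synthetic and were constructed so that ambient temperature drives both series.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def residuals(y, x):
    """Residuals of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    return pearson_r(residuals(x, z), residuals(y, z))

# Synthetic data: both series depend on ambient temperature only.
ambient_temp = [15.0, 17.0, 19.0, 21.0, 23.0, 25.0]
pollutant = [31.0, 32.0, 39.0, 43.0, 44.0, 51.0]   # arbitrary units
skin_temp = [28.5, 28.5, 28.5, 29.5, 31.5, 33.5]

raw_r = pearson_r(pollutant, skin_temp)
partial_r = partial_correlation(pollutant, skin_temp, ambient_temp)
print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

In this constructed case the raw correlation between the pollutant and skin temperature is strong, yet it disappears once ambient temperature is accounted for, which is the pattern a confounder produces.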