Robust Confidence Intervals for PM2.5 Concentration Measurements in the Ecuadorian Park La Carolina

In this article, robust confidence intervals for PM2.5 (particles with size less than or equal to 2.5 μm) concentration measurements performed in La Carolina Park, Quito, Ecuador, have been built. Different techniques have been applied for the construction of the confidence intervals, and routes around the park and through the middle of it have been used to build the confidence intervals and classify this urban park in accordance with categories established by the Quito air quality index. These intervals have been based on the following estimators: the mean and standard deviation, median and median absolute deviation, median and semi interquartile range, a-trimmed mean and Winsorized standard error of order a, location and scale estimators based on the Andrew’s wave, biweight location and scale estimators, and estimators based on the bootstrap-t method. The results of the classification of the park and its surrounding streets showed that, in terms of air pollution by PM2.5, the park is not at caution levels. The results of the classification of the routes that were followed through the park and its surrounding streets showed that, in terms of air pollution by PM2.5, these routes are at either desirable, acceptable or caution levels. Therefore, this urban park is actually removing or attenuating unwanted PM2.5 concentration measurements.


Introduction
Particulate matter (PM) is a mixture of particles of different compositions, sizes, and origins, which for different reasons are in the air [1]. According to [1], the aerodynamic diameter of these particles ranges from less than 100 nm up to a few micrometers. In accordance with [2], these particles can be classified according to their size into coarse particles (size from 2.5 µm up to 10 µm, PM10) and fine particles (size less than or equal to 2.5 µm, PM2.5). Furthermore, particles whose size is smaller than 0.1 µm are called ultrafine particles [1,2].
Particulate matter is important to study because it affects human health. In short, it affects the lungs, harms the respiratory system, and reduces life expectancy [3]. According to [4], PM2.5 causes respiratory inflammation, cancer, and asthma [5][6][7]. Additionally, in accordance with [8], it affects both the cardiovascular system [9,10] and the nervous system [11], among others.

Study Area and Considered Data
As the study area is the same as in [12][13][14], this article shows only the figure that describes the park and the random variables considered. All this information has been taken directly from [13]. In accordance with [13], Figure 1 shows the park and routes that were followed to perform the measurements. The exact description of the random variables X 1 , X 2 , X 3 , Y 1 , Y 2 , and Y 3 is given in [13]. Therefore, a simple explanation of these variables is given below.
• X 1 is a route that represents the sidewalk on Avenida de los Shyris (shown as Shyris Av. in Figure 1) that is in front of La Carolina Park.
• X 2 is a route that represents the sidewalk that is situated between La Carolina Park and Avenida de los Shyris.
• X 3 is a route through the center of the park.
• Y 1 is a route through Avenida República del Salvador (shown as Rep. Del Salvador Av. in Figure 1) that people follow to go to the center of the park.
• Y 2 is a route through Avenida Portugal (shown as Portugal Av. in Figure 1) that people use to go to the center of the park.
• Y 3 is a route that represents the sidewalk that is situated between La Carolina Park and Avenida Naciones Unidas (shown as Naciones Unidas Av. in Figure 1).
Sensors 2020, 20, 654 3 of 19 Figure 1. Park and route that was followed to perform the measurements. Urban park: Space delimited by the yellow lines; Route: Red dashed lines. (This figure has been taken from [13]).
In accordance with [12][13][14], the PM2.5 measurement instrument used in this research was a portable CEL-712 Microdust Pro monitor paired with a GPS (Global Positioning System). The calibration results of the measurement instrument are shown in [13]. The measurements were performed at a walking speed of 2 km/h and at a height of 1.5 m, from 8:00 a.m. to 10:00 a.m., because air pollution is the worst during these hours [13].
The conclusions of the analysis of the pollution levels obtained in [12][13][14] were that there are significant differences between the six variables analyzed and that the park acts as an air pollution filter. In [13], a statistical summary of each variable was first presented, together with time-series plots and box plots. The summary reported the number of observations available and showed that some variables fluctuated more than others. In addition, in almost all variables, extreme values far to the right of the central group were observed, as well as a lack of normality in the distribution of the variables, which was due to the lack of symmetry. High values of the shape measures (skewness and kurtosis) were also observed in the variables.
In addition, in [13], a smoothing technique based on simple moving averages was used, with the aim of reducing the influence of each individual observation [29]. Furthermore, in order not to suppress any observation, transformations of the variables were applied with a view to fitting the data. However, although it was possible to fit the values of certain variables to heavy-tailed distributions, it was not possible to fit all of the variables properly.
After the steps described in the previous paragraphs, non-parametric two-sided confidence intervals based on the Wilcoxon-Mann-Whitney test were constructed in [13], in order to test whether the samples taken from the six variables came from populations with a common median [27,28]. It was concluded that the variables could be classified, according to the categories of the air quality index of Quito [30], into four groups: a group formed by variable Y 3, another group formed only by Y 2, a third group formed by variables X 1 and X 2, and, finally, the group formed by variables X 3 and Y 1.
Due to the high number of outliers in the variables, it was decided to use a robust analysis in [13]. Therefore, in order to estimate the center of symmetry of the distribution, L-estimators of the location were used, which are linear combinations of order statistics [24,25].
On the other hand, to determine the variability of the data, different scale estimators were used. Specifically, the mean of the absolute deviations from the mean (MAD mean), the median absolute deviation (MAD), and the semi interquartile range (SIR) were used. Additionally, the biweight midvariance scale estimator (S bi (c)) was used, which is based on an M-estimator of the location, since it is more efficient than conventional scale measures. Finally, the least median of squares (LMS) point estimator was used [31,32].
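The three simplest scale estimators named above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names, the sample values, and the quartile convention (`method="inclusive"`) are assumptions made here, not choices documented in [13].

```python
import statistics


def mad_mean(x):
    """Mean of the absolute deviations from the sample mean."""
    m = statistics.fmean(x)
    return statistics.fmean(abs(v - m) for v in x)


def mad(x):
    """Median of the absolute deviations from the sample median (unscaled)."""
    med = statistics.median(x)
    return statistics.median(abs(v - med) for v in x)


def sir(x):
    """Semi interquartile range: half the distance between Q1 and Q3."""
    # The quartile convention is an assumption; other conventions give
    # slightly different values on small samples.
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return (q3 - q1) / 2


# Hypothetical readings with a single extreme value on the right.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print(mad_mean(sample), mad(sample), sir(sample))
```

On this hypothetical sample the mean-based measure is dominated by the single extreme value, while the MAD and the SIR are not, which is the behavior that motivates the robust analysis.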
Robust statistics are characterized by the influence curve, which shows the influence that an observation can have compared to the rest of the observations [31]. In the case of non-robust estimators, these influence curves are not bounded. Therefore, an observation that lies considerably far from most of the data can change a non-robust estimate by an arbitrarily large amount, whereas its effect on a robust estimate remains limited. These influence curves also have other properties that differentiate them from each other, such as being continuous or differentiable. On this basis, in [13], it was determined that the MAD, SIR, and LMS estimates were the most stable, and the variables were classified based on the scale estimators.
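The contrast between bounded and unbounded influence can be illustrated empirically by appending one extra observation to a sample and recording how much an estimate moves. The sketch below is only a finite-sample illustration of the idea, not the formal influence curve of [31]; the helper name and the readings are hypothetical.

```python
import statistics


def shift_from_outlier(x, outlier, estimator):
    """Change in an estimate when one extra observation is appended."""
    return estimator(list(x) + [outlier]) - estimator(x)


base = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical readings
for value in (20.0, 100.0, 1000.0):
    d_mean = shift_from_outlier(base, value, statistics.fmean)
    d_median = shift_from_outlier(base, value, statistics.median)
    print(f"outlier={value:7.1f}  mean shift={d_mean:8.2f}  median shift={d_median:5.2f}")
```

The shift of the mean grows without limit as the added value moves away, while the shift of the median stays fixed once the added value lies beyond the largest reading.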

Method
A standard way to establish confidence intervals, as well as hypothesis tests, is to consider that the statistic given by Equation (1) follows a Student's t-distribution with n − 1 degrees of freedom:

$$T = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \tag{1}$$

where n is the sample size, $\bar{x}$ is the sample mean, µ is the expected mean value, and s is the sample standard deviation. Thus, the classic confidence intervals for the mean are of the form:

$$\bar{x} \pm t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}, \tag{2}$$

where α is the significance level and 1 − α is the confidence level. This construction assumes that the distribution of T is symmetric with zero mean. These hypotheses usually hold if the distribution from which the sample was obtained is approximately Gaussian. However, in the case under study they are not met, because the variables are heavy-tailed. As indicated in [24], the basic problem caused by these departures from the starting assumptions is an increase in the length of the confidence intervals, since a larger standard deviation yields a longer interval. In addition, there are other problems related to hypothesis tests, such as controlling the probability of a type I error (that is, rejecting the null hypothesis when it is true), because the estimator that is used is biased [26].

For the moment, confidence intervals of the form given by Equation (3) will be established, based on analyses performed in [24]:

$$T \pm t^{*}\,\frac{\omega}{\sqrt{n}}, \tag{3}$$

where T is a location estimator, ω is a scale estimator, t * is a constant related to the Student's t-distribution, and n is the sample size. For robust intervals, estimators based on point statistics and three families of estimators will be considered. In this sense, it is important to mention that the a-trimmed mean family, with 0 ≤ a ≤ 0.5 [24,25], consists of removing a × 100% of the observations from each end of the ordered sample and then averaging the remaining observations. For this estimator, an approximation of its standard deviation is the Winsorized standard error of order a of the sample (X 1 , . . . , X n ), s W (a) [26].
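The a-trimmed mean and the generic interval can be sketched as follows, assuming the interval takes the form location ± t* × scale / √n. The function names and the readings are hypothetical, the trimming count is taken as the floor of n × a, and the percentile 2.262 is the tabulated 97.5th percentile of a Student's t-distribution with 9 degrees of freedom, passed in by hand because the standard library has no t quantile function.

```python
import math
import statistics


def trimmed_mean(x, a):
    """a-trimmed mean: drop floor(n * a) observations at each end, average the rest."""
    if not 0 <= a < 0.5:
        # a = 0.5 is excluded in this sketch so the trimmed sample is never empty.
        raise ValueError("a must satisfy 0 <= a < 0.5")
    xs = sorted(x)
    k = math.floor(len(xs) * a)
    return statistics.fmean(xs[k:len(xs) - k])


def interval(location, scale, t_star, n):
    """Interval of the assumed form location +/- t_star * scale / sqrt(n)."""
    half = t_star * scale / math.sqrt(n)
    return location - half, location + half


# Hypothetical readings with one extreme value on the right.
readings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
n = len(readings)
t_975_9df = 2.262  # tabulated value, n - 1 = 9 degrees of freedom

print("classical:", interval(statistics.fmean(readings), statistics.stdev(readings), t_975_9df, n))
print("10% trimmed mean:", trimmed_mean(readings, 0.1))
```

With these hypothetical readings the classical interval is stretched by the single extreme value, whereas the 10% trimmed mean ignores it entirely.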
In short, if the ordered sample is X (1) ≤ X (2) ≤ . . . ≤ X (n) and k = ⌊n·a⌋, with ⌊h⌋ being the floor function of the positive real number h, then the Winsorized sample, W (1) ≤ W (2) ≤ . . . ≤ W (n), is obtained by replacing the k lowest values of the sample with X (k+1) and the k highest values of the sample with X (n−k). In this article, the family of estimators based on the Andrew's wave and the family based on biweight estimators are also used [26]. Therefore, the following pairs of estimators will be considered:

1. Mean and standard deviation.
2. Median and median absolute deviation, (M, MAD).
3. Median and semi interquartile range, (M, SIR).
4. a-trimmed mean [13] and Winsorized standard error of order a, (T(a), s W (a)) [26], where the Winsorized standard error is computed from the a-Winsorized sample and its mean.
5. Andrew's wave, (T ωa, s ωa) [26].
6. Biweight, (T bi, s bi) [13].

Table 1 shows the value of the estimators for each variable. However, in accordance with [24], taking into account situations with Gaussian distributions, with an outlier (one-wild), or with trimmed distributions (slash), the best results in terms of efficiency are obtained with the M-estimators of the location, that is, with the last two pairs of estimators of Table 1.

Once the estimators were selected, the next step was to establish the t * constants of Equation (3). In order to do this, according to [33,34], for the first pair of robust estimators (M, MAD), the percentiles of a Student's t-distribution with n − 2 degrees of freedom were chosen. For the second pair of robust estimators (M, SIR), t * = t n−1 /1.075 was chosen. The constant t * for the family (T(a), s W (a)) was taken from the Student's t-distribution with n − 2 × ⌊n × a⌋ − 1 degrees of freedom. For the families of M-estimators based on the Andrew's wave and on the biweight estimators, the percentiles of a Student's t-distribution with 0.7 × (n − 1) degrees of freedom were chosen. In the case where 0.7 × (n − 1) was not an integer, the next integer greater than 0.7 × (n − 1) was chosen as the degrees of freedom of the Student's t-distribution.
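The Winsorization step and the degree-of-freedom rules described in this section can be written down compactly. The sketch below is illustrative only: the function names and the string labels for the pairs are hypothetical, and the trimmed-mean rule reads the count of removed observations as k = floor(n × a) at each end, which is an assumption consistent with the definition of the Winsorized sample.

```python
import math


def winsorize(x, a):
    """Replace the k lowest values with X(k+1) and the k highest with X(n-k)."""
    xs = sorted(x)
    n = len(xs)
    k = math.floor(n * a)
    if k == 0:
        return xs
    # xs[k] is X(k+1) and xs[n - k - 1] is X(n-k) in 1-based order-statistic notation.
    return [xs[k]] * k + xs[k:n - k] + [xs[n - k - 1]] * k


def degrees_of_freedom(pair, n, a=0.0):
    """Degrees of freedom of the t-distribution used for each pair of estimators."""
    if pair == "M_MAD":
        return n - 2
    if pair == "M_SIR":
        return n - 1  # the text then divides this percentile by 1.075
    if pair == "trimmed":
        return n - 2 * math.floor(n * a) - 1
    if pair in ("andrews_wave", "biweight"):
        # Smallest integer >= 0.7 * (n - 1), computed with integers to avoid
        # floating-point surprises.
        return (7 * (n - 1) + 9) // 10
    raise ValueError(f"unknown pair: {pair}")


print(winsorize(range(1, 11), 0.2))
print(degrees_of_freedom("trimmed", 80, 0.1), degrees_of_freedom("biweight", 80))
```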

Confidence Interval for Each Parameter
In addition to the previous confidence intervals, another interval for the median was included, obtained using the bootstrap-t method [26]. With a (1 − α) confidence level, this confidence interval is given by Equation (12), where M is the median of the original sample, and, for the bth bootstrap sample, b = 1, . . . , B, s * is the unbiased estimator of the standard deviation, M * is the median, and t * 1−α/2 and t * α/2 are the percentiles of the bootstrap statistic given by Equation (13). For the case under study, as the number of observations for each of the variables was greater than 70, B = 499 bootstrap samples were generated so that (1 − α) = 0.95 was an integer multiple of 1/(B + 1). Table 2 shows the length of the confidence interval (see Equation (3)) of each pair of estimators. Additionally, both the intervals obtained by using the bootstrap-t method and the nonparametric intervals obtained in [13] are shown in Table 2.
Table 2. The length of the confidence intervals at (1 − α) = 0.95.
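A bootstrap-t interval for the median can be sketched as below. This is a generic textbook-style version and not necessarily the exact form of Equations (12) and (13): the studentization by s/√n, the percentile indexing, the function name, and the seed are all assumptions made for illustration.

```python
import math
import random
import statistics


def bootstrap_t_median(x, alpha=0.05, B=499, seed=0):
    """Bootstrap-t interval for the median (illustrative studentization)."""
    if statistics.stdev(x) == 0:
        raise ValueError("sample has no variability")
    rng = random.Random(seed)
    n = len(x)
    med = statistics.median(x)
    se = statistics.stdev(x) / math.sqrt(n)  # assumed scale for the original sample

    t_stats = []
    for _ in range(B):
        resample = rng.choices(x, k=n)  # draw n values with replacement
        s_star = statistics.stdev(resample)
        if s_star == 0:  # degenerate resample, cannot be studentized
            continue
        t_stats.append((statistics.median(resample) - med) / (s_star / math.sqrt(n)))

    t_stats.sort()
    lo_idx = max(int((len(t_stats) + 1) * alpha / 2) - 1, 0)
    hi_idx = len(t_stats) - 1 - lo_idx
    # The upper percentile sets the lower limit and vice versa.
    return med - t_stats[hi_idx] * se, med - t_stats[lo_idx] * se


# Hypothetical sample with 80 observations.
data = [float(i % 17) + 0.1 * i for i in range(80)]
print(bootstrap_t_median(data))
```

Because the percentiles come from the resampled statistic rather than from a symmetric reference distribution, the resulting interval does not have to be symmetric around the median, which suits right-skewed data.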

Table 2 columns: Pair of Estimators; Length of the Confidence Interval (mg/m 3).

Figures 2-7 show the confidence intervals for each variable, together with the trimean and the a-trimmed means (see Equations (4) and (5)) for 10% and 20% pruning on each side of the samples. In these figures, it can be seen that the confidence intervals based on the Andrew's wave and the biweight estimators are practically the same for all variables. The intervals built from the median and the median absolute deviation are the shortest for all the variables, and for most of the variables they contain neither the trimean nor the a-trimmed mean. The nonparametric intervals for the median contain the robust intervals based on the Andrew's wave and those based on the biweight estimators, being generally a little wider. In addition, the nonparametric intervals contain almost all other intervals built with robust estimators, except those based on the a-trimmed mean and Winsorized standard error of order a.
In addition, the fact that in some variables the intervals built with the family of estimators based on the a-trimmed mean, for a = 0.1, appear somewhat longer than others and slightly shifted to the right indicates that they are more influenced by observations far to the right of the center of the data. The trimean of each variable is found in all non-classical intervals, except for those based on the median and the median absolute deviation. However, the a-trimmed means with 10% and 20% pruning are not always found in the built confidence intervals.
In accordance with [13], X 1 has a bias towards the right. In addition, it is the only variable that has values that exceed all levels of air quality, which is why the confidence interval based on the mean and the standard deviation is very wide (see Figure 2). This interval is shifted towards high values and does not even cover the value of the median or the other robust location estimators. That is, it is greatly influenced by these extreme observations.
With respect to X 2, in [13] it was shown that this variable is among those that exhibit a better behavior, taking into account both the location and dispersion measures. Therefore, this variable can be considered to be light-tailed. For this reason, all the confidence intervals shown in Figure 3 are very similar and are also shorter, for each type of interval, than the intervals of the rest of the variables. As with X 1, the intervals for X 2 contain only the trimean and the a-trimmed mean for 10% pruning, while the a-trimmed mean for 20% pruning is only contained in the classical interval and in the interval built by using the a-trimmed mean and Winsorized standard error of order a, for a = 0.1.
Although it was shown in [13] that X 3 has many observations with very low values, its distribution is also light-tailed, with more variability than X 2. For this reason, the nonparametric confidence interval (see Figure 4) is almost twice as wide as the other confidence intervals, except in the case of the interval found by the bootstrap method. In addition, it is important to mention that practically all intervals contain the considered location measurements. The interval constructed using the bootstrap method is among the longest of all the variables, except for the confidence intervals of the Y 3 variable (see Figure 7).
It was already established in [13] that Y 1 is among the variables with greater variability, with few extreme observations and a skew towards high values. The lengths of its confidence intervals are also large, but the intervals are displaced towards low values of the variable (see Figure 5). This is because, by suppressing the extreme observations, the remaining observations are concentrated in low values of the variable. The above justifies the fact that the interval based on the a-trimmed mean and Winsorized standard error of order a, for a = 0.1, does not contain the trimean. Finally, taking into account the similarity of the confidence intervals and the fact that the a-trimmed means for 10% pruning are outside the nonparametric confidence interval, it is observed that the a-trimmed means are only contained in the classical confidence interval and in the interval that is centered on T(0.1).
With respect to Y 2 , in [13] it was determined that it does not resemble the rest of the variables, as far as centralization measures are concerned. This variable has many observations that influence the variability. In addition, together with X 2 and X 3 , Y 2 is the variable in which the confidence intervals are more similar to each other (see Figure 6). In the case of Y 2 , it is observed that no confidence interval contains the a-trimmed mean for 20% pruning.
Since it was determined in [13] that the distribution of the values of Y 3 may correspond to a heavy-tailed distribution, skewed towards high values, it is possible to corroborate that the confidence interval based on the mean and standard deviation is offset from the remaining intervals (see Figure 7), except in the case of the interval based on the a-trimmed mean and Winsorized standard error of order a, for a = 0.1, which only suppresses 10% of the observations at each end. Furthermore, according to [13], Y 3 is the variable with the most observations that exceed the acceptable level of the PM2.5 concentration [30] and the one with the greatest variability. This is also confirmed here, firstly because its confidence intervals are the widest among all the variables, and secondly because only the confidence intervals found through bootstrap techniques, the nonparametric intervals, and some robust intervals contain the trimean.
As was done in [13], Figures 8-10 show 95% confidence intervals for the medians of the variables under study and different categories of air pollution by PM2.5 that are defined in [29]. However, unlike [13], the robust confidence intervals constructed by using the following pairs are presented here: (M, SIR), (T bi, s bi), and (M, s *). It is important to mention that the confidence intervals based on the Andrew's wave have not been included, firstly because they closely resemble the intervals based on the biweight statistic and, secondly, because the intervals based on the biweight statistic are shorter than those based on the Andrew's wave. The confidence intervals based on (M, MAD) have not been included because they are a subset of the confidence intervals based on (M, SIR). In addition, the confidence intervals based on the 0.1-trimmed mean have also not been included, because these are not built for the median but for T(0.1). On the other hand, the bands that delimit the three lowest categories in which the air quality is classified in the city of Quito have been included, according to the levels of air pollution in this city by PM2.5 concentrations [30].
From Figures 8-10, it can be seen that for the confidence intervals found with the biweight estimators (see Figure 9) it is possible to discriminate more precisely the equality of the medians, obtaining results comparable to those obtained with the non-parametric intervals. That is, the variables can be classified into the following four groups: {Y 2}, {Y 3}, {X 1, X 2}, and {X 3, Y 1}. In addition, the medians of X 3 and Y 1 are strictly contained in the Desirable level, the medians of the variables X 1, X 2, and Y 2 are contained in the Acceptable level, and it is rejected that these medians may belong to the other levels. Furthermore, the median of variable Y 3 can be in the Acceptable level or Caution level.
With respect to the families of the confidence intervals based on (M, SIR) and (M, s * ), shown respectively in Figures 8 and 10, the following can be said:

1. One cannot reject that variables X 1 and X 2 can have equal medians, and, with a 95% confidence level, it is rejected that they are the medians of any of the other variables, since the acceptance limits of the first two variables do not include the acceptance limits of the others.
2. One cannot reject that the variables X 3, Y 1, Y 2, and Y 3 can have equal medians.
3. When removing the observations of the tails of the distributions, the variables X 1 and X 2 are those that present less variability, and the variables Y 1 and Y 3 have more variability than the rest.
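The kind of comparison made in the list above, in which equality of medians is not rejected when confidence intervals share common values, can be sketched as a simple overlap check. The route labels and interval limits below are hypothetical and are not taken from Figures 8-10; grouping by chains of overlapping intervals is one possible convention, chosen here only for illustration.

```python
def overlaps(ci_a, ci_b):
    """True when two closed intervals (low, high) share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def group_by_overlap(intervals):
    """Group labels whose intervals are linked by a chain of overlaps."""
    groups = []
    for name in intervals:
        linked = [g for g in groups
                  if any(overlaps(intervals[name], intervals[m]) for m in g)]
        merged = {name}
        for g in linked:
            merged |= g
            groups.remove(g)
        groups.append(merged)
    return groups


# Hypothetical 95% intervals for three routes (arbitrary units).
example = {"route_a": (1.0, 3.0), "route_b": (2.0, 4.0), "route_c": (10.0, 12.0)}
print(group_by_overlap(example))
```

Overlap is not transitive, so a chain-based grouping can place two non-overlapping intervals in the same group when a third interval bridges them; any classification obtained this way should be read with that caveat in mind.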
At this point, both [13] and this article have shown that the variables that have the greatest variability are X 1 and Y 3. These variables have the worst behavior, with very high values, because they contain critical pollution points. However, as the other variables are routes followed either through the center of the park or along its edges, the pollution levels on these routes are not as critical. Therefore, this section concludes by showing the robust confidence band graphs for variables X 1 and Y 3, because these are the variables that have more observations shifted towards high values. The graphs of the robust confidence bands for the other variables, which are well behaved, would not contribute significantly to this article from a scientific point of view.

Robust Confidence Band Graphs
The graphs of the robust confidence bands for X 1 and Y 3 are shown in Figures 11-16. The families of estimators based on the a-trimmed means, Andrew's wave, and biweight estimators have been taken into account, because the influence curves of their location estimators are bounded and smoother than the influence curves of the rest of the location estimators [24].
From Figures 11-16, for the estimators (T(a), s W (a)), it is observed that for X 1 there is a slight decrease in the location measurements and a constant amplitude of the confidence band (see Figure 11). In the graph corresponding to Y 3 (see Figure 14), the location measurements change their trend, but the increase in the amplitude of the confidence band is much more noticeable. Moreover, in this graph the confidence band does not contain either the median or the trimean, unless only 10% of the central observations are taken into account.
Figure 11. 95% robust confidence band based on the a-trimmed mean and Winsorized standard error: X 1.

For the estimators (T ωa, s ωa), it is observed that for low values of c the estimates are hardly affected by anomalous observations (see Figures 12 and 15). However, as c increases, the center of the interval increases, as does its amplitude, because observations in a greater range are taken into account. In addition, the confidence band for Y 3 (see Figure 15) is much wider than for X 1 (see Figure 12). This also happens for the family of estimators (T bi, s bi) (see Figures 13 and 16), although it is true that the amplitude of the band for X 1 increases proportionally more than for Y 3.
For the estimators (T bi , s bi ), as in the previous case for low values of c, only observations close to the median are contemplated, and, as c increases, wider confidence intervals are obtained. For low values of c, the estimator s bi shows many fluctuations. Therefore, the confidence band is much more variable. Furthermore, the confidence intervals chosen for X 1 (see Figure 13) and Y 3 (see Figure 16) contain the median and the trimean.
Finally, it is important to note that Y 3 has much more variability than X 1 , although the latter has higher observations than the rest of the variables. It is also confirmed that the confidence intervals for Y 3 are always wider. The behavior of the families of the estimators (T ωa , s ωa ) and (T bi , s bi ) is very similar, and the intervals they generate are narrower than the nonparametric intervals that were found. The behavior of the family T(a), s W (a) is very different from that of the other two families. This family produces confidence intervals with a more constant amplitude, but it cannot be assured that these intervals contain the median. In addition, there is a large difference between the confidence intervals for X 1 and Y 3 .
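A confidence band of the kind plotted for the a-trimmed mean family can be sketched by sweeping a over a grid and recomputing the interval at each value. The sketch below makes several assumptions that are not taken from the article: the standard error uses the common textbook form s_w / ((1 − 2k/n)√n), where s_w is the standard deviation of the Winsorized sample, and a fixed multiplier of 1.96 (the normal 97.5th percentile) stands in for the Student's t percentile that the text prescribes.

```python
import math
import statistics


def trimmed_band(x, a_grid, t_star=1.96):
    """Return (a, lower, center, upper) for each trimming proportion in a_grid."""
    xs = sorted(x)
    n = len(xs)
    band = []
    for a in a_grid:
        k = math.floor(n * a)
        core = xs[k:n - k]
        center = statistics.fmean(core)
        # Winsorized sample: k lowest values -> X(k+1), k highest -> X(n-k).
        wins = [xs[k]] * k + core + [xs[n - k - 1]] * k
        s_w = statistics.stdev(wins)
        se = s_w / ((1 - 2 * k / n) * math.sqrt(n))  # assumed textbook form
        band.append((a, center - t_star * se, center, center + t_star * se))
    return band


# Hypothetical symmetric sample: the center should not move with a.
for row in trimmed_band(list(range(1, 21)), [0.0, 0.1, 0.2]):
    print(row)
```

Plotting the lower and upper limits against a gives a band whose width and drift show how strongly the tails of the sample drive the estimate.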

Conclusions
This article has addressed a problem that was left open in previous research [13]. Specifically, robust confidence intervals were found for PM2.5 concentration measurements in the urban park called La Carolina, Quito, Ecuador. The main contributions of this article were that different techniques were applied for the construction of robust confidence intervals, that their results were compared, and that these robust intervals were used to analyze whether the six variables considered in the study came from the same distribution, establishing the differences between the parameters that characterized those variables.
For the construction of the confidence intervals, classical, non-parametric, and bootstrap techniques and, mainly, several pairs of robust statistics were used. Classic confidence intervals rely on the hypothesis that the observations come from a Gaussian, or approximately Gaussian, distribution. However, the variables under study contained numerous observations with extreme values on the right, so this hypothesis was not met.
From the analysis of the variables under study, the following was concluded: (1) the median of Y 3 was greater than that of all other variables; (2) the median of Y 2 was different from all the medians of the other variables; (3) the medians of X 1 and X 2 could be the same but different from all others (that is, lower than the median of Y 3 but greater than the median of the other variables); and (4) similarly, the medians of X 3 and Y 1 could be the same and smaller than the others.
Speaking in terms of air pollution and urban planning, the variable that could concern the citizen who lives and/or works around the La Carolina park is Y 3 because in [13] and in this article it has been shown that this variable presents location estimates that are remarkably higher than the rest of the variables, these estimates being between the Acceptable and the Caution levels. In addition, this variable is the one that provides higher scale estimates, showing differences with the remaining behavior patterns. The foregoing observation is in accordance with the location of the street on which the route to be followed to measure Y 3 was drawn. In [12][13][14], it was shown that both the direction in which the wind blows and the type of circulation around La Carolina Park render the Avenida Naciones Unidas (shown as Naciones Unidas Av. in Figure 1), which has been represented by Y 3 , the most likely to have higher levels of air pollution due to PM 2.5 concentrations. Therefore, the conclusions given by the authors of [12][13][14] are ratified.
Before closing, it is important to emphasize the meaning of the results obtained in this research work. The statistical analysis presented in this article has been conceptual and has focused on summarizing a set of data into a few values that are representative of that set. In this way, it has been possible to characterize the population under study using these few representative values. In addition, this has been done using robust techniques, which do not take into account all the values that have been collected in the sample of the population of interest. Specifically, the use of robust techniques has allowed extreme values to have little influence on the characterization of the sample of the population of interest.